Scrapy 0.24 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(LinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): torrent = TorrentItem() torrent['url'] … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(LinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(LinkExtractor(allow=('item\.php',)) …
0 credits | 222 pages | 988.92 KB | 1 year ago
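For orientation, here is a minimal, self-contained sketch of the CrawlSpider/LinkExtractor pattern that the excerpt above comes from, written against the modern import paths (`scrapy.spiders` / `scrapy.linkextractors`; Scrapy 0.24 itself shipped these under `scrapy.contrib.*`). The spider name, item fields and XPath expressions are illustrative assumptions, not content of the listed document, and `.get()` assumes Scrapy 1.8 or later:

```python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class TorrentItem(scrapy.Item):
    url = scrapy.Field()
    name = scrapy.Field()
    description = scrapy.Field()


class TorrentSpider(CrawlSpider):
    name = "mininova"
    allowed_domains = ["mininova.org"]
    start_urls = ["http://www.mininova.org/today"]

    # Follow every link whose URL matches /tor/<digits> and hand the
    # downloaded page to parse_torrent.
    rules = [Rule(LinkExtractor(allow=[r"/tor/\d+"]), callback="parse_torrent")]

    def parse_torrent(self, response):
        torrent = TorrentItem()
        torrent["url"] = response.url
        # Field names and XPaths below are placeholders, not taken from the
        # listed document.
        torrent["name"] = response.xpath("//h1/text()").get()
        torrent["description"] = response.xpath("//div[@id='description']").get()
        return torrent
```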
Scrapy 0.24 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(LinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): torrent = TorrentItem() … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … robust encoding support … Rule(LinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(LinkExtractor(allow=('item\.php',)) …
0 credits | 298 pages | 544.11 KB | 1 year ago
Scrapy 0.20 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): sel = Selector(response) torrent … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 197 pages | 917.28 KB | 1 year ago
Scrapy 1.8 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 335 pages | 1.44 MB | 1 year ago
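The throttling options mentioned in this excerpt map onto standard Scrapy settings. A minimal sketch of how they would appear in a project's settings.py; the setting names are Scrapy's documented ones, but the values are arbitrary examples, not recommendations from the listed document:

```python
# settings.py -- politeness / throttling knobs referenced in the excerpt.
# Values are illustrative; tune them per target site.

DOWNLOAD_DELAY = 2.0                   # fixed delay (seconds) between requests to the same site

CONCURRENT_REQUESTS_PER_DOMAIN = 4     # cap parallel requests per domain
CONCURRENT_REQUESTS_PER_IP = 0         # set non-zero to throttle per IP instead of per domain

AUTOTHROTTLE_ENABLED = True            # let the AutoThrottle extension adapt the delay
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallelism AutoThrottle aims for
```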
Scrapy 1.3 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 272 pages | 1.11 MB | 1 year ago
Scrapy 0.16 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): x = HtmlXPathSelector(response) … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 203 pages | 931.99 KB | 1 year ago
Scrapy 1.6 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 295 pages | 1.18 MB | 1 year ago
Scrapy 1.5 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 285 pages | 1.17 MB | 1 year ago
Scrapy 0.14 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): x = HtmlXPathSelector(response) … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … robust encoding support … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 235 pages | 490.23 KB | 1 year ago
Scrapy 0.22 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): sel = Selector(response) torrent … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 199 pages | 926.97 KB | 1 year ago
530 results in total