Scrapy 0.24 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(LinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): torrent = TorrentItem() torrent['url'] … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(LinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(LinkExtractor(allow=('item\.php',)) …
0 credits | 222 pages | 988.92 KB | 1 year ago
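For orientation, here is a minimal, self-contained sketch of the CrawlSpider/LinkExtractor pattern that the excerpt above comes from, written against the modern import paths (`scrapy.spiders` / `scrapy.linkextractors`; Scrapy 0.24 itself shipped these under `scrapy.contrib.*`). The spider name, item fields and XPath expressions are illustrative assumptions, not content of the listed document, and `.get()` assumes Scrapy 1.8 or later:

```python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class TorrentItem(scrapy.Item):
    url = scrapy.Field()
    name = scrapy.Field()
    description = scrapy.Field()


class TorrentSpider(CrawlSpider):
    name = "mininova"
    allowed_domains = ["mininova.org"]
    start_urls = ["http://www.mininova.org/today"]

    # Follow every link whose URL matches /tor/<digits> and hand the
    # downloaded page to parse_torrent.
    rules = [Rule(LinkExtractor(allow=[r"/tor/\d+"]), callback="parse_torrent")]

    def parse_torrent(self, response):
        torrent = TorrentItem()
        torrent["url"] = response.url
        # Field names and XPaths below are placeholders, not taken from the
        # listed document.
        torrent["name"] = response.xpath("//h1/text()").get()
        torrent["description"] = response.xpath("//div[@id='description']").get()
        return torrent
```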
Scrapy 0.24 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(LinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): torrent = TorrentItem() … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … robust encoding support … Rule(LinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(LinkExtractor(allow=('item\.php',)) …
0 credits | 298 pages | 544.11 KB | 1 year ago
Scrapy 0.20 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): sel = Selector(response) torrent … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 197 pages | 917.28 KB | 1 year ago
Scrapy 1.8 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 335 pages | 1.44 MB | 1 year ago
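The throttling options mentioned in this excerpt map onto standard Scrapy settings. A minimal sketch of how they would appear in a project's settings.py; the setting names are Scrapy's documented ones, but the values are arbitrary examples, not recommendations from the listed document:

```python
# settings.py -- politeness / throttling knobs referenced in the excerpt.
# Values are illustrative; tune them per target site.

DOWNLOAD_DELAY = 2.0                   # fixed delay (seconds) between requests to the same site

CONCURRENT_REQUESTS_PER_DOMAIN = 4     # cap parallel requests per domain
CONCURRENT_REQUESTS_PER_IP = 0         # set non-zero to throttle per IP instead of per domain

AUTOTHROTTLE_ENABLED = True            # let the AutoThrottle extension adapt the delay
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallelism AutoThrottle aims for
```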
Scrapy 1.3 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 272 pages | 1.11 MB | 1 year ago
Scrapy 0.16 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): x = HtmlXPathSelector(response) … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 203 pages | 931.99 KB | 1 year ago
Scrapy 1.6 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 295 pages | 1.18 MB | 1 year ago
Scrapy 1.5 Documentation
setting a download delay between each request, limiting the amount of concurrent requests per domain or per IP, and even using an auto-throttling extension that tries to figure these out automatically. Note: This … middlewares for handling: cookies and session handling, HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, crawl depth restriction, and more … a Telnet … recommend that you install Scrapy within a so-called “virtual environment” (virtualenv). Virtualenvs allow you to not conflict with already-installed Python system packages (which could break some of your …
0 credits | 285 pages | 1.17 MB | 1 year ago
Scrapy 0.14 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): x = HtmlXPathSelector(response) … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … robust encoding support … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 235 pages | 490.23 KB | 1 year ago
Scrapy 0.22 Documentation
allowed_domains = ['mininova.org'] start_urls = ['http://www.mininova.org/today'] rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')] def parse_torrent(self, response): sel = Selector(response) torrent … built-in middlewares and extensions for: cookies and session handling, HTTP compression, HTTP authentication, HTTP cache, user-agent spoofing, robots.txt, crawl depth restriction, and more … Rule(SgmlLinkExtractor(allow=('category\.php',), deny=('subsection\.php',))), # Extract links matching 'item.php' and parse them with the spider's method parse_item Rule(SgmlLinkExtractor(allow=('item\.php', …
0 credits | 199 pages | 926.97 KB | 1 year ago
530 results in total