Statements (158)
Predicate | Object |
---|---|
gptkbp:instance_of | gptkb:Inspector |
gptkbp:can_be_extended_by | custom middlewares, custom pipelines, custom spiders |
gptkbp:developed_by | gptkb:Scrapinghub, Scrapy contributors, Scrapy maintainers |
gptkbp:first_released | gptkb:2010 |
gptkbp:has | gptkb:Documentation, tutorials, community support, web-based interface, command line interface, active community, support for multiple data formats, built-in support for exporting data, scrapy shell, spider management, support for distributed crawling |
gptkbp:has_documentation | https://docs.scrapy.org/en/latest/ |
gptkbp:has_feature | Command line interface, asynchronous processing, Middleware support, Extensible architecture, support for testing, logging support, support for extensions, Support for caching, Integration with other libraries, support for command line interface, Support for logging, caching support, support for multiple data formats, support for cookies, support for webhooks, item pipelines, scrapy shell, spider management, Support for scraping web pages with dynamic loading, Built-in support for handling requests, Support for cookies and sessions, Support for distributed scraping, Support for proxies, Support for scraping Atom feeds, Support for scraping CSV, Support for scraping Excel files, Support for scraping HTML, Support for scraping JSON, Support for scraping JavaScript-heavy websites, Support for scraping RSS feeds, Support for scraping XML, Support for scraping dynamic content, Support for scraping forms, Support for scraping images, Support for scraping multiple pages, Support for scraping static content, Support for scraping text files, Support for scraping web pages with AJAX, Support for scraping web pages with authentication, Support for scraping web pages with captchas, Support for scraping web pages with iframes, Support for scraping web pages with rate limiting, Support for scraping web pages with redirects, Support for signals, Support for throttling requests, Support for user agents, Support for web crawling, support for JSON and XML output, support for cloud-based scraping, support for exporting to CSV and JSON, support for scraping with Playwright, support for scraping with Puppeteer, support for scraping with Selenium, support for scraping with Splash, support for scraping with headless browsers, support for scraping with scraping applications, support for scraping with scraping approaches, support for scraping with scraping frameworks, support for scraping with scraping libraries, support for scraping with scraping methodologies, support for scraping with scraping platforms, support for scraping with scraping practices, support for scraping with scraping services, support for scraping with scraping solutions, support for scraping with scraping strategies, support for scraping with scraping techniques, support for scraping with scraping technologies, support for scraping with scraping tools, support for scraping with web scraping services, support for scraping with web services, support for scraping with webhooks, support for sessions, Support for scraping web pages with session management, Built-in support for exporting data in various formats, Selectors based on XPath and CSS, Support for scraping APIs, Support for scraping PDFs, support for SQLite and MongoDB, support for XPath and CSS selectors, support for scraping with APIs, support for scraping with GraphQL APIs, support for scraping with RESTful APIs, support for scraping with SOAP APIs |
https://www.w3.org/2000/01/rdf-schema#label | Scrapy |
gptkbp:is_available_on | gptkb:Py_PI, gptkb:Git_Hub |
gptkbp:is_compatible_with | gptkb:Flask, gptkb:Django, gptkb:Pandas |
gptkbp:is_integrated_with | gptkb:Scrapy_Cloud, APIs, databases |
gptkbp:is_known_for | flexibility, high performance, scalability, ease of use |
gptkbp:is_part_of | data science toolkit |
gptkbp:is_supported_by | gptkb:Scrapinghub |
gptkbp:is_used_by | gptkb:developers, gptkb:researchers, data scientists |
gptkbp:is_used_in | gptkb:market_research, gptkb:academic_research, e-commerce, price comparison, content aggregation, data journalism, social media analysis, news aggregation, SEO analysis, job scraping, real estate data collection |
gptkbp:latest_version | 2.5.0 |
gptkbp:license | gptkb:BSD_License |
gptkbp:provides | asynchronous processing, middleware support, item pipelines |
gptkbp:repository | https://github.com/scrapy/scrapy |
gptkbp:supports | gptkb:Python_3, Python 3.6+ |
gptkbp:used_for | data mining, data extraction, web crawling |
gptkbp:written_in | gptkb:Python |
gptkbp:bfsParent | gptkb:Python |
gptkbp:bfsLayer | 4 |
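
The statements above list custom pipelines among the ways Scrapy can be extended. As a rough illustration only, an item pipeline is an ordinary Python class that exposes a `process_item(self, item, spider)` method. The class name `PriceNormalizationPipeline`, the `price` field, and the assumed price format are hypothetical and do not come from the statements above; the sketch also assumes dict-like items.

```python
class PriceNormalizationPipeline:
    """Hypothetical item pipeline that normalizes a 'price' field.

    Assumes dict-like items and prices such as "$1,299.50"; neither
    assumption comes from the statements listed above.
    """

    def process_item(self, item, spider):
        # Scrapy calls this hook once for every item a spider yields.
        value = item.get("price")
        raw = "" if value is None else str(value).strip()
        # Drop a leading currency symbol and thousands separators.
        cleaned = raw.lstrip("$€£").replace(",", "")
        # Store a float, or None when no price was scraped.
        item["price"] = float(cleaned) if cleaned else None
        return item


if __name__ == "__main__":
    # Standalone check without a running crawler; the spider is unused here.
    pipeline = PriceNormalizationPipeline()
    print(pipeline.process_item({"price": " $1,299.50 "}, spider=None))
```

In a Scrapy project, a class like this would typically be enabled through the `ITEM_PIPELINES` setting, using whatever module path the project actually has.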