
Betina Costa
February 09, 2020

Developing a Crawler API with Scrapy and Klein - PyCon Colombia 2020

Today we will develop an API to search quotes by tag on the site http://quotes.toscrape.com/. Our API should receive a tag as a parameter, scrape the page, and return JSON containing a list of the quotes and authors that belong to that tag.


Transcript

  1. PYCON COLOMBIA: Tutorial Goal

    Today we will develop an API to search quotes by tag on the site http://quotes.toscrape.com/. Our API should receive a tag as a parameter, scrape the page, and return JSON containing a list of the quotes and authors that belong to that tag.
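    The contract in the goal above can be pinned down with a small standard-library sketch. It assumes the API answers with a JSON list of objects carrying `text` and `author` keys (our own choice of key names) and relies on the site's `/tag/<tag>/` listing pages; `tag_url` and `to_json` are hypothetical helper names, not part of the workshop code.

```python
import json
from urllib.parse import quote

BASE_URL = "http://quotes.toscrape.com"


def tag_url(tag: str) -> str:
    """Hypothetical helper: the listing page for one tag, e.g. /tag/love/."""
    # quote() percent-encodes anything unsafe in a URL path segment.
    return f"{BASE_URL}/tag/{quote(tag)}/"


def to_json(quotes: list) -> str:
    """Hypothetical helper: keep only text and author, serialize as a JSON list."""
    payload = [{"text": q["text"], "author": q["author"]} for q in quotes]
    # ensure_ascii=False keeps the site's curly quotation marks readable.
    return json.dumps(payload, ensure_ascii=False)
```

    Returning a bare list keeps the response shape exactly as the goal describes; a wrapper object (for example echoing the tag) would be an equally valid design.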
  2. Workshop Summary: Points to Cover

    - Introduction and Setup
    - Scrapy Spiders and Selectors
    - Building the Spider (exercise)
    - Handling Scrapy's async behaviour with Klein
    - Building the API (exercise)
    - Wrapping up and Questions
  3. What is Scrapy?

    Scrapy is a free, open-source web-crawling framework written in Python. It is currently maintained by Scrapinghub, a web-scraping development and services company.
  4. Why Scrapy?

    - It's open source and free to use.
    - It's easy to build and scale.
    - It has a tool called Selector for data extraction.
    - It handles requests asynchronously, so it is fast.
  6. Spiders and Selectors

    Spiders: Spiders are classes that we define and that Scrapy uses to crawl websites and extract information from them.

    Selectors: Scrapy comes with its own mechanism for extracting data. Selectors are so called because they “select” certain parts of the HTML document, specified either by XPath or CSS expressions.
  8. Why Klein?

    Klein is a micro-framework for developing production-ready web services with Python. It's built on widely used and well-tested components like Werkzeug and Twisted, and has near-complete test coverage.

    Twisted: Twisted is an event-driven networking engine written in Python.
  9. Why Klein?

    Remember that Scrapy handles requests asynchronously? For that reason, it doesn't usually work well with web frameworks that handle requests synchronously. Klein can help with that: it runs on Twisted, the same event-driven engine Scrapy is built on.