Amazon2csv: Amazon products scraper to CSV (no API token required)

  • Why not use the API? Disclaimer: I'm the author of python-amazon-simple-product-api [1]

    [1] https://github.com/yoavaviram/python-amazon-simple-product-a...

  • Scraping Amazon is fun and all, but when you start overdoing it they rate-limit your IP and show you my worst nightmare: the Dogs of Amazon (a 500 error page with pictures of dogs).

    How do I know this? Because I'm the CTO at Nazdeeq.com, where we let users buy Amazon products from countries Amazon doesn't ship to easily, like Pakistan.

    Edit: totally open to partnerships in more countries
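
    A common way to cope with that kind of rate limiting is to retry with exponential backoff. The following is only a minimal sketch, not something from Amazon2csv: `fetch_with_backoff`, its parameters, and the choice of which status codes count as "slow down" (any 5xx, plus 429) are assumptions for illustration. The `fetch` callable is a stand-in for whatever HTTP client you use.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url) until it stops returning a throttling-style status.

    fetch is a hypothetical callable returning (status_code, body).
    Assumption: 429 and any 5xx response mean "back off and retry".
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status != 429 and status < 500:
            return status, body
        # Wait 1x, 2x, 4x, ... base_delay before the next attempt,
        # but do not sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")
```

    Passing `sleep` as a parameter keeps the function easy to test without real waiting. Backing off reduces load, but it does not guarantee you stay under whatever limit the site enforces.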

  • The issue with tools like this is that Amazon changes the product page layout very often and runs A/B tests heavily. I’ve even heard that computer vision is the most stable way to scrape Amazon. I suspect this library will stop working fairly soon.

  • I remember trying to build a scraper for Amazon. I quickly discovered that there are many types of item pages, and they change over time too, probably because of A/B testing. Just getting the price of a product out of their HTML markup reliably was a nightmare; I had to build a huge tree of if-this-then-maybe-that logic.
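
    To illustrate the kind of if-this-then-maybe-that logic described above, here is a rough sketch of a fallback chain that tries one layout after another. The element ids and class names (`price-main`, `deal-price`, `offer-price`) are made up for the example and are not Amazon's real markup; a real scraper would likely use an HTML parser rather than regular expressions.

```python
import re
from typing import Optional

# Hypothetical layouts, tried in order. These ids/classes are illustrative
# assumptions, not a list of Amazon's actual page variants.
PRICE_PATTERNS = [
    re.compile(r'id="price-main"[^>]*>\s*\$?([\d,]+\.\d{2})'),
    re.compile(r'id="deal-price"[^>]*>\s*\$?([\d,]+\.\d{2})'),
    re.compile(r'class="offer-price"[^>]*>\s*\$?([\d,]+\.\d{2})'),
]


def extract_price(html: str) -> Optional[float]:
    """Return the first price found by any known layout, or None."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            # Drop thousands separators before converting, e.g. "1,299.99".
            return float(match.group(1).replace(",", ""))
    return None
```

    Every new page variant means another entry in the list, which is why this approach tends to grow without bound as layouts change.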

  • The company I work for (zinc.io) has this: https://zincapi.com/

    We brand it as an ordering API, but it also supports retrieving product data (item details and pricing). We put a LOT of engineering resources into data quality and maintenance, as the API is core to our flagship product, PriceYak. If you have questions or want a token, email adam@zinc.io and mention this post.

  • If you're using this for anything serious, it's probably better to sign up for the Keepa API at about $50/month and let them scrape Amazon for you. It's worth it to avoid dealing with the complexities.

  • Nice. In my experience, Parsel [1] (used by Scrapy) is an easier-to-use HTML parsing library than Beautiful Soup. That's just imo.

    [1] https://github.com/scrapy/parsel

  • Hm, another no-API option (at least if you are on WordPress) is: https://wpcommission.com

  • So how many calls is one allowed before getting banned? Any guidelines on how to use this without breaching T&Cs?

  • Am I the only one who thinks this is rather weird, or at least unconventional, code for a scraper in Python?

  • It is also illegal to scrape Amazon: if you scrape it, you don’t own the content, and you are just stealing product data that was added to the site by the products' rightful owners.