QVC Sues Shopping App for Web Scraping That Allegedly Triggered Site Outage

  • My main focus for the entirety of my career has been on internet-facing consumer web applications. I have seen many, many DoS attacks, from IRC bots to Ukrainian web scrapers to Chinese get-lucky WordPress exploit scanners. Most of these can be ignored or blocked with little effort.

    By FAR the most annoying of all of these is when Google, Bing and/or Yahoo decide to wake up and crawl your infrastructure with little regard for your robots.txt or webmaster settings, if available. I think they have gotten better in recent years, but they used to be the absolute worst. It came down to: let us DoS you, or have your ranking suffer. Suing Google, Bing, or Yahoo isn't exactly an option.

    Some context: I was the lead architect/engineer combo for a CMS that hosted ~500k domains for a fairly large international company. Some days I would log in and see them crawling every domain from A to Z. Some days I would get hit by Google and Bing at the same time. They were the largest consumers of data on this system.
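    Honoring robots.txt is entirely up to the crawler; nothing in the file enforces it. As a minimal sketch of what a polite crawler can check before fetching, Python's standard-library `urllib.robotparser` parses a robots.txt and reports both path rules and any `Crawl-delay`. The `ExampleBot` user agent, the robots.txt content, and the 1-second fallback delay are made up for illustration.

```python
import urllib.robotparser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""


def polite_delay(robots_lines, user_agent, default_delay=1.0):
    """Parse robots.txt lines; return (parser, seconds to wait between requests).

    Falls back to default_delay (an assumed value) when no Crawl-delay applies.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    return parser, float(delay) if delay is not None else default_delay


parser, delay = polite_delay(ROBOTS_TXT.splitlines(), "ExampleBot")
print(parser.can_fetch("ExampleBot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/products/1"))   # True
print(delay)                                                              # 10.0
```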

  • Result.ly are really a bunch of jerks. One of the most common-sense things you can possibly do while crawling a website is monitor the response times and/or error rates of the sites you are crawling. If those are going up, your crawl rate should go down, or go to 0.

    There is one form of internet justice available: QVC should file abuse complaints with the ISPs that host those IPs. I've found abuse complaints are the best way to stop people from using IPs for bad activities (excessive scraping, spamming, etc.).
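    The advice to watch response times and error rates can be turned into a simple feedback rule. The Python sketch below uses an AIMD-style scheme (double the delay between requests when the site looks unhealthy, shave it down slowly when it recovers). The thresholds, the choice of HTTP 429/5xx as error signals, and the `AdaptiveThrottle` name are hypothetical choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThrottle:
    """Adjust the delay between requests based on observed server health.

    Hypothetical policy: a response counts as unhealthy if it is HTTP 429,
    any 5xx, or slower than `slow_seconds`.
    """
    delay: float = 1.0         # current seconds to wait between requests
    min_delay: float = 1.0     # never crawl faster than this
    max_delay: float = 300.0   # at this point the crawler should stop
    slow_seconds: float = 2.0  # response time treated as "server struggling"
    backoff: float = 2.0       # multiplicative increase of delay on trouble
    recovery: float = 0.1      # additive decrease of delay when healthy

    def record(self, status: int, elapsed: float) -> float:
        """Update the delay from one response; return the new delay."""
        unhealthy = status == 429 or status >= 500 or elapsed > self.slow_seconds
        if unhealthy:
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            self.delay = max(self.delay - self.recovery, self.min_delay)
        return self.delay

    @property
    def should_stop(self) -> bool:
        """True once the delay has hit its ceiling: stop crawling entirely."""
        return self.delay >= self.max_delay


if __name__ == "__main__":
    throttle = AdaptiveThrottle()
    # (status, seconds) pairs standing in for real responses.
    for status, elapsed in [(200, 0.3), (200, 2.5), (503, 0.1), (200, 0.4)]:
        print(status, elapsed, throttle.record(status, elapsed))
```

    In a crawl loop you would sleep for `throttle.delay` between requests and abort once `should_stop` is true. Multiplicative back-off reacts quickly to trouble, while the slow additive recovery keeps the crawler from jumping straight back to the rate that caused it.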

  • > Of these and other causes of action typically alleged in these situations, the breach of contract claim is often the clearest source of a remedy.

    That's a strange claim given that we're talking about a "contract" that QVC has no proof the other party read or agreed to, and for which there has been no explicit exchange ("offer" and "acceptance").

    Are website contracts/terms even enforceable at all? According to this article[0]/case law, likely not. Strange thing for a lawyer to say, but this article makes a lot of strange claims that seem inconsistent with US case law.

    [0] http://www.forbes.com/sites/oliverherzfeld/2013/01/22/are-we...

  • Having been on both sides of the coin: once you hit 600 req/s without a prior arrangement, that almost qualifies as a DoS attack. If they'd kept it to 200-300 req/min, that would have been pretty acceptable.
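    For scale, 600 req/s is the same rate as the "36,000 requests per minute" quoted elsewhere in this thread, and 120 to 180 times the 200-300 req/min range suggested here (whose upper end is 5 req/s):

```latex
\begin{aligned}
600\ \text{req/s} \times 60\ \text{s/min} &= 36{,}000\ \text{req/min} \\
300\ \text{req/min} \div 60\ \text{s/min} &= 5\ \text{req/s} \\
36{,}000 \div 300 &= 120 \\
36{,}000 \div 200 &= 180
\end{aligned}
```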

  • Honestly, you really shouldn't need to hit "36,000 requests per minute" scraping a website for price updates. Can someone explain any scenario in which this is reasonable? Do QVC's prices change that often?

  • I have mixed feelings about this. On the one hand, the bot seems to have been a really bad netizen. On the other hand, I hate the idea of there being a precedent that you can be sued for automating GET requests.

  • Agree with the suit, but QVC should (by this time) have rate limiting/throttling per IP.

    (waits for somebody to claim that each request came from a different proxy)
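    Per-IP limits like the one suggested here are commonly implemented as a token bucket. A minimal in-memory Python sketch follows; the `TokenBucket`/`PerIPLimiter` names and the example numbers (5 req/s, i.e. 300 req/min, with bursts of 10) are assumptions for illustration. As the parenthetical above anticipates, it does nothing against traffic spread across many proxy IPs.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float          # tokens added per second (sustained request rate)
    capacity: float      # maximum tokens held (largest allowed burst)
    tokens: float = 0.0
    updated: float = 0.0  # timestamp of the last refill

    def allow(self, now: float) -> bool:
        """Refill by elapsed time (capped at capacity), then spend one token if available."""
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIPLimiter:
    """One token bucket per client IP, held in memory (single process only)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(ip)
        if bucket is None:
            # New clients start with a full bucket so a normal burst is allowed.
            bucket = TokenBucket(self.rate, self.capacity, tokens=self.capacity, updated=now)
            self.buckets[ip] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerIPLimiter(rate=5.0, capacity=10)  # hypothetical limits
    # 12 requests at the same instant from one documentation-range IP.
    results = [limiter.allow("203.0.113.7", now=0.0) for _ in range(12)]
    print(results.count(True))
```

    A real deployment would also need to evict idle buckets and share state across servers, or use the rate-limiting feature of its reverse proxy or load balancer instead of application code.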