Thomson Reuters wins first major AI copyright case in the US

  • https://archive.is/mu49I

  • Here's the full decision, which (like most decisions!) is largely written to be legible to non-lawyers: https://storage.courtlistener.com/recap/gov.uscourts.ded.721...

    The core story seems to be: Westlaw writes and owns headnotes that help lawyers find legal cases about a particular topic. Ross paid people to translate those headnotes into new text, trained an AI on the translations, and used that to make a model that helps lawyers find legal cases about a particular topic. In that specific instance, the court says this plan isn't fair use. If it were fair use, one could presumably just pay people to translate headnotes directly and make a Westlaw competitor, since translating headnotes is cheaper than writing new ones. And conversely, if it isn't fair use, where's the harm (the court notes, for example, that no copyright violation was necessary for interoperability)? One can still pay people to write fresh headnotes from caselaw and create the same training set.

    The court emphasizes "Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today." But I'm not sure "generative" is that meaningful a distinction here.

    You can definitely see how AI companies will be hustling to distinguish this from "we trained on copyrighted documents, made a general-purpose AI, and then people paid to use our AI to compete with the people who owned the documents." It's not quite the same (the connection is less direct), but it's not totally different.

  • > Thomson Reuters prevailed on two of the four factors, but Bibas described the fourth as the most important, and ruled that Ross “meant to compete with Westlaw by developing a market substitute.”

    Yep. That's what people have been saying all along. If the intent is to substitute for the original, then the copying is not fair use.

    But the problem is that current training methods require data at this volume, so the models are simply not viable without massive copyright infringement.

    It'll be interesting to see how a defendant with a larger wallet will fare. But this doesn't look good.

    Though big-picture, it seems to me that moneyed interests will ensure LLMs survive: even if the current legal landscape doesn't allow them to exist, they will lobby HARD until it does. This is inevitable now that the issue is at least partially framed in national-security terms.

    But I'd hope this means there is a chance that, if models have to train on all human content, the weights will be available for free to all humans. If building them requires massive copyright infringement on our content, we should all have an ownership stake in the resulting models.

  • This isn't really about "AI". It's about copying summaries. Google was fined for this in France for copying news headlines into its search results, and now has to pay royalties in the EU.[1] Westlaw is a summarizing and indexing service for court case results; it's been publishing that info in book form since 1872.

    Ross was trying to compete with Westlaw, but used Westlaw as an input. West's "Key Numbers" are, after a century and a half, a de facto standard.[2] So Ross had to match that proprietary indexing system to compete, and their output had to match Westlaw's rather closely. That's the underlying problem. The court ruled that the objective was to directly compete with Westlaw, and that using Westlaw's output to do so was intentional copyright infringement.

    This looks like a narrow holding, not one that generally covers feeding content into AI training systems.

    [1] https://apnews.com/article/google-france-news-publishers-cop...

    [2] https://guides.law.stanford.edu/cases/keynumbersystem

  • Great. The stated goal of a lot of these companies seems to be “train the model on the output of humans, then hire us instead of the humans”.

    It’s been interesting that media where watermarking is feasible (like photography) have seen creators get access to some compensation, while text-based creators get nothing.

  • Interesting to note from this 2020 story that ROSS was founded in 2014 and had already gone out of business by the end of 2020: https://www.lawnext.com/2020/12/legal-research-company-ross-...

    The fact that the case took years beyond the company's death to resolve shows how slowly the wheels of justice can turn!

  • Note this case is explicitly NOT about large-language-model-style AI: Ross's product was a traditional search engine (an information retrieval system), not a neural transformer à la ChatGPT. (See the sketch at the end of this comment for what that distinction looks like in code.)

    About judge Bibas: https://en.wikipedia.org/wiki/Stephanos_Bibas
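
    For anyone curious, here is a minimal sketch of what "traditional information retrieval" looks like in practice. The corpus, the query, and the use of scikit-learn's TF-IDF are my own illustration, not anything from the case record:

    ```python
    # Classic keyword-based retrieval: no neural network, no text generation.
    # Documents are ranked purely by weighted lexical overlap with the query.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    headnotes = [  # invented examples, not real Westlaw headnotes
        "Contract formation requires offer, acceptance, and consideration.",
        "A landlord owes tenants a duty to maintain habitable premises.",
        "Negligence requires duty, breach, causation, and damages.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(headnotes)  # one sparse TF-IDF vector per headnote

    query = vectorizer.transform(["when is a contract formed"])
    scores = cosine_similarity(query, doc_matrix)[0]  # overlap score per headnote

    best = scores.argmax()
    print(f"{scores[best]:.2f}  {headnotes[best]}")
    ```

    A system like this never emits new text; it only points you at existing documents, which is presumably why the judge stressed that only non-generative AI was before him.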

  • The fair-use aspect of the ruling should send a chill down the spines of all generative-AI vendors. It's just one ruling, but it's still bad news for them.

  • At the heart of this is a very greedy racket: court reporters who "own" the copyright to every word they transcribe, even though the people actually speaking in court (judges, witnesses, lawyers, defendants) are in truth the source of it. They then milk huge fees for these transcripts and limit use, access, and derivative works with further huge fees. An AI verbatim transcriber would upend them, so that will be prevented, as will anything else that shakes the tree.

  • I have a gut feeling this is bad news for open AI models, while playing into the hands of corporate behemoths able to strike expensive deals with major publishers and top it off with the public domain.

    I'm not sure this signals the end of AI or a victory for humans; the real question is who gets to train the models.

  • Great decision for humans.

    Is this type of risk the reason why OpenAI masquerades as a non-profit?

  • Ross Intelligence was creating a product that would directly compete with Thomson Reuters. Pretty clearly not fair use.

  • It would be quite an interesting result if we could have true general AI but don't, simply because of copyright.

    I'm aware this isn't a concern yet, but imagine if the future played out that way...

    Or worse: only those with really deep pockets can get AI, and no one else can, simply because they can't afford the copyright fees.

  • Westlaw is to the legal profession what ResearchGate and others are to scientific research: they profit from information from the commons and charge as much as the market will bear.

    It's only one of the many reasons the legal profession is so expensive.

  • Almost every article I've read on fair use says you can only use small amounts of a work, and only if you aren't competing with its owner. AI people focus on a tiny number of precedents that they stretch very far. A reasonable person wouldn't arrive at their interpretation of fair use after looking at how most examples play out in court.

    It shouldn't surprise the writer that the AI companies' version of fair use didn't hold much weight; they should assume that by default, and be surprised only when a pro-AI ruling goes against the common examples in case law. The AI companies are hoping to achieve exactly that by throwing enough money at the legal system.

  • From p. 6:

    "But a headnote can introduce creativity by distilling, synthesizing, or explaining part of an opinion, and thus be copyrightable."

    Does this set a precedent whereby AI-generated summaries are copyrightable by the LLM owners?

  • Ross Intelligence was more a natural-language search interface, probably with vector-based similarity. So I suspect they were hosting and using the corpus in production, not just training a model on it. (A sketch of what that might look like is below.)
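
    If that speculation is right, the production setup might look roughly like this. Everything here is an assumption for illustration (the sentence-transformers library, the model name, the toy corpus); nothing this specific about Ross's actual stack is in the record:

    ```python
    # Hypothetical embedding-based retrieval over a hosted corpus.
    # Note that the corpus itself lives in production and is compared
    # against every query, which is more than just training on it once.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

    corpus = [  # invented stand-ins for headnote-like summaries
        "Contract formation requires offer, acceptance, and consideration.",
        "A landlord owes tenants a duty to maintain habitable premises.",
        "Negligence requires duty, breach, causation, and damages.",
    ]
    corpus_emb = model.encode(corpus, normalize_embeddings=True)  # precomputed and stored

    query_emb = model.encode(["is my apartment legally livable"],
                             normalize_embeddings=True)
    scores = corpus_emb @ query_emb.T  # cosine similarity, since vectors are unit-normalized
    print(corpus[int(np.argmax(scores))])
    ```

    If it worked this way, the copyrighted corpus wasn't just a training input; it was a component of the running service.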

  • How does this affect LLM systems that already have their corpus integrated?

  • Thomson Reuters chose to sue Ross Intelligence, not a company like Google or even OpenAI. I wonder how deeper pockets would have affected the outcome.

    I wonder how the politics played out. The big AI companies could have funded Ross Intelligence, which in turn could have threatened to sabotage their legal strategies by deliberately tanking and settling its own case in TR's favor.

  • Does anyone think DeepSeek or other non-Western AIs will respect copyright?

    This is going to make DeepSeek and its kin much more valuable.

  • If those four factors are what's used to judge fair use, I'd say that's the nail in the coffin, because of course this isn't fair use, and that's totally fair. Here I was thinking "transformative" was somehow the sticking point in all this.

  • If copyright forces a diversity of AIs, that would be good.

    Every AI company creating its own training data, resulting in AIs that are similar but not identical, is in my opinion much better than one or a very few AIs.

  • Establishing precedent by defeating an already-dead company in court is neither impressive nor likely to hold up against other companies.

  • See? The fair-use excuses that the AI proponents here were clinging to for dear life have fallen flat in this ruling.

    This is going to be one of many cases that end in licensing deals, stopping AI grifters from claiming "fair use" to side-step copyright law just because they are using a gen-AI system.

    OpenAI ended up paying for data from Shutterstock and various news publishers. This will be no different.

  • I can't understand how some commenters frame such a result as bad. The big players will have no problem licensing large corpora to train their models, while my tiny site won't be vacuumed up (legally, at least) by scrapers if I don't agree.

    Honestly, my willingness to upload my projects anywhere is at a historic low given the current state of things.

  • Fantastic news!

  • Seems like Delaware can't scare tech companies into re-incorporating elsewhere any faster.

  • Thanks. The article wasn't loading for me, just the headline, image, and footer. I was about to leave, thinking that was all there was.
