Background
GEMA, acting as the collecting society for works including the songs “Atemlos”, “Männer” and “Über den Wolken”, brought an action against OpenAI. The lyrics had been included in the training data for GPT-4 and GPT-4o without any licence. Users could obtain largely complete lyrics by entering straightforward prompts such as “What is the text of [song title]?”. While it was undisputed that the lyrics formed part of the training set, OpenAI argued that the model does not store the texts themselves but merely statistical patterns, so that the outputs should be considered new creations rather than reproductions.
Decision
The Regional Court of Munich I (Landgericht München I) largely upheld GEMA’s claims. OpenAI was ordered to cease and desist from using the works in question in its models and to provide information on the scope of the use, and was held liable for damages in principle. The court assumed at least negligent conduct on the part of OpenAI.
Reasoning: Memorisation, TDM exception & outputs
The judges consider that particularly frequent sequences from the training data can be memorised by the model: they are weighted so strongly that the exact token sequence effectively remains embedded in the model and can be reproduced. This internal, technically retrievable storage is treated as a copyright-relevant reproduction, comparable to fragmented file formats that can be reconstructed using appropriate tools.
The text and data mining (TDM) exception applies only in part:
- Converting works into a machine-readable format and analysing them during training can be covered by the exception.
- However, the permanent memorisation of protected content within the model, where it remains available for later reproduction, is no longer covered by the TDM exception. At this stage, the use interferes with the rightsholder’s economic exploitation rights.
The court also finds infringements in the outputs:
Near-verbatim lyrics generated by the model are stored in users’ working memory and chat histories, which constitutes a reproduction of the works. OpenAI, as the operator and developer of the model, is considered responsible rather than the individual users. In addition, operating the service amounts to making the works available to the public, since an indefinite number of users can access the memorised content – a “new public”, even if the lyrics were previously available online from lawful sources.
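For readers who want a feel for what "near-verbatim" output might look like in measurable terms, the following is a minimal sketch in Python. It is not the test applied by the court and not a method described in the judgment. The function names, the minimum run length of five words and the 0.8 threshold are all illustrative assumptions. The sketch simply measures how much of a reference text reappears in a model output as long, unbroken word sequences.

```python
import re
from difflib import SequenceMatcher


def _words(text: str) -> list[str]:
    # Lowercase and keep only word characters, so that casing and
    # punctuation differences do not hide an otherwise identical passage.
    return re.findall(r"\w+", text.lower())


def verbatim_overlap(output: str, reference: str, min_run: int = 5) -> float:
    """Share of the reference's words that reappear in the output
    in matching runs of at least `min_run` consecutive words.

    `min_run` is an illustrative assumption, not a legal standard.
    """
    ref, out = _words(reference), _words(output)
    if not ref:
        return 0.0
    matcher = SequenceMatcher(None, ref, out, autojunk=False)
    # get_matching_blocks() yields non-overlapping matching runs;
    # short runs are ignored because they are likely coincidental.
    covered = sum(b.size for b in matcher.get_matching_blocks() if b.size >= min_run)
    return covered / len(ref)


def looks_near_verbatim(output: str, reference: str, threshold: float = 0.8) -> bool:
    # The 0.8 threshold is a hypothetical value chosen for illustration only.
    return verbatim_overlap(output, reference) >= threshold
```

An operator could, for example, run such a check against a list of protected texts before returning an output. Whether any particular threshold would be legally sufficient is a separate question that the sketch does not answer.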
Remedies & outlook
OpenAI is liable for damages (to be quantified separately) and must provide information and cease further infringing uses. The court rejected arguments that an injunction would be disproportionate: OpenAI can retrain models on licensed datasets or design models that do not rely on the contested content. A grace period was denied, as the company had already been warned in November 2024.
To the point
The judgment sends a clear signal:
Using copyrighted works as training data for generative AI without robust licensing and technical safeguards against memorisation creates substantial copyright risk – especially where models can reproduce protected content almost verbatim in response to simple prompts.
Case No.: 42 O 14139/24
Source: Justiz Bayern