The controversy arose in Hungary, where the publishing house Like Company, which specializes in SEO content and regional news about Lake Balaton, sued Google for alleged copyright infringement.
The dispute stems from an article published on its website, balatonkornyeke.hu, about the Hungarian singer Kozso and his unusual proposal to introduce dolphins to the lake. The chatbot Gemini subsequently summarized this content when a user asked for a summary of the article in Hungarian.
The publisher considered the chatbot’s response to be an unauthorized act of reproduction, as well as a communication to the public of its protected content. It further claimed that its work had been used to train the AI model, which would constitute a further infringement.
The keys to the legal conflict
The lawsuit raises several legal questions that the Court of Justice of the European Union (CJEU) must now resolve through a preliminary ruling, in order to harmonize the interpretation of European copyright law. The questions raised include:
1. Does a chatbot’s output constitute communication to the public?
Can generating a summary similar to the original text be considered an act of communication to the public, even if the content is created by a predictive model?
2. Does LLM training infringe copyright?
Is analyzing and learning from protected texts to train an AI model equivalent to unauthorized reproduction?
3. Can the text mining exception be applied?
If reproduction is deemed to have occurred, could it be covered under the exception provided for text and data mining (TDM) if the sources were lawfully accessible?
4. Is the chatbot output a new reproduction?
When a model generates a text that replicates part of a protected work, is this considered a new reproduction in the legal sense of the term?
The role of data mining and the RAG model
This case also brings into focus the debate over technologies such as Retrieval Augmented Generation (RAG), which combines an active, real-time search for information (for example, on Google Search) with the generation of content from the results.
Unlike traditional training, which takes place beforehand and in isolation, RAG allows the model to draw on external sources when generating a response. In the case of the Hungarian article, the chatbot’s summary might not come from the training process, but from a real-time query of the original page.
This nuance complicates the attribution of responsibility: is the chatbot a mere intermediary or does it act as an active agent that reproduces content?
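To make the distinction concrete, the following is a minimal Python sketch of the two paths, not a description of how Gemini or any real system works. Everything in it is an assumption made for illustration: the `PAGES` dictionary stands in for the live web, the example.org URLs and their text are placeholders, and the word-overlap `retrieve` function is a toy substitute for a real search engine. No language model is called; the sketch only shows what would be handed to one.

```python
import re

# Hypothetical stand-in for the live web: URL -> page text (placeholder content).
PAGES = {
    "https://example.org/lake-news": (
        "A singer proposed introducing dolphins to the lake. "
        "Local officials have not commented."
    ),
    "https://example.org/recipes": (
        "Fish soup is a popular regional dish. It is served with bread."
    ),
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, pages):
    """Toy search: return the (url, text) pair sharing the most words with the query."""
    query_words = tokenize(query)
    return max(pages.items(), key=lambda item: len(query_words & tokenize(item[1])))


def answer_without_retrieval(query):
    """Training-only path: nothing external is consulted at answer time."""
    return {"sources": [], "prompt": query}


def answer_with_retrieval(query, pages):
    """RAG path: the fetched page text is copied into the prompt at answer time."""
    url, text = retrieve(query, pages)
    prompt = f"Source ({url}):\n{text}\n\nTask: {query}"
    return {"sources": [url], "prompt": prompt}


if __name__ == "__main__":
    question = "Summarize the proposal about dolphins in the lake"
    print(answer_without_retrieval(question))
    print(answer_with_retrieval(question, PAGES))
```

In the retrieval path, the source text is present in the prompt and traceable to a specific URL at the moment the answer is produced, whereas in the training-only path no source is consulted at all. That difference is what makes the question of who reproduces the content harder to answer.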
What’s at stake for intellectual property
Beyond the resolution of this specific case, the CJEU faces fundamental questions that will shape the future of intellectual property in the era of AI:
Should AI be considered a reproduction tool?
Can a language model be treated as a database?
What level of originality is required for AI-generated content?
What limits should be placed on data mining?
The answers to these questions will directly impact publishers, technology platforms, legal professionals, and, of course, the creation of digital content.
Conclusion
The case of Like Company versus Google is not just a clash between a Hungarian publishing house and a tech giant. It clearly demonstrates the regulatory and interpretive gap that still persists in Europe regarding artificial intelligence and intellectual property.
