Technical Challenges of Rightsholders’ Opt-out From Gen AI Training after Robert Kneschke v. LAION
Keywords:
Artificial Intelligence, Web Scraping, Text and Data Mining, Machine-readable, Copyright
Abstract
This paper explores the evolving legal landscape surrounding the training of generative AI models on publicly available, often copyrighted, data, spotlighting the challenges arising in the wake of the recent decision of a German court in Robert Kneschke v. LAION. Building on existing work on the implementation of copyright reservations through machine-to-machine and human-to-machine communication, it examines potential gaps and technical challenges stemming from the text and data mining (TDM) exception, including technical issues surrounding robots.txt as well as data memorisation and the regurgitation of verbatim snippets in AI outputs.
The Robert Kneschke v. LAION case exemplifies how non-profit organisations may leverage the TDM exceptions and offers insights that could influence the commercial development of Gen AI. While the TDM exceptions may seem workable in theory, implementing them presents a variety of practical challenges. Requirements such as a “machine-readable” opt-out for rightsholders may, given the current technological landscape, ultimately reduce the practical benefits of these exceptions. In practice, dataset creation and AI model training occur through a chain of parties, from copyright holders, licensors or publishers, through non-profit organisations populating datasets, to commercial AI developers. This chain may raise additional interpretational issues and gaps when applying the exception for research purposes or when determining whether an opt-out has been validly applied. This paper discusses the legal requirements and interpretation introduced by Robert Kneschke v. LAION, presents the practical and technical implications stemming from the TDM exceptions, and suggests possible outcomes.