Navigating the Legal Landscape: Technical Implementation of Copyright Reservations for Text and Data Mining in the Era of AI Language Models

Authors

  • Lisa Löbling
  • Christian Handschigl
  • Kai Hofmann
  • Jan Schwedhelm

Keywords:

Copyright Law, Text and Data Mining (TDM), Artificial Intelligence (AI), Data Indexing and Crawling Restrictions, Machine-Readable Standard

Abstract

Recent advances in AI-driven language models, exemplified by ChatGPT, depend on the vast quantities of text and data used to train them. However, the origins of this data and its suitability for training AI models raise questions about Text and Data Mining (TDM) and the copyright requirements that govern it.

European and German law provide an opt-out system for TDM: freely available works may be used for TDM unless the rightsholder has reserved that use. A reservation of use is effective only if it is made in a machine-readable format. State-of-the-art language models draw on large amounts of text data from many different domains, yet no (de facto) standard for reservations of use has been established so far. In this paper, we will therefore:

• discuss the legal requirements,

• give an insight into how usage reservations are dealt with in practice, and

• suggest a possible standard.
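
To make the idea of a machine-readable reservation concrete, the following is a minimal sketch of how a crawler might check for one before collecting a page for TDM. It assumes, purely for illustration, that the reservation is expressed through robots.txt rules; this is not the standard proposed in the paper. The crawler name ExampleTDMBot, the URL, and the helper tdm_allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def tdm_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let `user_agent` fetch `url`.

    Assumption: a Disallow rule is read here as a reservation of use.
    Whether it is legally effective is a separate question.
    """
    parser = RobotFileParser()
    # Parse the rules from a string so the sketch needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: one crawler is excluded, all others are allowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.org/article.html"
    for agent in ("ExampleTDMBot", "OtherBot"):
        print(agent, tdm_allowed(EXAMPLE_ROBOTS_TXT, agent, page))
```

A per-crawler rule of this kind only works if the rightsholder knows the crawler's name in advance, which is one reason a shared standard for reservations of use matters.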

Published

2024-02-29