Artificial Intelligence and the Law of Machine-Readability: A Review of Human-to-Machine Communication Protocols and their (In)Compatibility with Article 4(3) of the Copyright DSM Directive
Keywords: AI web-scraping, text and data mining, machine-readability, rights reservation, H2M communication
Abstract
Many legal scholars critique the supposed ineffectiveness of European copyright regulation regarding commercial text and data mining (TDM). At the same time, tech-savvy entrepreneurs keep proposing new standards to give effect to that regulation, at a rate that has been described as “exponential”. The present paper reconciles these complementary perspectives. In the first (doctrinal) part, it develops a framework for Article 4(3) of the Copyright DSM Directive by arguing that: (1) Web-scraping for AI training is a use case of TDM. (2) European TDM regulation seeks to protect fundamental rights and to uphold the incentives of both AI developers and rightholders. (3) To ensure balanced protection, the legislator provided for a “reservation of rights” as an exception-exception similar to one found in the Berne Convention. (4) This reservation instrument is criticized as being either unduly effective or largely ineffective – a tie that can only be broken by clarifying the doctrinal hurdles raised by the Directive. (5) The Directive establishes two standards that reservations need to fulfil simultaneously: they have to be explicit (specific to a given content and use) and automatable (employing a well-defined technical protocol). In the second part, the paper uses these standards to assess seven communication protocols commonly proposed to reserve TDM rights. It concludes that only some qualify as “machine-readable” in a legal sense at all, and that the proliferation of standards currently precludes any effective reservation of TDM rights. This may, however, come with a silver lining.