Data-mining reveals that 80% of books published 1924-63 never had their copyrights renewed and are now in the public domain

Cory Doctorow

10:22 am Thu, Aug 1, 2019

This January, we celebrated the Grand Re-Opening of the Public Domain, as the onerous terms of the hateful Sonny Bono Copyright Term Extension Act finally developed a leak, putting all works published in 1923 into the public domain, with more to follow every year: 1924 goes PD in 2020, then 1925 in 2021, and so on.

But there’s another source of public domain works: until the 1976 Copyright Act, US works were not copyrighted unless they were registered, and then they entered the public domain after an initial 28-year term unless that registration was renewed. The problem has been figuring out which of these works are in the public domain, because the US Copyright Office’s records were not organized in a way that made it possible to easily cross-check a work with its registration and renewal.

For many years, the Internet Archive has hosted an archive of registration records, which were partially machine-readable.

Enter the New York Public Library, which employed a group of people to encode all these records in XML, making them amenable to automated data-mining.

Now, Leonard Richardson has done the magic data-mining work to affirmatively determine which of the 1924-63 books are in the public domain, which turns out to be 80% of those books. What’s more, many of these books have already been scanned by HathiTrust (which uses a limitation in copyright to scan university library holdings for use by educational institutions, regardless of copyright status).

“Fun facts” are, sadly, often less than fun. But here’s a genuinely fun fact: most books published in the US before 1964 are in the public domain! Back then, you had to send in a form to get a second 28-year copyright term, and most people didn’t bother.
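To make the cross-check concrete, here is a minimal Python sketch of the idea: take the registration records, take the renewal records, and keep the 1924-63 registrations that have no matching renewal. This is not Richardson's actual code or the NYPL's XML schema. The `Registration` record, the `unrenewed` function, the registration numbers and the sample titles are all made up for illustration, and real records would need far messier matching than an exact lookup on a registration number.

```python
from dataclasses import dataclass


# Hypothetical record shape, invented for this sketch.
# The real NYPL XML records have their own schema, which is not modeled here.
@dataclass(frozen=True)
class Registration:
    reg_number: str
    title: str
    year: int


def unrenewed(registrations, renewed_numbers, first_year=1924, last_year=1963):
    """Return registrations in the year range that have no matching renewal."""
    renewed = set(renewed_numbers)
    return [
        r
        for r in registrations
        if first_year <= r.year <= last_year and r.reg_number not in renewed
    ]


# Made-up sample data, for illustration only.
registrations = [
    Registration("A100", "Example Book One", 1930),
    Registration("A200", "Example Book Two", 1945),
    Registration("A300", "Example Book Three", 1970),
]
renewed_numbers = {"A200"}  # assumed: renewals keyed by registration number

candidates = unrenewed(registrations, renewed_numbers)
for r in candidates:
    print(r.reg_number, r.title, r.year)
```

In this toy data set, only the 1930 registration comes out as a candidate: the 1945 book was renewed, and the 1970 book falls outside the 1924-63 window.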

Secretly Public Domain [Leonard Richardson/Crummy]

(via Four Short Links)

(Image: Deborah Fitchett, CC-BY)