Copyright filters are automatically removing copies of the Mueller Report


During the bitter debate over the EU's Copyright Directive, with its mandate for copyright filters that would automatically censor anything that anyone claimed to be infringing, opponents repeatedly warned that these filters would be trivial to abuse.


That's because rightsholder groups would insist that anything they claimed as their copyrights would need to be censored immediately, not after some human had had a chance to review it (even giving it a once-over might delay a blockade of a pre-release leak, to say nothing of the many days it might take a skilled legal practitioner or archivist to assess whether it would be appropriate to censor a piece of media).


This is an invitation to sloppy and malicious overreporting of copyrighted works, resulting in massive, illegitimate censorship. For example, newscasters routinely upload their entire evening broadcast to YouTube's Content ID filter, meaning that any public-domain footage or third-party materials (including clips from YouTube videos) are marked as their copyrights — that's how NASA came to be blocked from uploading its own Mars lander footage.


But it gets worse: the laws and threats that prompt tech companies to institute copyright filters are aimed at preventing infringement at any cost. That means that even if you have a repeat offender who routinely claims copyright to things they don't own, you can't stop taking requests from them, because if they ever do have a valid claim, they can sue you for ignoring it.


The world is full of sloppy, brutal copyright bounty hunters that use a variety of tactics to remove their clients' materials, and whose lax standards mean that they often use those tactics to remove materials that their clients have falsely claimed to own. Think of how the Social Element Agency claimed that a tweet complaining about their sloppy copyright takedowns was a copyright infringement, and got Twitter to censor it.

The Social Element Agency firehoses copyright claims all over the internet, and some of them are valid. If, after receiving hundreds or thousands (or millions) of bogus claims from the Social Element Agency, Twitter were to block it from submitting any more, the new EU Copyright Directive would make Twitter liable for any material that infringed the Agency's clients' rights.

That means there is no cost to being a bad actor, to committing rampant copyfraud. Bounty hunters and rightsholders can be as reckless as they like in claiming copyrights and they'll still be able to load their works into the copyright filters, no matter how much copyfraud they've committed in the past. And since vetting your clients' claims costs money, it will always be cheaper to be reckless than to be careful: the companies that spend the least on checking their copyright claims before submitting them will earn the most money and grow the fastest.

This dynamic plays out all the time, including this week, when the text-hosting platform Scribd started to mass-delete copies of the Mueller Report that its users had uploaded. The Mueller Report, being a work produced by the US government, is in the public domain, which means that anyone can publish it. There are several publishers making copies for sale already.


One or more of these publishers uploaded their copy of the Report to Scribd's copyright filter, a fully automated system that does not include human review. We don't know why a publisher would claim a copyright it doesn't hold. Maybe they were being malicious and wanted to drive sales of their own edition; or maybe they just automatically upload everything they publish to every copyright filter they can find, and don't bother to pay anyone to make sure they're not claiming copyright over something they don't own.
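To see why a system like this fails so predictably, here's a minimal sketch of how a fully automated match-and-remove filter might work. The names, structure, and exact matching (real filters use fuzzier fingerprinting) are all hypothetical, not Scribd's actual code; the point is that nothing in the pipeline verifies ownership or checks public-domain status before removing a match.

```python
import hashlib

# Reference fingerprints submitted by claimants. Anyone can add entries;
# no ownership verification happens at ingestion time.
claimed_fingerprints: dict[str, str] = {}

def fingerprint(text: str) -> str:
    """Reduce a document to a matchable fingerprint. A bare hash of the
    normalized text stands in for the fuzzier matching real filters use."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def register_claim(claimant: str, work_text: str) -> None:
    # A publisher "loads" a work into the filter. A public-domain text like
    # a US government report is accepted exactly like anything else.
    claimed_fingerprints[fingerprint(work_text)] = claimant

def handle_upload(user: str, upload_text: str) -> bool:
    """Return True if the upload is allowed to stay up."""
    claimant = claimed_fingerprints.get(fingerprint(upload_text))
    if claimant is not None:
        # No human review, no public-domain check, and no penalty for the
        # claimant if the claim is bogus: the match is removed immediately.
        print(f"Removed {user}'s upload: matched a claim by {claimant}")
        return False
    return True

if __name__ == "__main__":
    report = "text of a public-domain government report"
    register_claim("Some Publisher LLC", report)  # bogus claim, accepted anyway
    handle_upload("alice", report)                # removed on the spot
```

In a pipeline like this, loading a public-domain work as a "claim" instantly converts every user's copy into a "match" to be removed, which is exactly what happened to the Report's uploaders.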


Whatever the reason, this immediately triggered mass takedowns of dozens of users' copies of the report. Once Scribd received users' complaints and was embarrassed by public disapprobation, it unblocked the text, and that's fine — until the next time it happens.

Scribd is a relatively small platform. What happens when a broadcaster claims copyright over a key Trump gaffe on the eve of an election, and it doesn't get unblocked until the election is over? What happens when a stock art company's claims take down a photo of police brutality at a public demonstration because a bus-ad in the background uses one of its photos? What happens when your kid's first steps can't be shared with your family back home because they happened in a room with a cartoon playing on the TV?


In summary:

1. Scribd admits there’s no way to make an automated system that is 100% correct all the time…

2. …has decided that it’s too popular a service to have humans vet its algorithm’s decisions, and…

3. …will update its systems after the fact when things go awry.

The Council of the European Union recently approved the new Copyright Directive, including the provision known as Article 13. It requires internet platforms to police content for copyright violations before it goes up, rather than only after a third party reports it as infringing, as they do now. In practice, it is expected to lead to more scenarios like the one here: public-domain and other lawful uses of works get blocked or taken down by an unaccountable system of corporate dragnets based on code so poorly written and algorithms so poorly trained that it mistakes the most talked-about and most widely shared public-domain documents for copyrighted works.


Scribd taking down the Mueller Report is the future the EU has voted for [David Yanofsky/Quartz]