Mechanical Turkers paid $1 are better at predicting recidivism than secret, private-sector algorithms

Clive Thompson

4:30 am Thu, Jan 18, 2018

It looks like everyday folks — recruited on Mechanical Turk, no less — are better at predicting recidivism than the elite, secretive private-sector algorithms used by courts.

The software, COMPAS, has passed judgment on 1 million offenders since it was introduced in 1998. It uses 137 pieces of data about offenders to predict their chance of recidivism, but its accuracy is dodgy: A ProPublica investigation of its output suggests it isn’t much better than a coin toss, and also that it’s racist, being nearly twice as likely to suggest that blacks will reoffend as it is for whites with comparable criminal records.

It would be nice to examine the algorithm to see how it’s weighing the variables, but no dice: Northpointe, the firm that runs COMPAS, says it’s a proprietary secret.

But now there’s yet more evidence that COMPAS’s algorithmic precision isn’t great. Two Dartmouth College researchers, Julia Dressel and Hany Farid, did a shoot-out. They took 1,000 defendants and examined COMPAS’s predictions for recidivism. Then they paid 400 people on Mechanical Turk to look at the defendants’ files and make their own predictions. The researchers gave the turks only seven pieces of data about each defendant, none of which was race. The researchers also had the real-life data on whether the defendants had, in fact, reoffended, so they could see how well COMPAS stacked up against random folks on Mechanical Turk.

The result?

The randos did slightly better than the software. The turks’ predictions were right 67% of the time; the software, 65%. As Wired reports:

“There was essentially no difference between people responding to an online survey for a buck and this commercial software being used in the courts,” says Farid, who teaches computer science at Dartmouth. “If this software is only as accurate as untrained people responding to an online survey, I think the courts should consider that when trying to decide how much weight to put on them in making decisions.” [snip]
“Julia and I are sitting there thinking: How can this be?” Farid says. “How can it be that this software that is commercially available and being used broadly across the country has the same accuracy as mechanical turk users?”
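
For the curious, here's a rough back-of-envelope check on why a 67%-vs-65% gap reads as "essentially no difference." It's a sketch, not the researchers' analysis: it assumes each accuracy figure comes from 1,000 independent yes/no predictions (a simplification of how the study was actually run), and the function names are made up for illustration.

```python
import math

# Figures quoted in the post; everything else here is an illustrative assumption.
N_DEFENDANTS = 1000
TURK_ACCURACY = 0.67
COMPAS_ACCURACY = 0.65


def standard_error(accuracy: float, n: int) -> float:
    """Standard error of a proportion estimated from n yes/no predictions."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def gap_in_standard_errors(acc_a: float, acc_b: float, n: int) -> float:
    """Size of the accuracy gap, measured in standard errors of that gap.

    Assumes (simplistically) that the two sets of predictions are independent.
    """
    se_gap = math.sqrt(standard_error(acc_a, n) ** 2 + standard_error(acc_b, n) ** 2)
    return (acc_a - acc_b) / se_gap


z = gap_in_standard_errors(TURK_ACCURACY, COMPAS_ACCURACY, N_DEFENDANTS)
print(f"Gap is {z:.2f} standard errors wide")
# A common rule of thumb treats anything under about 1.96 as within the noise.
print("Within the noise" if abs(z) < 1.96 else "Bigger than the noise")
```

Under those assumptions the two-point gap comes out to a bit under one standard error, well short of the usual 1.96 threshold, so a tie is the fair reading.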

The full paper, detailing the research, is here.

(CC-licensed image via Sam Howzit)