Here’s a summary of “Designing Incentives for Inexpert Human Raters,” a paper presented at the ACM Conference on Computer Supported Cooperative Work by Aaron Shaw, John Horton, and Daniel Chen. The authors (a sociologist and two economists) designed a bunch of different incentive structures for encouraging high-quality work from workers on Mechanical Turk-style systems and evaluated their relative efficacy. They concluded that social/psychological incentives had no measurable impact on performance, while a few well-constructed financial incentives made a substantial difference.
From these results, we concluded that our horse race had two clear front-runners: the “Bayesian Truth Serum” (BTS) and “Punishment – disagreement” conditions, each of which improved average worker performance by almost half a correct answer above the control-group average of 2.08 correct answers. A few of the other financial and hybrid incentives had fairly large point estimates, but were not significantly different from control once we adjusted the test statistics and corresponding p-values to account for the fact that we were making so many comparisons at once (apologies if this doesn’t make sense; it’s yet another precautionary measure to avoid upsetting the stats nerds among you). In a tough turn for the sociologists and psychologists, none of the purely social/psychological treatments had any significant effects at all.
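To make that multiple-comparisons point concrete, here’s a minimal sketch of one standard step-down adjustment (Holm-Bonferroni) applied to made-up p-values. The condition labels and numbers below are hypothetical, and this isn’t necessarily the exact procedure the authors used; it just shows how a comparison that looks significant on its own can fail once you account for testing many treatments at once.

```python
# A minimal sketch of the Holm-Bonferroni step-down adjustment, using
# made-up p-values. Condition names and numbers are hypothetical; the
# paper's actual tests and correction procedure may differ.

def holm_bonferroni(pvals, alpha=0.05):
    """Return True/False per hypothesis: rejected after Holm's adjustment?"""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return reject

# Hypothetical raw p-values for four treatment-vs-control comparisons.
conditions = ["BTS", "Punishment-disagreement",
              "financial incentive A", "hybrid incentive B"]
raw_p = [0.004, 0.009, 0.030, 0.200]

for name, p, r in zip(conditions, raw_p, holm_bonferroni(raw_p)):
    print(f"{name}: raw p = {p:.3f} -> significant after adjustment: {r}")
```

In this toy example, a raw p-value of 0.030 clears the conventional 0.05 bar on its own but not the adjusted threshold of 0.025, which is exactly the “large point estimate, but no longer significant” pattern described above.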
Why did BTS and punishing workers for disagreement succeed in significantly improving performance where so many of the other incentive schemes failed? The answer hinges on the fact that both conditions tied workers’ payoffs to their ability to think about their peers’ likely responses. (We elaborate on the argument in more detail in the paper.)
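For the curious, here’s a rough sketch of the scoring rule behind BTS (Prelec’s 2004 “Bayesian Truth Serum”): each worker both answers a question and predicts how their peers will answer, and the score rewards answers that turn out to be “surprisingly common” plus accurate predictions of the peer distribution. The code below is an illustrative implementation of that general rule on toy data, not the exact bonus formula used in the experiment.

```python
import math

def bts_scores(answers, predictions, alpha=1.0):
    """Bayesian Truth Serum scores (Prelec 2004) for a single question.

    answers:     chosen option index per respondent
    predictions: per-respondent predicted frequency vector over the K options
    alpha:       weight on the prediction-accuracy term
    """
    n = len(answers)
    k = len(predictions[0])
    eps = 1e-9  # small floor to avoid log(0)

    # Empirical frequency of each answer (x-bar) and the geometric mean
    # of the predicted frequencies (y-bar), in log form.
    xbar = [max(sum(1 for a in answers if a == j) / n, eps) for j in range(k)]
    log_ybar = [sum(math.log(max(p[j], eps)) for p in predictions) / n
                for j in range(k)]

    scores = []
    for a, pred in zip(answers, predictions):
        # Information score: rewards answers that are "surprisingly common",
        # i.e. more frequent than the crowd predicted.
        info = math.log(xbar[a]) - log_ybar[a]
        # Prediction score: penalizes inaccurate predictions of the actual
        # answer distribution (a relative-entropy term).
        pred_score = sum(xbar[j] * math.log(max(pred[j], eps) / xbar[j])
                         for j in range(k))
        scores.append(info + alpha * pred_score)
    return scores

# Toy example: 3 respondents, 2 answer options (made-up data).
answers = [0, 0, 1]
predictions = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5]]
print(bts_scores(answers, predictions))
```

The key property, and the reason it resembles the punishment-for-disagreement condition, is that a worker can’t score well without thinking about what other workers are likely to say.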
Designing Incentives for Crowdsourcing Workers
(via O’Reilly Radar)