Train your AI with the world's largest data-set of sarcasm, courtesy of redditors' self-tagging

Cory Doctorow 12:23 pm Mon May 1, 2017

Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.

The Self-Annotated Reddit Corpus (SARC) contains 1.3 million sarcastic remarks ("10 times more than any previous dataset") that were tagged by redditors and are stored along with "user, topic, and conversation context."

Reddit comments from December 2005 have been made available due to web-scraping; we construct our dataset as a subset of comments from 2009-2016, comprising the vast majority of comments and excluding noisy data from earlier years. For each comment we provide a sarcasm label, author, the subreddit it appeared in, the comment score as voted on by users, the date of the comment, and the parent comment or submission.

A Large Self-Annotated Corpus for Sarcasm

[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]
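To make the per-comment fields in the quoted passage concrete, here is a minimal Python sketch of what one record might look like, along with a toy self-tag check. The `Comment` class, the `label_comment` helper, the field names, and the assumption that the tag is a trailing "/s" token are all illustrative; none of this is the authors' code or the corpus's actual file format.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the self-applied sarcasm tag is a trailing "/s" token.
SARCASM_TAG = "/s"


@dataclass
class Comment:
    """Hypothetical record mirroring the fields listed in the quoted passage."""
    text: str               # comment body with the tag stripped
    sarcastic: bool         # sarcasm label derived from the author's own tag
    author: str
    subreddit: str
    score: int              # comment score as voted on by users
    date: str               # kept as a plain string in this sketch
    parent: Optional[str]   # parent comment or submission, if any


def label_comment(raw_text: str, author: str, subreddit: str,
                  score: int, date: str,
                  parent: Optional[str] = None) -> Comment:
    """Build a Comment, marking it sarcastic if it ends with a standalone tag."""
    tokens = raw_text.split()
    # Require the tag to be its own whitespace-separated token, so that
    # text merely ending in the characters "/s" (e.g. a URL) is not matched.
    sarcastic = bool(tokens) and tokens[-1] == SARCASM_TAG
    text = raw_text.rstrip()
    if sarcastic:
        text = text[: -len(SARCASM_TAG)].rstrip()
    return Comment(text, sarcastic, author, subreddit, score, date, parent)


if __name__ == "__main__":
    c = label_comment("Oh sure, that will work /s", "some_user",
                      "technology", 12, "2016-05-01", "Just reboot it.")
    print(c.sarcastic, repr(c.text))
```

A real pipeline would need more care than this (quoted tags, edited comments, and users who never tag at all), which is presumably part of what the paper means by noisy data.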

(via Marginal Revolution)
