If you think you've anonymized a data set, you're probably wrong


Using some clever computing, Atockar took the NYC Taxicab Dataset and not only calculated the annual income of every hack in New York, but also figured out who goes to strip clubs, what celebrities' home addresses were, and how they tipped.

In order to do this, I spent some of the most riveting hours of my professional career searching through images of “celebrities in taxis in Manhattan in 2013″ to find enough information to identify the correct record in the database. I had some success – combining the below photos of Bradley Cooper and Jessica Alba with some information from celebrity gossip blogs allowed me to find their trips, which are shown in the accompanying maps.

In Brad Cooper’s case, we now know that his cab took him to Greenwich Village, possibly to have dinner at Melibea, and that he paid $10.50, with no recorded tip. Ironically, he got in the cab to escape the photographers! We also know that Jessica Alba got into her taxi outside her hotel, the Trump SoHo, and somewhat surprisingly also did not add a tip to her $9 fare. Now while this information is relatively benign, particularly a year down the line, I have revealed information that was not previously in the public domain. Considering the speculative drivel that usually accompanies these photos (trust me, I know!), a celebrity journalist would be thrilled to learn this additional information.

Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset [Atockar/Neustar]

(via JWZ)