Researchers think that adversarial examples could help us maintain privacy from machine learning systems

Machine learning systems are very good at finding hidden correlations in data and using them to infer potentially compromising information about the people who generate that data. For example, researchers fed an ML system a corpus of Google Play reviews written by reviewers whose locations were explicitly disclosed in their Google Plus profiles; trained on this, the model was able to predict the locations of other Google Play reviewers with about 44% accuracy.
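To give a sense of how simple this kind of attribute-inference attack can be, here's a minimal, hypothetical sketch using scikit-learn: a bag-of-words classifier trained on reviews from users whose city is known, then turned on reviews from users who never disclosed theirs. The data, labels, and model are illustrative placeholders, not the researchers' actual setup.

```python
# Hypothetical sketch of an attribute-inference attack: guess a reviewer's
# city from their review text. Not the researchers' actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: reviews from users who disclosed their location elsewhere.
labeled_reviews = [
    ("great app for finding parking downtown", "seattle"),
    ("crashes every time I ride the L", "chicago"),
    ("love the beach weather widget", "los angeles"),
    ("useful for checking ferry times", "seattle"),
]
texts, cities = zip(*labeled_reviews)

# Bag-of-words features plus logistic regression: the attacker's inference model.
attacker_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
attacker_model.fit(texts, cities)

# Applied to a reviewer who never said where they live.
print(attacker_model.predict(["the transit map feature is handy on the L"]))
```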

This has grave implications for privacy, because the same techniques can be used to de-anonymize or re-identify anonymized datasets (for example, similar statistical methods can make very accurate guesses about which pages a user is visiting over encrypted Tor connections).


But machine learning systems are also plagued by adversarial examples: tiny perturbations in data that would not confuse humans but which sharply decrease the accuracy of machine learning systems (from fake road signs projected for an eyeblink to changes to a single pixel in an image).


Wired's Andy Greenberg reports on two scholarly efforts to systematize a method for exploiting adversarial examples to make re-identification and de-anonymization attacks less effective.

The first is AttriGuard, which injects small amounts of adversarially crafted noise into a user's activity data so that "an attacker’s accuracy is reduced back to that baseline." The second is Mockingbird, designed to make it much harder to guess which encrypted pages and sites are traversing the Tor network; it requires a much larger injection of noise, incurring a roughly 56% bandwidth penalty in Tor's already bandwidth-starved environment.
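Here is a minimal sketch of the core idea behind this family of defenses (an FGSM-style perturbation, not AttriGuard's actual algorithm): nudge a user's feature vector just enough, within a small noise budget, that the attacker's classifier is pushed away from the user's true attribute. The attacker model, feature dimensions, and epsilon are all placeholder assumptions.

```python
# Sketch only (not AttriGuard's real algorithm): add a small adversarial
# perturbation to a user's feature vector to degrade the attacker's inference.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the attacker's trained model: 20 behavioral features -> 5 cities.
attacker = torch.nn.Linear(20, 5)

features = torch.randn(1, 20)              # the user's real feature vector
true_city = attacker(features).argmax(1)   # what the attacker currently infers

# FGSM-style step: move the features in the direction that increases the
# attacker's loss for that prediction, i.e. away from the true label.
features.requires_grad_(True)
loss = F.cross_entropy(attacker(features), true_city)
loss.backward()

epsilon = 0.5  # noise budget: how much distortion the defender will tolerate
noisy_features = features + epsilon * features.grad.sign()

# Depending on the budget, the attacker's prediction may no longer match.
print("before:", true_city.item(), "after:", attacker(noisy_features).argmax(1).item())
```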

Interestingly, both methods include theoretical ways to stay ahead of any countermeasures taken by would-be privacy invaders.

The cat-and-mouse game of predicting and protecting private user data, Gong admits, doesn't end there. If the machine-learning "attacker" is aware that adversarial examples may be protecting a data set from analysis, he or she can use what's known as "adversarial training"—generating their own adversarial examples to include in a training data set so that the resulting machine-learning engine is far harder to fool. But the defender can respond by adding yet more adversarial examples to foil that more robust machine-learning engine, resulting in an endless tit-for-tat. "Even if the attacker uses so-called robust machine learning, we can still adjust our adversarial examples to evade those methods," says Gong. "We can always find adversarial examples that defeat them."
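For readers curious what the "adversarial training" countermeasure looks like in practice, here is a generic sketch: at each step the attacker generates adversarial versions of the training inputs and trains on those too, hardening the model against the defender's noise. The model, data, and perturbation budget below are placeholder assumptions, not anything from the papers Greenberg describes.

```python
# Generic adversarial-training loop (a sketch, not the papers' code): the
# "attacker" hardens their model by also training on perturbed inputs.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 5)                       # placeholder attacker model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Placeholder data: behavioral feature vectors with known location labels.
X = torch.randn(256, 20)
y = torch.randint(0, 5, (256,))

epsilon = 0.5  # assumed perturbation budget the defender is expected to use

for step in range(100):
    # 1. Generate adversarial examples against the current model (FGSM).
    X_adv = X.clone().requires_grad_(True)
    adv_loss = F.cross_entropy(model(X_adv), y)
    grad = torch.autograd.grad(adv_loss, X_adv)[0]
    X_adv = (X + epsilon * grad.sign()).detach()

    # 2. Train on both clean and adversarial examples.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(X), y) + F.cross_entropy(model(X_adv), y)
    loss.backward()
    optimizer.step()

print("final training loss:", loss.item())
```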


Blind Spots in AI Just Might Help Protect Your Privacy [Andy Greenberg/Wired]