Model stealing, reward hacking and poisoning attacks: a taxonomy of machine learning's failure modes


A team of researchers from Microsoft and Harvard's Berkman Center has published a taxonomy of "Failure Modes in Machine Learning," broken down into "Intentionally-Motivated Failures" and "Unintended Failures."


Intentional failures are things like "Perturbation attacks" (previously), which make tiny changes to inputs that produce wildly inaccurate classifications; "Membership inference" (using a model to learn whether a given person's data was used to train it); and "Backdoor ML" (implanting a backdoor into the algorithm that attackers activate with specific triggers).
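
To make the first of those concrete, here's a minimal sketch of a perturbation attack in the fast-gradient-sign style, assuming a hypothetical, differentiable PyTorch classifier; `model`, `x` and `y` below are stand-in names, not anything from the paper:

```python
# Minimal perturbation-attack sketch (FGSM-style), assuming a hypothetical
# pretrained PyTorch classifier. Illustrative only, not the paper's method.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, label: torch.Tensor,
                 epsilon: float = 0.01) -> torch.Tensor:
    """Return x plus a tiny, loss-maximizing perturbation."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), label)
    loss.backward()
    # Nudge every input element a small step in the direction that
    # increases the classifier's loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Usage (hypothetical model and inputs):
# model = load_my_classifier()            # any differentiable classifier
# x_adv = fgsm_perturb(model, x, y, 0.01) # often misclassified despite
#                                         # looking identical to x
```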


Unintended failures include my personal favorite, "Reward hacking" (previously) (reinforcement learning systems discover a way to game the system and cheat their way to success); "Natural adversarial examples" ("Without attacker perturbations, the ML system fails owing to hard negative mining"); and that old favorite, "Incomplete testing" ("The ML system is not tested in the realistic conditions that it is meant to operate in").
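
Reward hacking is easiest to see with a toy example: say the designer wants a racing agent to finish the course but rewards it per checkpoint touched; an agent optimizing that proxy can just circle one checkpoint forever. The sketch below is purely illustrative, with made-up names and numbers not drawn from the paper:

```python
# Toy illustration of reward hacking: the proxy reward (checkpoints touched)
# diverges from the intended goal (finishing the race).

def intended_return(trajectory):
    """What the designer actually wants: did the agent finish the race?"""
    return 100 if trajectory[-1] == "finish_line" else 0

def proxy_reward(trajectory):
    """What the designer actually rewards: +1 per checkpoint touched."""
    return sum(1 for state in trajectory if state == "checkpoint")

honest_run = ["start", "checkpoint", "checkpoint", "finish_line"]
hacked_run = ["start"] + ["checkpoint"] * 50   # loop the same checkpoint

print(proxy_reward(honest_run), intended_return(honest_run))  # 2 100
print(proxy_reward(hacked_run), intended_return(hacked_run))  # 50 0
# An optimizer for proxy_reward prefers hacked_run, "cheating" its way to a
# high score while failing the real task.
```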

For engineers, we recommend browsing through the overview of possible failure modes and jumping into the threat modeling document. This way, engineers can identify threats, attacks and vulnerabilities, and use the framework to plan countermeasures where available. We then refer you to the bug bar, which maps the new vulnerabilities in the taxonomy alongside traditional software vulnerabilities and provides a rating for each ML vulnerability (such as critical or important). This bug bar is easily integrated into existing incident response processes and playbooks.
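
As a rough illustration of how a bug bar like that might plug into an incident-response playbook, here is a hedged sketch mapping taxonomy categories to severities; the category names come from the taxonomy, but the severity values and the `triage` helper are placeholders, not Microsoft's actual ratings:

```python
# Hypothetical encoding of an ML bug bar for triage. Severity values are
# illustrative placeholders, not the ratings from Microsoft's bug bar.
ML_BUG_BAR = {
    "perturbation attack":  "critical",   # placeholder severity
    "membership inference": "important",  # placeholder severity
    "model stealing":       "important",  # placeholder severity
    "backdoor ML":          "critical",   # placeholder severity
    "data poisoning":       "critical",   # placeholder severity
}

def triage(failure_mode: str) -> str:
    """Map a reported ML failure mode to an incident-response severity."""
    return ML_BUG_BAR.get(failure_mode, "needs classification")

print(triage("backdoor ML"))  # critical
```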

For lawyers and policy makers, this document organizes ML failure modes and presents a framework to analyze key issues relevant for anyone exploring policy options, such as the work done here[5],[6]. Specifically, we have categorized failures and consequences in a way that lets policy makers begin to draw distinctions between causes, which will inform public policy initiatives to promote ML safety and security. We hope that policy makers will use these categories to begin to flesh out how existing legal regimes may (not) adequately capture emerging issues, what historical legal regimes or policy solutions might have dealt with similar harms, and where we should be especially sensitive to civil liberties issues.

Failure Modes in Machine Learning [Ram Shankar Siva Kumar, David O’Brien, Jeffrey Snover, Kendra Albert and Salome Viljoen/Microsoft]


(via Schneier)

(Image: I made a robot to help me argue on the internet, Simone Giertz/YouTube)