Michael I. Jordan is an extremely accomplished computer scientist who is also deeply skeptical of the claims made by Big Data advocates, and of people who believe that machine intelligence, AI and machine vision are solved, or nearly so.
In a spectacular interview conducted by Lee Gomes (whose work I've admired since he was at the WSJ — he makes a brief cameo in Makers), Jordan excoriates trendy ideas from the computer science world, offering cogent critiques that are as smart as they are necessary:
Michael Jordan: In a classical database, you have maybe a few thousand people in them. You can think of those as the rows of the database. And the columns would be the features of those people: their age, height, weight, income, et cetera.
Now, the number of combinations of these columns grows exponentially with the number of columns. So if you have many, many columns—and we do in modern databases—you’ll get up into millions and millions of attributes for each person.
Now, if I start allowing myself to look at all of the combinations of these features—if you live in Beijing, and you ride bike to work, and you work in a certain job, and are a certain age—what’s the probability you will have a certain disease or you will like my advertisement? Now I’m getting combinations of millions of attributes, and the number of such combinations is exponential; it gets to be the size of the number of atoms in the universe.
Those are the hypotheses that I’m willing to consider. And for any particular database, I will find some combination of columns that will predict perfectly any outcome, just by chance alone. If I just look at all the people who have a heart attack and compare them to all the people that don’t have a heart attack, and I’m looking for combinations of the columns that predict heart attacks, I will find all kinds of spurious combinations of columns, because there are huge numbers of them.
So it’s like having billions of monkeys typing. One of them will write Shakespeare.
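Jordan's point is easy to reproduce on a toy scale. The sketch below is my own illustration, not anything from the interview: it fills a tiny "database" with coin flips, so that no column has any real relationship to the outcome, then searches every combination of up to three columns for the rule that best "predicts" that outcome. The function names, the table sizes, and the choice of a parity (XOR) rule are all assumptions made for the example.

```python
import itertools
import random


def n_column_subsets(n_cols):
    """Number of distinct subsets of columns: doubles with each added column."""
    return 2 ** n_cols


def random_table(rng, n_people, n_cols):
    """Coin-flip features and a coin-flip outcome: there is no real signal."""
    rows = [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(n_people)]
    outcome = [rng.randint(0, 1) for _ in range(n_people)]
    return rows, outcome


def hits(rows, outcome, cols):
    """Count people whose outcome equals the parity (XOR) of the chosen columns."""
    return sum((sum(row[c] for c in cols) % 2) == y for row, y in zip(rows, outcome))


def run_demo(seed=0, n_people=10, n_cols=20, max_size=3):
    """Search all small column combinations for the best spurious predictor."""
    rng = random.Random(seed)
    rows, outcome = random_table(rng, n_people, n_cols)
    # A second, independent table stands in for "new people".
    fresh_rows, fresh_outcome = random_table(rng, n_people, n_cols)

    best_cols, best_flip, best_hits, tried = None, False, -1, 0
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(n_cols), size):
            tried += 1
            h = hits(rows, outcome, cols)
            # A rule that is mostly wrong becomes mostly right when negated.
            flip = h < n_people - h
            h = max(h, n_people - h)
            if h > best_hits:
                best_cols, best_flip, best_hits = cols, flip, h

    fresh = hits(fresh_rows, fresh_outcome, best_cols)
    if best_flip:
        fresh = n_people - fresh
    return {
        "tried": tried,                      # number of hypotheses examined
        "rule": best_cols,                   # the winning column combination
        "in_sample": best_hits / n_people,   # accuracy on the data we searched
        "holdout": fresh / n_people,         # accuracy on the fresh table
    }


if __name__ == "__main__":
    print(run_demo(seed=0))
    print(n_column_subsets(300) > 10 ** 80)
```

With only 20 columns and combinations of at most three, the search already examines 1,350 hypotheses for 10 people, so some rule will look impressive on the searched data even though it is noise. The same rule has no reason to beat a coin flip on the fresh table. The last line checks the scale of the full problem: 300 columns give more subsets than 10^80, a commonly quoted rough estimate of the number of atoms in the observable universe.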
Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts [Lee Gomes/IEEE Spectrum]
(via O'Reilly Radar)
(Image: Needle in a Haystack, James Lumb, CC-BY)