A couple of years ago, documents obtained by national-security whistle-blower Edward Snowden and published by The Intercept revealed that the U.S. National Security Agency is collecting metadata from a huge number of cellular phones in Pakistan. The program, cheekily called SKYNET after the humanity-destroying artificial intelligence from the Terminator franchise, tracks movements and known associates, then feeds all that Big Data to an algorithm that flags potential terrorists to be targeted for drone strikes. The problem, a data expert told Ars Technica this week, is that the algorithm is “completely bullshit.”
Patrick Ball, a data scientist and Ph.D. who directs the Human Rights Data Analysis Group, says fundamental flaws in SKYNET’s methodology may be causing thousands of innocents to be falsely identified as terrorists.
The technical details are spelled out at Ars, but the gist of Ball’s argument is that for a machine-learning algorithm to reliably identify a terrorist, it needs data on known terrorists as training input. And there just aren’t that many known terrorists, especially compared to the number of phones the NSA is monitoring in Pakistan: at least 55 million.
According to Ball and to documents published by The Intercept, the NSA’s pool of known terrorists contained just seven people. Analysts apparently trained the algorithm by “feeding it six of the terrorists and tasking SKYNET to find the seventh” among a random group of 100,000 citizens, Ars reports.
But Ball says this doesn’t work, because the terrorist cluster is so small and densely connected that it always sticks out from the bigger group. To be truly accurate, the NSA would have to mix all the terrorists into the population before choosing the random sample of 100,000 — but they can’t, because they just don’t have enough terrorists.
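Ball’s point about the small, tight cluster “sticking out” can be illustrated with a toy simulation. This is entirely hypothetical data and a simple nearest-centroid score of my own devising, not anything from the actual SKYNET program: seven clustered “positive” profiles versus 100,000 diffuse ordinary profiles.

```python
import random

random.seed(0)

# Hypothetical stand-in for the evaluation Ball criticizes: seven "terrorist"
# profiles form a tight cluster in a 2-D feature space, while 100,000 ordinary
# profiles are spread widely. Train on six, then rank everyone by distance to
# the training centroid and see where the held-out seventh lands.
positives = [(random.gauss(10.0, 0.2), random.gauss(10.0, 0.2)) for _ in range(7)]
negatives = [(random.gauss(0.0, 2.0), random.gauss(0.0, 2.0)) for _ in range(100_000)]

train, held_out = positives[:6], positives[6]
cx = sum(x for x, _ in train) / len(train)
cy = sum(y for _, y in train) / len(train)

def suspicion(p):
    """Distance to the training centroid; lower = more 'suspicious'."""
    return ((p[0] - cx) ** 2 + (p[1] - cy) ** 2) ** 0.5

scores = sorted(suspicion(p) for p in [held_out] + negatives)
rank = scores.index(suspicion(held_out)) + 1  # 1 = most suspicious of 100,001
print(f"held-out positive ranks {rank} of {len(scores)}")
```

Because the seven positives are generated as a tight cluster and the 100,000 negatives are diffuse, the held-out “terrorist” lands at or near the top of the suspicion ranking almost by construction, which is why a test built this way says very little about real-world accuracy.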
As a result, Ball says, SKYNET’s reported false-positive numbers are “ridiculously optimistic.” And even the optimistic figure is alarming: the NSA reports a 0.008 percent false-positive rate, which works out to some 15,000 people who could have been mislabeled as terrorists and potentially targeted for assassination by drone. (Thus far, as many as 4,000 Pakistani civilians have been reported killed in drone strikes.)
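For a sense of scale, here is a quick back-of-the-envelope sketch. The 0.008 percent rate and the 55 million monitored phones come from the article; the roughly 190 million figure for Pakistan’s total population is my own assumption for illustration:

```python
# The article's 0.008 percent false-positive rate, applied to two base
# populations: the 55 million monitored phones (from the article) and a
# rough 190 million for Pakistan's total population (an assumed figure).
fpr = 0.008 / 100  # 0.008 percent as a fraction

for label, population in [("55M monitored phones", 55_000_000),
                          ("~190M total population", 190_000_000)]:
    print(f"{label}: about {round(population * fpr):,} flagged in error")
```

Either way, a rate that sounds vanishingly small still flags thousands of people when the base population is in the tens or hundreds of millions.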
Big Data being used to show you ads or recommend friends can certainly feel intrusive, but when Facebook gets it wrong, the worst consequence is that icky, uncanny-valley feeling. That’s nothing compared to what can happen when machine learning goes wrong in a military-intelligence application in Pakistan: it can literally be a matter of life and death.