Big Data, Truth and the Imagination

A nice post by Quentin Hardy in the NYT’s “Bits” blog today summarizes an excellent keynote by Microsoft’s Kate Crawford at a conference at Berkeley in which she busted many of the myths of what she (and Dilbert) branded “Big Data fundamentalism — the idea with larger data sets, we get closer to objective truth.”  Hardy notes the likelihood of such fundamentalism given that the “promise of certainty has been a hallmark of the technology industry for decades.”

Connoisseurs of technological and cultural disruptive innovation see big data differently. A self-driving car or drone had better be certain about a huge number of elements of objective truth, or it will plow right into many such elements in no time. But what is exciting is that it changes how we transport and/or view everything, challenging many of the truths we have constructed about the human condition as well as human rights.

Different uses of big data need different levels and types of assurance of truth. As I pointed out in another post, litigators picking apart a conclusion derived through patterns found through massive parallel processing of information stores that score high on all three of the “Vs” are likely to:

immediately perceive how easy it would be to pick apart the analytic process to expose its flaws. Not much big data analysis will survive that type of scrutiny well, because it is not following established and recognized protocols; it is rather laying data over data to find new patterns that may yield new insights, but in themselves prove very little. By contrast, consider predictive coding in e-discovery. Predictive coding moves forward very slowly and deliberately because it has to be agreed to by opposing counsel or sanctioned by a judge in the context of an adversarial proceeding. Is it possible that a big data project incorporating hundreds of varieties of data, extraordinary velocity and huge volume will ever get there? The ends, the means, and the toleration of “messiness” or lack thereof all differ fundamentally.

As a privacy and information security lawyer, I tend to see the types of big data analysis and their relationship to different levels and types of truth as bearing some resemblance to the much less world-changing marketplace of cloud computing services and its relationship to information governance. In cloud computing, one can now find a full range of levels of information assurance, from the fastest and cheapest public clouds to products that can satisfy almost any regulatory requirement or security standard. In the same way, the big data marketplace offers a spectrum from predictive coding with repeatable processes even the legal system can understand to experimentation combining diverse information so rapidly that it has challenged the scientific method itself.

As a lawyer helping clients look at legal risks associated with big data analysis, I like to distinguish between (1) the litigation level of assurance, in which each step in the analysis must be not only defensible but affirmatively proven and specifically authorized, (2) the governance, risk and compliance level of assurance, which may look at large, varied and high-velocity datasets to find patterns that must generally be defensible, but where you may have the room to lay data over data in ways to which the data subjects would generally consent to detect patterns that make sense of the data in new ways, and (3) wide-open big data analysis, where the limits on discoveries and on the secondary and tertiary uses of the data and beyond are the limits of imagination.

That last crack about the limits of imagination was designed to raise concerns among my fellow privacy lawyers about purpose-based and context-based limitations on the uses of personal information, but it was really much too modest a statement about what we need big data to do for us. If you have any doubt about the help we need from big data in expanding those limits of imagination, consider that mobile health thought and market leader Qualcomm is now offering a $10 million prize named after the tricorder, which was invented in the 1960s without much big data. So although Ms. Crawford is certainly right that “We need to think about how we will navigate these systems. Not just individually, but as a society,” and nobody should ever get down on their knees and pray as Dilbert’s boss does, the most fun and benefit may come with the ideas we get that are new. Or, as the poet of the imagination and pragmatic lawyer Wallace Stevens once said, “The final belief is to believe in a fiction, which you know to be a fiction, there being nothing else. The exquisite truth is to know that it is a fiction and that you believe in it willingly.”
