How Big Data is Blowing Up Information Governance

Governing shared and open data and high-velocity, agile processes is not duck soup, but organizations being transformed by big data and the industrial internet have to do it. It means big changes to information security, privacy, records management, defensible disposal, intellectual property and trade secrets protection programs, changes that few organizations, lawyers, regulators or managers have faced yet. This post will begin to lay those changes out.

Most lawyers still advocate records management and defensible disposal programs that get rid of all non-record documents as soon as they are no longer needed, even as more and more of those documents live in dynamic databases, some but not all of which will become growing sources of value as new analytics tools are applied. The logic of defensible disposal has not changed for the many types of documents whose retention poses greater risk than value; that logic just needs to be balanced by intelligent intuitions about how certain documents with very limited value now may become much more valuable in the future.

I consider some of these dynamic databases of sleeper documents to be candidates for data lakes. Call them “lakes” because, unlike in orderly warehouses, big things swim at you fast when you so much as cast your line. From a privacy and security standpoint, data lakes need two things to compensate for the absence of traditional controls on collection, such as data minimization and the location of trade secrets in more controlled environments. The rationales for those traditional controls on collection are that (1) decisions on use are not trusted and (2) large databases become malware targets and breach victims. Thus the two things data lakes most need are (1) trustworthy, transparent and accountable controls, decisions and decision-makers regarding use and (2) really good information security. So much easier said than done, I know, but I also know a number of organizations that are getting incredibly serious about not just the second one but the first one, trusted controls. Trusted controls in the big data world are by no means just about privacy; they are ultimately matters of the ethics and compliance of algorithms. Big data can classify people in ways that run afoul of employment laws, human rights laws and a multitude of other standards. An important role for lawyers and compliance and ethics officers going forward is assessing the fairness and appropriateness of algorithms, or of the algorithms that create the algorithms.

The records and information management programs of the present and future need to balance defensible disposal with data lakes. In some cases, we have used a zoning-like approach in which certain areas are designated for data lakes; once an area is designated, construction of the two critical controls (trusted controls on use and really good information security) needs to begin there.
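For readers who think in code, the zoning idea can be sketched roughly as a disposition rule. This is only an illustration under stated assumptions, not a description of any actual program: the zone names, the `Document` fields and the `disposition` function are all hypothetical, and a real retention schedule would be far more nuanced.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Disposition(Enum):
    RETAIN = "retain"                  # held, a record, or still needed
    DATA_LAKE = "retain_in_data_lake"  # sleeper data kept in a designated zone
    DISPOSE = "dispose"                # defensible disposal applies


@dataclass
class Document:
    doc_id: str
    zone: str              # hypothetical zone label assigned by the program
    is_record: bool        # simplification: treated as kept under the schedule
    needed_until: date     # last day the business still needs the document
    on_legal_hold: bool = False


# Hypothetical zones designated for data lakes. Designation presumes that
# both critical controls (trusted use controls, strong security) are in place.
LAKE_ZONES = {"sensor-telemetry", "usage-logs"}


def disposition(doc: Document, today: date) -> Disposition:
    # Legal holds, records and documents still in use are kept in any zone.
    if doc.on_legal_hold or doc.is_record or today <= doc.needed_until:
        return Disposition.RETAIN
    # No longer needed, but located in a zone designated for a data lake.
    if doc.zone in LAKE_ZONES:
        return Disposition.DATA_LAKE
    # Everything else follows ordinary defensible disposal.
    return Disposition.DISPOSE
```

The ordering is the design choice worth noticing: a legal hold or continuing business need is checked before the zone, so designating a zone as a lake never overrides a preservation obligation, and data outside the designated zones still falls under ordinary defensible disposal.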

Of course, the big data initiative usually involves not just those lakes but also the integration of new varieties of information from outside the organization. This I call bringing in the internet ocean. Often, the data being brought “in” is vast, high-velocity and very unfamiliar to the organization; it does not meet the organization’s quality standards and is initially unintelligible in some respects; it often needs to be tagged and valued as new “data asset classes.” The big data tools encourage the incorporation of as much of this “great unwashed” data as the organization wants into its information stores, but lawyers should watch carefully for the movement of massive databases into their clients’ possession or control, lest a preservation obligation require a legal hold that creates a “digital landfill” much more massive than was previously imaginable.

Then there are the challenges to data rights posed by sharing and agile processes, which will be explored more fully in subsequent posts. Intellectual property law will continue to have a very tough time keeping up with data asset protection needs, so the protections will default to trade secrets and to very carefully crafted contract terms on ownership and use rights for original data, usage data and inferences (or derived data). The newly filed Zettaset v. Intel case, in which Intel is accused of appropriating Zettaset big data security trade secrets, may tell us a lot about the trade secrets landscape going forward in this regard. In any event, organizations will need data asset protection programs focused more on trade secrets and contract terms than in the past.

Big data is blowing up information governance programs, however, not just by rendering old programs inadequate, but by pointing the way to new, big-data-driven compliance and risk management programs. This is most obvious now in information security, where traditional intrusion detection has been supplemented with exfiltration prevention based on advanced analytics. The “100% auditability” of big data demands very important choices by all those responsible for information governance about where to create visibility.
