[This new article in Law360 incorporates insights from a number of previous blog posts to build the case for one big data information strategy incorporating both governance and data asset protection. Apologies and caveats to regular readers for the redundancies. As always, criticism eagerly sought.]
Law360, New York (January 07, 2014, 1:22 PM ET) — Governance over shared and open data and high-velocity and agile processes is not duck soup, but it is something organizations being transformed by big data and the industrial Internet have to do. It means big changes in information security, privacy, records management, defensible disposal, intellectual property and trade secrets protection programs that few organizations, lawyers, regulators or managers have faced yet.
Most lawyers still advocate records management and defensible disposal programs that get rid of all non-record documents as soon as they are no longer needed, even as most of those documents increasingly reside in dynamic databases, some but not all of which will become growing sources of value with the use of new analytics tools. The logic of defensible disposal has not changed for the many types of documents whose retention poses greater risk than value; that logic just needs to be balanced by intelligent intuitions about how certain documents and data with very limited value now may become much more valuable in the future.
I consider some such dynamic databases of such sleeper documents as candidates for data lakes. Call them “lakes” because unlike in orderly warehouses, big things swim at you fast when you so much as cast your line. Data lakes need two things from a privacy and security standpoint to compensate for relaxation of traditional controls on collection such as data minimization and location of trade secrets in more controlled environments. The rationales for those traditional controls on collection are that:
- decisions on use are not trusted, and
- large databases become malware targets and breach victims.
Thus the two things data lakes most need are:
- trustworthy, transparent and accountable controls, decisions and decision-makers regarding use, and
- really good information security.
So much easier said than done, of course, but I also know a number of organizations that are getting incredibly serious about not just the second one (and see below on how hard that is), but the first one, trusted controls. Trusted controls in the big data world are by no means just about privacy, but are ultimately matters of the ethics and compliance of algorithms. Big data can classify people in ways that run afoul of employment law, human rights law, sector-specific prohibitions on discrimination and a multitude of other standards. An important role for lawyers and compliance and ethics officers going forward is assessing the fairness and appropriateness of algorithms.
The records and information management programs of the present and future need to balance defensible disposal with data lakes. In some cases, we have used a zoning-like approach in which certain areas are designated for data lakes, and the construction of the two critical controls then needs to begin.
Of course the big data initiative usually involves not just those lakes, but the integration of new varieties of information from outside of the organization. This I call bringing in the Internet ocean. Often, the data being brought “in” is vast, high-velocity and very unfamiliar to the organization; it does not meet the organization’s quality standards and is initially unintelligible in some respects; it often needs to be tagged and valued as new “data asset classes.”
The big data tools encourage the incorporation of as much of this “great unwashed” data as the organization wants into its information stores, but lawyers should watch out carefully for the movement of massive databases into their clients’ possession or control, lest a preservation obligation require a legal hold that will create a “digital landfill” much more massive than was previously imaginable.
Then there are the challenges to data rights posed by sharing and agile processes. Intellectual property law will continue to have a very tough time keeping up with data asset protection needs, so the protections will default to trade secrets and very carefully crafted ownership and use rights for original data, usage data and inferences (or derived data) in contract terms. Organizations will therefore need data asset protection programs focused more on trade secrets and contract terms than in the past.
Big data is blowing up information governance programs, however, not just by rendering old programs inadequate, but by pointing the way to new, “big data”-driven compliance and risk management programs. This is most obvious now in information security, where traditional intrusion detection has been supplemented with exfiltration prevention based on advanced analytics. The “100 percent auditability” of big data demands very important choices by all those responsible for information governance about where to create visibility.
Next, we will apply these concepts to a common scenario.
How to Build Your New Governance and Data Asset Protection Programs
This section will get into the new information risk management concretely, focusing on how you can use the big data initiative to define new trade secrets and new ways of protecting them, with implications for your contracts involving data and your development of data asset protection plans. It will also address what should and rarely does happen before bringing in external data sources and streams.
Let us say your organization is starting to figure out how to get more value from its own databases. In most cases, it is important to recognize that patent and copyright law are likely to offer only limited protections (in the U.S., as opposed to Europe, where the Database Directive (96/9/EC) provides copyright-like protection to “authors” invested in the contents or presentation of their databases), so most of your efforts to protect the information must focus on careful definition and protection of trade secrets and contractual rights associated with the raw and inferred data and databases.
So consider this approach: As your organization is identifying types of data and repositories that are of interest for the big data initiative, it may be viewed as essentially defining those types and repositories as trade secrets requiring special new protection. For trade secret protection under the Uniform Trade Secrets Act, you need to show reasonable secrecy measures and economic value derived from that secrecy; secrecy can be achieved through agreements, policy, training and infrastructure. Therefore:
- Everybody handling those types of data you anticipate using now or in the future could get confidentiality agreements beyond their general obligations to protect company assets;
- Careful protection of ownership and use rights and clean data destruction of the raw, usage and derivative data in any contracts with analytics vendors is critical to protecting both the data from security and privacy standpoints and its trade secret status;
- Policies could be modified to focus on data asset protection from a trade secrets perspective, requiring secrecy and protection;
- The information security levels assigned to those types of data could be the levels accorded sensitive information (more on this below, in drawing the information security line);
- Particularly if the data sources contain personal information, focusing on trustworthy, transparent and accountable controls, decisions and decision-makers on use and/or the ever-changing standard of reasonableness in anonymization will only become more critical from a privacy standpoint and will bolster trade secrets arguments as well; and
- Training programs could stress the designation of data as trade secrets and the importance of continued efforts to protect the data as trade secrets.
Even if the initiative begins with a focus on extracting value from data already possessed by the organization, that focus likely leads to incorporation of new data types, such as machine-to-machine and social data, and other data streams from outside the organization. Legal needs to weigh in before appropriations, for many reasons, including:
- The ownership and use rights associated with the external data, and the ways in which they affect the ownership and use rights and trade secret status of derived data and inferences as well as internal data, are critical;
- If the external data is brought into the organization’s custody and control, as the big data storage/analytics tools encourage, and any of it might be subject to existing preservation obligations resulting from reasonably likely or pending litigation or investigations, the organization may be forced to expand its legal holds and begin to grow “digital landfills” of unprecedented size;
- The organization may have regulatory or other duties, such as privacy or information security obligations, to understand, manage and/or protect the information once it possesses and controls it; and
- Antitrust concerns should be examined in some cases.
Drawing the Information Security Line, and Current Challenges
Once we focus on all the shared and open data, we are ready to draw the critical information security line between the more highly secured data and everything else. On one side of the line are the trade secrets and IP and the protected personal information (PPI), and any other sensitive or confidential information or other information you are obliged to or want to protect. On the other side are the disposable data and much of the open and shared data (which overlap with the PPI, requiring secure destruction of the latter). This line is particularly difficult to draw in a big data context because most big data tools were constructed for extremely fast parallel processing and inexpensive storage of massive volumes of data, presenting security challenges that can be summarized as follows:
- Massive parallel processing through rapid incorporation of nodes; therefore
- No inherent authentication of nodes; therefore
- Danger of rogue nodes; plus
- No role-based access once you’re into a node or cluster; plus
- No encryption between nodes.
Of course, this very architecture also enables the smart data security of focused data loss prevention mentioned in Section 1, but 2014 will be a big year for big data system “hardening,” possibly including node authentication, logging tools, security between nodes, file-layer encryption on each node and pre-deployment validations. The importance not only of information security but also of trade secret protection makes these important due diligence issues for 2014 initiatives.
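To make the idea of node authentication a little more concrete, here is a minimal sketch in Python of one common approach, a challenge-response check using a pre-shared key per node. It is an illustration only, not a description of how any particular big data platform implements hardening; the registry `NODE_KEYS`, the node names and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical registry of nodes allowed to join the cluster, mapping a node ID
# to a pre-shared key. A real deployment would keep keys in a secrets manager.
NODE_KEYS = {
    "node-a": secrets.token_bytes(32),
    "node-b": secrets.token_bytes(32),
}


def new_challenge() -> bytes:
    """Return a random nonce that the coordinator sends to a joining node."""
    return secrets.token_bytes(16)


def sign_challenge(key: bytes, challenge: bytes) -> str:
    """Computed by the node: HMAC-SHA256 of the challenge under its key."""
    return hmac.new(key, challenge, hashlib.sha256).hexdigest()


def admit_node(node_id: str, challenge: bytes, response: str) -> bool:
    """Coordinator-side check: admit only registered nodes with a valid response."""
    key = NODE_KEYS.get(node_id)
    if key is None:
        return False  # unknown, possibly rogue, node
    expected = sign_challenge(key, challenge)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, response)
```

A node that is not in the registry, or that answers with the wrong key, is refused, which addresses the "rogue node" danger listed above. Encryption between nodes and role-based access would still need to be handled separately.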
Refining Defensible Disposal
As you identify all the data types you may want and need to protect as trade secrets and those that have continuing value as shared and/or open data, you can also use that knowledge to improve or jump-start a defensible disposal program for the other data stores, and particularly the ones that come to appear worthless as you’re examining the new trade secrets.
In the longer term, these insights and the new trade secrets will help your records and document management programs and database governance programs balance “data lakes” and defensible disposal by making better-informed judgments about information and data that have ongoing value, which also enables more defensible and informed judgments about the useless data (or the data types whose cost or risk of harm exceeds their worth) that can and should be destroyed.
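For teams that keep a data inventory, the balancing described here could be roughed out as a simple triage rule. The Python sketch below is only an illustration under stated assumptions: the zone names, the `DataStore` fields and the numeric value and risk scores are hypothetical, and any real judgment about value, risk and preservation duties belongs to counsel and the business, not to a formula.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    PROTECTED = "protected"          # trade secrets, IP, PPI, other sensitive data
    PRESERVE = "legal hold"          # must be kept under a preservation obligation
    DATA_LAKE = "data lake"          # kept for current or anticipated future value
    DISPOSAL = "defensible disposal"


@dataclass
class DataStore:
    name: str
    is_sensitive: bool      # designated trade secret, IP, PPI or confidential
    on_legal_hold: bool     # subject to a preservation obligation
    expected_value: float   # hypothetical score: current plus anticipated value
    retention_risk: float   # hypothetical score: cost and risk of keeping it


def assign_zone(store: DataStore) -> Zone:
    """Assign a store to a zone; the order of checks is an assumption of this sketch."""
    if store.is_sensitive:
        return Zone.PROTECTED
    if store.on_legal_hold:
        return Zone.PRESERVE
    if store.expected_value > store.retention_risk:
        return Zone.DATA_LAKE
    return Zone.DISPOSAL


inventory = [
    DataStore("sensor-telemetry", False, False, expected_value=8.0, retention_risk=2.0),
    DataStore("old-email-archive", False, False, expected_value=1.0, retention_risk=6.0),
    DataStore("pricing-models", True, False, expected_value=9.0, retention_risk=3.0),
]
zones = {store.name: assign_zone(store) for store in inventory}
```

Stores that land in the disposal zone are only candidates; the point of the sketch is that the same inventory that identifies new trade secrets also surfaces the stores whose cost or risk exceeds their worth.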
Summary: A Recipe for New Information Governance and Data Asset Protection
- Distill your new trade secrets as you make your big data plans, using the recipe described above.
- Bring in open, shared and/or Internet data to taste, defining ownership and usage rights — particularly in the inferences — carefully, and not necessarily integrating them into your information stores if you want to avoid preservation obligations.
- Do what you can to specially protect the trade secrets, IP and protected personal information (PPI), and any other sensitive or confidential information or other information you are obliged to or want to protect.
- Use the greater visibility into the data stores to identify areas of data that are very unlikely to have any value in any business, investigative or litigation context, which become your most cost-effective candidates for defensible disposal.
- Shake it well, try lots of different ways per second of cooking and combining it and detecting patterns, and please let me know what you’re coming up with that’s most useful to you.
Jon Neiditz is a partner in Kilpatrick Townsend’s Atlanta office, where he leads the firm’s privacy and information security practice.