Baking Your Big Data Information Governance Program; In Celebration of 2013

We all know the feeling; relatives on their way over, and no time left to bake a big data information governance program from scratch.  Got you covered; knew you were coming, so the elves baked one for you while you slumbered snugly; dropped it down your digital chimney faster than you could click on Delivery Drone Shipping Option.  Not quite heat ‘n’ serve, but hope it will open the door to some useful discoveries.

We started this blog in April by sharing the recognition that the traditional privacy rules need massive change thanks to big data, and a call for more systematic big data strategy.  A substantial part of the year (a series of 13 posts) chronicled what I called in June “the most consequential revelations about the federal government of our lifetimes,” which have already inspired not only the expected global economic changes, but a response that in my humble opinion should give us great faith even now in the future of Democracy in America.  Getting back to our knitting, we have more recently been focused on how big data is blowing up information governance beyond algorithm compliance and ethics (not just privacy), including transforming records management and defensible disposal, and how, because IP in the US can’t keep up with big data, trade secrets and contractual protection of data become important to strategy (and in part just because of how counter-intuitive and Scrooge-like that perspective is to many big data practitioners).  But in this season of sharing, the incompleteness of the trade secrets approach to big data is so stark; so much of big data is necessarily shared and open data, regardless of how much of your own data you choose to share, particularly when you “bring in the internet ocean.”

Once we focus on all the shared and open data, we are ready to draw the critical information security line between the more highly secured data and everything else.  On one side of the line are the trade secrets and IP and the protected personal information (PPI), and any other sensitive or confidential information or other information you are obliged to or want to protect.  On the other side are the disposable data and much of the open and shared data (which overlap with the PPI, requiring secure destruction of the latter).  “Duh,” you say, “for this I gave up watching the Duck Dynasty marathon?”  The reason this line is particularly difficult to draw in a big data context is that the big data tools were constructed in ways that make information security more difficult, and might be said to be now where cloud computing was, well, a while ago.  The security challenges can be summarized as follows:

  • Massive parallel processing through rapid incorporation of nodes; therefore
  • No inherent authentication of nodes; therefore
  • Danger of rogue nodes; plus
  • No role-based access once you’re into a node or cluster; plus
  • No encryption between nodes.

[Figure: Apache big data architecture]

So there is a lot for us information risk managers to worry about.  To continue geeking out for the remainder of this paragraph, there are:

  • Node authentication (e.g., Kerberos), balanced against performance issues;
  • Logging tools that leverage the cluster to store events so they scale with the clusters (e.g., Splunk, open source alternatives);
  • Big data monitoring tools (that scale like big data and use big data velocity capabilities, for, e.g., malware detection or data loss prevention);
  • Security between nodes, not just between cluster and client, as the real “data in transit” issue;
  • File layer encryption on each node with good key management, plus other encryption options, all weighed against performance issues; and
  • Pre-deployment validation.
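
For those who would like the node authentication item in something closer to code, here is a minimal sketch of the idea: a node must prove it holds a cluster secret before it is admitted, so a rogue node cannot simply join. This is a simplified shared-key challenge-response, not Kerberos and not how any particular big data tool does it; CLUSTER_KEY, the node names and all of the function names are hypothetical, invented for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret, assumed to be provisioned to legitimate
# nodes out of band (a real deployment would use Kerberos or similar).
CLUSTER_KEY = secrets.token_bytes(32)


def issue_challenge() -> bytes:
    """Coordinator side: generate a fresh random challenge for a joining node."""
    return secrets.token_bytes(16)


def sign_challenge(key: bytes, node_id: str, challenge: bytes) -> str:
    """Node side: prove possession of the key by signing its id plus the challenge."""
    message = node_id.encode("utf-8") + b"|" + challenge
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def admit_node(node_id: str, challenge: bytes, response: str,
               key: bytes = CLUSTER_KEY) -> bool:
    """Coordinator side: admit the node only if its response matches."""
    expected = sign_challenge(key, node_id, challenge)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    challenge = issue_challenge()
    good = sign_challenge(CLUSTER_KEY, "node-7", challenge)
    rogue = sign_challenge(secrets.token_bytes(32), "node-7", challenge)
    print("legitimate node admitted:", admit_node("node-7", challenge, good))
    print("rogue node admitted:", admit_node("node-7", challenge, rogue))
```

The extra round trip on every join is a small taste of the performance trade-off mentioned above: each check adds work to a system built for rapid incorporation of nodes.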

OK, sorry, so I owe you more plain English and explanations than that, but suffice it to say that 2014 is going to be a big year in information security for big data, now that these vulnerabilities and opportunities for improvement are out in the open for all to see.  Right now, for those among you whose inner ascetic, like mine, is being inspired by this materialistic holiday to reach for ideas, I want to offer something to replace the visions of sugarplums that may still be dancing in your heads.  So here’s one possible family recipe for big data information governance:

  • Distill your new trade secrets as you make your big data plans, using the recipe described in the last post.
  • Bring in open, shared and/or internet data to taste, defining ownership and usage rights–particularly in the inferences–carefully, and not necessarily integrating them into your information stores if you want to avoid preservation obligations.
  • Do what you can to counter the temporary insecurity of big data tools to specially protect the trade secrets, IP and protected personal information (PPI), and any other sensitive or confidential information or other information you are obliged to or want to protect.
  • Use the greater visibility into the data stores to identify areas of data that are very unlikely to have any value in any business, investigative or litigation context, which become your most cost-effective candidates for defensible disposal.
  • Shake it well, try lots of different ways per second of cooking and combining it and detecting patterns, and please let me know what you’re coming up with that’s most useful to you.
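
If it helps to see the sorting step of the recipe written down, here is one rough sketch of how the information security line might be drawn over an inventory of data sets. It assumes you have already tagged each data set by hand; the DataSet fields, the Tier names and the sample inventory are all hypothetical, and the rules are only one reading of the recipe above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SECURED = "secured"                        # trade secrets, IP, PPI, confidential
    DISPOSAL_CANDIDATE = "disposal candidate"  # very unlikely to have any value
    GENERAL = "general"                        # open/shared data and everything else


@dataclass
class DataSet:
    # All fields are hypothetical tags assumed to be assigned during an inventory.
    name: str
    trade_secret: bool = False
    contains_ppi: bool = False
    confidential: bool = False
    likely_value: bool = True  # any business, investigative or litigation value


def classify(ds: DataSet) -> Tier:
    """Place a data set on one side or the other of the security line."""
    if ds.trade_secret or ds.contains_ppi or ds.confidential:
        return Tier.SECURED
    if not ds.likely_value:
        return Tier.DISPOSAL_CANDIDATE
    return Tier.GENERAL


def needs_secure_destruction(ds: DataSet) -> bool:
    """Disposable data that overlaps with PPI must be destroyed securely."""
    return ds.contains_ppi and not ds.likely_value


if __name__ == "__main__":
    inventory = [
        DataSet("pricing-model-inferences", trade_secret=True),
        DataSet("old-web-logs", likely_value=False),
        DataSet("stale-customer-export", contains_ppi=True, likely_value=False),
        DataSet("public-weather-feed"),
    ]
    for ds in inventory:
        print(ds.name, "->", classify(ds).value,
              "| secure destruction:", needs_secure_destruction(ds))
```

Checking the protected categories first is a deliberate choice: a data set that is both worthless and full of PPI stays on the secured side of the line until it is securely destroyed.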

Hope your holiday is filled with wonderful discoveries.
