LegalTech just concluded in New York and one of the popular hot buttons many vendors were talking about was the idea that too much corporate, especially valueless, ungoverned, unstructured information is both risky as well as costly to organizations… I agree. The answer to this “infobesity” (the unrestricted saving of ESI because storage is supposedly cheap and saving everything is easier than checking with others to see if its ok to delete) is a defensible process to systematically dispose of information that’s not subject to regulatory requirements, litigation hold requirements or because it still has business value. In a 2012 CGOC (Compliance, Governance and Oversight Counsel) Summit survey, it was found that on the average 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that 69% of an organization’s ESI can be disposed of.
Several vendors at LegalTech were highlighting Defensible Disposal solutions, also known as defensible disposition and defensible deletion, as the answer to the problem of infobesity. Defensible Disposal is defined by many as a process (manual, automated or both) of identifying and permanently disposing of unneeded or valueless data in a way that will standup in court as reasonable and consistent. The key to this process is to be able to identify valueless information (not subject to regulatory retention or legal hold) with enough certainty to be able to actually follow through and delete the data. This may sound easy… its not. Many organizations are sitting on huge amounts of data because their legal department doesn’t want to be accused of spoliation, so has standing orders to “keep everything forever”. Corporate legal has to be convinced that the defensible disposal processes and solutions billed as being the answer to infogluttony can actually tell the difference, accurately and consistently, between information that should be kept and that information that’s truly valueless.
To automate this defensible disposal process, the solution needs to be able to be able to understand and differentiate content conceptually; that an apple is a fruit as well as a huge high tech company. The automated classification/categorization of content cannot accurately or consistently differentiate the meaning in unstructured content by just relying on keywords or simple rules.
An even less consistent approach to categorization is to base it on simple rules such as “delete everything from/to Bill immediately” or “keep everything to/from any accounting employee for 3 years”. This kind of rules based retention/disposition process will quickly have your GC explaining to a Judge why data that should have been retained was “inadvertently” deleted.
To truly automate disposal of valueless information in a consistently defensible manner, categorization applications must have the ability to first, conceptually understand the meaning in unstructured content so that only content meeting your intended intentions, regardless of language, is classified as “of value” to the organization not because it shares a keyword with other records but because it truly meets your definition of content that needs to be kept. Second, because unstructured data by definition is “free-flowing” (not structured into specific rows and columns) extremely high categorization accuracy rates and defensibly can only be achieved with defensible disposal solutions which incorporate an iterative training processes including “train by example” in a human supervised workflow.