Information Governance and Predictive Coding

Predictive coding, also known as computer-assisted coding or technology-assisted review (TAR), refers to using software that applies machine learning algorithms so that a computer can learn, from example records presented to it (usually by attorneys), which types of content are potentially relevant to a given legal matter. After the attorneys have provided a sufficient number of examples, the technology is given access to the entire potential corpus (records and data). It sorts through that corpus to find records that, based on its “learning”, are potentially relevant to the case.

This automation can dramatically reduce costs because computers, instead of attorneys, conduct the first-pass culling of potentially millions of records.
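The train-then-classify loop described above can be made concrete with a small sketch. It uses multinomial naive Bayes, a simple learner chosen here only for illustration. Commercial predictive coding tools use their own, often proprietary, algorithms. The class name `RelevanceClassifier` and the seed documents are hypothetical. Attorney-coded examples update word statistics for "relevant" and "not relevant", and the model then scores every record in the corpus so the likeliest-relevant ones surface first.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; deliberately simple."""
    return re.findall(r"[a-z0-9']+", text.lower())


class RelevanceClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Needs at least one relevant and one non-relevant training example.
    """

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text: str, relevant: bool) -> None:
        # Each attorney-coded example updates the statistics for its label.
        self.word_counts[relevant].update(tokenize(text))
        self.doc_counts[relevant] += 1

    def score(self, text: str) -> float:
        """Estimated probability that `text` is relevant."""
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_docs = self.doc_counts[True] + self.doc_counts[False]
        log_scores = {}
        for label in (True, False):
            # Log prior: share of training examples carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + vocab_size
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalize the two log scores into a probability without overflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])


# Usage: a hypothetical attorney-coded seed set, then ranking the corpus.
seed_set = [
    ("pricing agreement with distributor discussed in meeting", True),
    ("revised pricing terms for the distributor contract", True),
    ("lunch on friday at noon", False),
    ("office holiday party signup", False),
]
clf = RelevanceClassifier()
for text, relevant in seed_set:
    clf.train(text, relevant)

corpus = ["friday lunch plans", "distributor pricing memo"]
ranked = sorted(corpus, key=clf.score, reverse=True)
print(ranked)  # ['distributor pricing memo', 'friday lunch plans']
```

In practice, attorneys would review the top-ranked records and feed their decisions back in as further training examples. The quality of the ranking depends heavily on the seed set, which is the first dependency discussed next.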

Predictive coding has several predictable dependencies that must be addressed before it can be accepted as a useful and dependable tool in the eDiscovery process. First, which documents are used to “train the system”, and who chooses them? This training selection will almost always be made by attorneys involved in the case.

The second dependency is the number of documents used for training. How many training documents are needed to provide a sample large enough for a dependable process?

Third, and most important, do the parties have access to all potentially relevant documents in the case from which to draw the training documents? Remember, potentially relevant documents can be stored anywhere. For predictive coding, or any other eDiscovery process, to be legally defensible, all existing case-related documents need to be available. This requirement highlights the need for effective information management across the whole organization.
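The sample-size question in the second dependency can be partly answered with standard statistics. A common starting point is the formula for sizing a simple random sample to estimate a proportion, such as the share of relevant documents in a collection or in a validation sample of the machine's output. It does not say how many training examples a particular predictive coding tool needs, which depends on the tool and the collection. The function name `sample_size` is hypothetical, and `p = 0.5` is the usual conservative assumption.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(confidence: float, margin: float, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Random-sample size needed to estimate a proportion within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2, where z is the two-sided critical
    value for `confidence`; p = 0.5 is the most conservative choice.
    If `population` is given, the finite population correction
    n = n0 / (1 + (n0 - 1) / N) is applied.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.95, 0.02))                   # 2401
print(sample_size(0.99, 0.02))                   # 4147
print(sample_size(0.95, 0.02, population=1000))  # 707
```

For the corpus sizes typical of eDiscovery (hundreds of thousands of records or more), the finite population correction changes the result very little. The required sample is driven mainly by the chosen confidence level and margin of error, not by the size of the collection.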

As courts adopt, or at least experiment with, predictive coding, as Judge Peck did in Monique Da Silva Moore, et al. v. Publicis Groupe & MSL Group, Civ. No. 11-1279 (ALC)(AJP) (S.D.N.Y. February 24, 2012), an effective information management program will become key to the courts' acceptance of this new technology.

How easy is eDiscovery in SharePoint 2010?

There have been nagging questions about SharePoint's ability to support complete and effective eDiscovery searches of all potentially responsive content in the repository. The description below is from the Microsoft Enterprise Content Management (ECM) Team Blog.

From the Microsoft blog:

Hi everyone, I am Quentin Christensen and I work on document and records management functionality for SharePoint. Electronic discovery (commonly referred to as eDiscovery) is an area we are supporting with a new set of capabilities in SharePoint Server 2010. In case you are not familiar with eDiscovery, it is the process of finding, preserving, analyzing and producing content in electronic formats as required by litigation or investigations. eDiscovery is an important concern for all of our customers, and given that SharePoint has grown to be an integral part of collaboration, document, and records management for many organizations, we recognize the need to support the eDiscovery process for SharePoint content.

Microsoft Office SharePoint Server 2007 included a hold feature that could be used for eDiscovery, but it was scoped to the Records Center site template. With SharePoint Server 2010 the eDiscovery capabilities have been greatly expanded to provide more functionality and the power to use these features across your entire SharePoint deployment.

In this post, I want to highlight three major improvements in SharePoint that support eDiscovery. You can:

  • Manage holds and conduct eDiscovery searches on any site collection
  • Use SharePoint Server Search or FAST Search for SharePoint out of the box to search and process content
  • Automatically copy eDiscovery search results to a separate repository for further analysis

Read on to learn how SharePoint Server 2010 can support your eDiscovery initiatives and provide you with the tools you need to manage holds, identify, and collect SharePoint content.

The eDiscovery Process

The Electronic Discovery Reference Model from EDRM provides an overview of the different parts of the eDiscovery process:

[Figure: the Electronic Discovery Reference Model]

SharePoint Server 2010 addresses the Information Management, Identification, Preservation and Collection stages. While this blog post will focus mostly on the identification, preservation and collection components, SharePoint provides a rich Information Management platform for Collaboration, Social Computing, Document Management and Records Management. This means that you can take a proactive approach to eDiscovery by putting a governance framework in place and using appropriate disposition policies to expire content. Managing content and deleting it when it is no longer needed will reduce the amount of content that must be indexed, searched and collected for eDiscovery. The result is that eDiscovery costs can be dramatically reduced, changing the problem from finding a needle in a haystack to finding a needle in a hay bale. Ultimately, achieving legal compliance with eDiscovery obligations depends on a foundation of robust Information Management.

When an eDiscovery event occurs, such as receipt of a complaint, a discovery request, or notice of a potential legal claim, the identification stage begins. Content that may be subject to eDiscovery must be identified, and searches are conducted to find it. That content must then be preserved and, at some point, collected.


The eDiscovery Features

Hold and eDiscovery

Hold and eDiscovery is a site-level feature that can be activated on any site.

Activating this feature creates a new category in Site Settings that provides links to the Holds and Hold Reports lists. There is also a page to discover and hold content that allows you to search for content and add it to a hold. Once the Hold and eDiscovery feature is activated, you can create holds and add any content in the site collection to a hold. By default, only Site Collection administrators have access to the Hold and eDiscovery pages. To give other users permission, add them to the permissions list for the Hold Reports and Holds lists. This will also give them access to the Discover and hold content page.

You can manually locate content in SharePoint and add it to a hold, or you can search for content and add the search results to a hold. With the Hold and eDiscovery feature, you can create holds in the Holds list and then manually add content to the relevant hold by clicking Compliance Details in the drop-down menu for individual items.

Then click the Add/Remove from hold link.

You can then select the relevant hold to add the item to or remove it from.

By manually adding an item to a hold, you block editing and deletion of that item until it is released from the hold. You will notice that the document now has a lock icon showing that it cannot be edited or deleted.
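The lock behavior described above, along with the point later in the post that an item can sit on several holds at once, can be captured in a small conceptual model. This is not the SharePoint object model. The class, method and hold names are hypothetical and only illustrate the rules: any active hold blocks editing and deletion, releasing one hold leaves the others in force, and Compliance Details lists every hold on the item.

```python
from dataclasses import dataclass, field


class ItemOnHoldError(Exception):
    """Raised when someone tries to edit or delete an item that is on hold."""


@dataclass
class Item:
    name: str
    content: str
    holds: set = field(default_factory=set)  # names of every hold applied
    deleted: bool = False

    def add_to_hold(self, hold: str) -> None:
        self.holds.add(hold)

    def release_from_hold(self, hold: str) -> None:
        # Releasing one hold leaves any other holds on the item in force.
        self.holds.discard(hold)

    def compliance_details(self) -> list:
        # Lists all holds applied to the item.
        return sorted(self.holds)

    def _require_not_held(self, action: str) -> None:
        if self.holds:
            raise ItemOnHoldError(
                f"cannot {action} {self.name}: on hold {self.compliance_details()}")

    def edit(self, new_content: str) -> None:
        self._require_not_held("edit")
        self.content = new_content

    def delete(self) -> None:
        self._require_not_held("delete")
        self.deleted = True


# Usage with hypothetical hold names.
doc = Item("Q3 pricing.docx", "draft")
doc.add_to_hold("Matter-001")
doc.add_to_hold("Matter-002")
try:
    doc.edit("final")
except ItemOnHoldError as err:
    print(err)  # edit blocked while any hold applies
doc.release_from_hold("Matter-001")
print(doc.compliance_details())  # ['Matter-002']
```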

Each night, a timer job generates a report for each hold. If you need a hold report sooner, you can manually run the Hold Processing and Reporting timer job in Central Administration.

Search and Process

You can manually add items to a hold on any site collection, which is great. But that doesn’t help you find the content you don’t already know about. What if you have a large number of items you want to find and add to a hold? For that, you can use the features on the Discover and hold content page, which is a settings page in Site Settings. From this page, you can specify a search query and then preview the results. The configured search service (SharePoint Search Server or FAST Search for SharePoint) will automatically be used. You can then choose to keep items on hold in place so they cannot be edited or deleted. Alternatively, if you have configured a Content Organizer Send To location in Central Administration, you can have content copied to another site and placed on hold. You may want to create a separate records center site for a particular hold to store all content related to that hold. The Content Organizer is a new SharePoint Server 2010 feature based on the Microsoft Office SharePoint Server 2007 Document Router, with richer functionality to automatically classify content based on Content Type or metadata properties. Look for a future blog post covering the Content Organizer.

Holding content in place is recommended if you want to leave content in the location where it was created, with all the rich context that SharePoint provides, while blocking deletion and editing. Be aware that this will prevent users from modifying items. If you prefer that users continue editing documents, use the copy-to-another-location approach instead.

When searching and processing, the search will by default be scoped to the entire Site Collection and run with elevated permissions so all content can be discovered. The search can be scoped to specific sites and you can also preview search results before adding the results to a hold. Items can be placed on multiple holds and compliance details will show all of the holds that are applied to an item.
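The two search-and-process options can be contrasted in a conceptual sketch. In SharePoint, the search itself is done by SharePoint Search or FAST Search, and copying is done by the Content Organizer. Here, plain substring matching stands in for the search engine, and every name, site path and hold name is hypothetical. The search is scoped to the whole collection by default, ignores per-item permissions (mirroring the elevated-permission search), and returns a preview before anything is held.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Item:
    site: str
    name: str
    text: str
    holds: Set[str] = field(default_factory=set)


def discover(items: List[Item], query: str,
             sites: Optional[Set[str]] = None) -> List[Item]:
    """Preview step: items containing every query term.

    Scope defaults to the whole site collection; pass `sites` to narrow it.
    Per-item permissions are not checked, mirroring the elevated-permission
    search. Substring matching stands in for the real search engine.
    """
    terms = query.lower().split()
    return [item for item in items
            if (sites is None or item.site in sites)
            and all(term in item.text.lower() for term in terms)]


def hold_in_place(results: List[Item], hold: str) -> None:
    # Originals stay where they were created but can no longer be edited or deleted.
    for item in results:
        item.holds.add(hold)


def copy_and_hold(results: List[Item], hold: str, destination: str) -> List[Item]:
    # Copies go to a separate location and only the copies are held,
    # so users can keep editing the originals.
    return [Item(destination, item.name, item.text, {hold}) for item in results]


# Usage with hypothetical sites, documents and hold name.
collection = [
    Item("/sites/sales", "contract.docx", "Distributor pricing contract"),
    Item("/sites/sales", "party.docx", "Holiday party plans"),
    Item("/sites/legal", "memo.docx", "Pricing dispute memo"),
]
preview = discover(collection, "pricing")
print([i.name for i in preview])  # ['contract.docx', 'memo.docx']
copies = copy_and_hold(preview, "Matter-001", "/sites/records-matter-001")
```

The design choice mirrors the trade-off in the post. `hold_in_place` preserves context but freezes users' work, while `copy_and_hold` leaves the originals editable at the cost of a second copy to manage.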

In summary, SharePoint Server 2010 contains key features that make it an essential aspect of your eDiscovery strategy. With the new SharePoint Server 2010 capabilities, you can easily apply proper retention policies for all content and make it easier to discover content if an eDiscovery event occurs. eDiscovery often prescribes tight deadlines for production. SharePoint 2010 helps you find the right content and deliver it faster.

Quentin Christensen
Program Manager – Document and Records Management

eDiscovery Results in Second Look at Records Management Policies

From the PIHRAeScope Blog:

By Allen Smith

Truly robust records management policies can help keep the costs of e-discovery in check, according to Danuta Panich, an attorney with Ogletree Deakins in Indianapolis.

Having a policy that outlines when records are to be destroyed can reduce the amount of information that needs to be reviewed when there’s litigation, explained Philip Gordon, an attorney with Littler Mendelson and chair of the firm’s Privacy and Data Protection Group in Denver. E-mail can be particularly voluminous without a records management policy and costly to search through during e-discovery, he noted.
“Courts recognize that employers don’t have a duty to retain every scrap of information produced by the organization and that information is destroyed that’s no longer needed,” he added.

Panich told SHRM Online that a records management policy should address such issues as:

  • Avoidance of casual proliferation.
  • Where and how electronically stored information (ESI) should be stored.
  • The importance of prompt deletion of all ESI that is not specifically required by the retention schedule, is no longer of immediate use and is not subject to a litigation hold.
  • A process for periodic review of all information stored.
  • Appropriate methods of eliminating ESI.
  • An audit process to ensure compliance.
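The third item on Panich's list combines three conditions, and all of them must be true before an item is deleted. A minimal sketch of that rule follows. The `Record` fields and the 90-day `idle_days` threshold standing in for "no longer of immediate use" are hypothetical assumptions, not part of any policy quoted here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    record_id: str
    retain_until: Optional[date]  # None: the retention schedule does not require it
    last_used: date
    on_litigation_hold: bool


def eligible_for_deletion(rec: Record, today: date, idle_days: int = 90) -> bool:
    """True only when all three of Panich's conditions for prompt deletion hold.

    idle_days is a hypothetical threshold for "no longer of immediate use".
    """
    required_by_schedule = rec.retain_until is not None and today <= rec.retain_until
    still_in_use = (today - rec.last_used).days < idle_days
    return not required_by_schedule and not still_in_use and not rec.on_litigation_hold


# Usage with hypothetical records.
today = date(2012, 6, 1)
records = [
    Record("email-1", None, date(2012, 1, 5), False),                # eligible
    Record("email-2", None, date(2012, 1, 5), True),                 # on litigation hold
    Record("erisa-1", date(2016, 12, 31), date(2010, 3, 1), False),  # schedule still requires it
    Record("memo-1", None, date(2012, 5, 20), False),                # still in use
]
eligible = [r.record_id for r in records if eligible_for_deletion(r, today)]
print(eligible)  # ['email-1']
```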

Different Approaches
There are different schools of thought with records retention. Gordon noted that some categories of information by law need to be retained for a minimum period. For example, he said documents pertaining to the Employee Retirement Income Security Act need to be preserved for six years, while job applications need to be preserved for at least a year and some Occupational Safety and Health Administration documents need to be preserved for 30 years. But documents that don’t need to be preserved for a minimum period, like most e-mails, should have a short life span—such as 30, 60 or 90 days—to keep the volume of material down during e-discovery, he remarked.
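Gordon's examples can be expressed as a small retention schedule that computes the earliest date a record may be disposed of. The periods are the ones quoted above. The email entry uses 90 days, one of his suggested short life spans rather than a legal minimum. The category names are hypothetical, and counting every period from a single trigger date is a simplification, since the real trigger event differs by regulation.

```python
from datetime import date, timedelta

# Periods cited in the article. The email entry uses 90 days, one of the
# suggested short life spans (30, 60 or 90 days), not a legal minimum.
# Category names are hypothetical; the OSHA entry covers only those OSHA
# documents that carry a 30-year period.
RETENTION_SCHEDULE = {
    "erisa": ("years", 6),
    "job_application": ("years", 1),
    "osha_30_year": ("years", 30),
    "email": ("days", 90),
}


def add_years(d: date, years: int) -> date:
    """Same month and day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposition(category: str, trigger: date) -> date:
    """First date a record may be disposed of, counted from `trigger`.

    Using a single trigger date (e.g. creation) is a simplification; the
    real trigger differs by regulation.
    """
    unit, amount = RETENTION_SCHEDULE[category]
    if unit == "years":
        return add_years(trigger, amount)
    return trigger + timedelta(days=amount)


print(earliest_disposition("email", date(2012, 1, 1)))             # 2012-03-31
print(earliest_disposition("job_application", date(2012, 6, 15)))  # 2013-06-15
print(earliest_disposition("erisa", date(2012, 2, 29)))            # 2018-02-28
```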

However, Robin Shea, an attorney with Constangy, Brooks & Smith in Winston-Salem, N.C., said that since the enactment of the Lilly Ledbetter Fair Pay Act, which theoretically allows employees to recover based on long-past employment decisions, her ideal has been to preserve everything and have no automatic destruction. But she acknowledged that this is too burdensome and expensive for some employers. The next best thing is to retain the records for the longest applicable statute of limitations, she said.

“We recommend these long retention periods not because the Rules of Civil Procedure require them but because the information retained may very well provide the evidence needed to help the employer defend itself,” Shea remarked.

But Karin McGinnis, an attorney with Moore & Van Allen in Charlotte, N.C., warned against saving too much and instead recommended the use of retention policies to narrow down the amount of company information that is preserved.

Litigation Hold
A good retention policy can be a defense against spoliation of evidence, McGinnis added. If e-mails are deleted after 30 days and certain older e-mails are not available when litigation arises, the policy can show that the e-mails were deleted not to hide or destroy evidence but because of the records retention policy.

But once litigation arises, a litigation hold bars the destruction of records potentially relevant to the issues in the case. It would violate the Federal Rules of Civil Procedure for an employer to destroy ESI that was potentially discoverable after the employer became aware of the threat of litigation, Shea noted.

It is critically important to have a process in place to get a litigation hold set quickly so there are no issues about the employer destroying information that should not have been destroyed, McGinnis said.

Key Players
HR or an in-house attorney should coordinate the entire e-discovery effort, according to Shea.

HR must be in close contact with counsel to ensure that it understands what types of information may be relevant and subject to preservation and production, Panich remarked. HR also must coordinate with individuals whose decisions or actions are being challenged to ensure that information in their control is preserved.

And HR should work with information technology for a variety of reasons. HR should gain an understanding of what potentially relevant information is stored in company systems so that it can alert the managers of those systems about preservation requirements, she said. HR also needs to work with IT to devise automated solutions to preservation, such as capturing particular employees’ e-mail automatically without depending on the employee to identify and take action to save relevant e-mail. IT also is integral to the process of identifying and addressing features built into many systems that result in the automatic deletion of relevant information, she noted. Under the Federal Rules of Civil Procedure, employers are expected to take reasonable, good-faith steps to disable such features when there is a litigation hold.
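The points above can be combined in one sketch: suspending automatic deletion for custodians under a litigation hold, and, per McGinnis's 30-day e-mail example, recording that deletions happened because of the policy. The function name, the mailbox structure and the log format are hypothetical. Real mail and archiving systems provide their own retention and hold features.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

Message = Tuple[str, date]  # (message_id, received_date)


def run_auto_delete(mailboxes: Dict[str, List[Message]],
                    held_custodians: Set[str],
                    today: date,
                    max_age_days: int = 30) -> List[Tuple[str, str, str]]:
    """Apply an age-based email deletion rule, suspended for custodians on hold.

    Mutates `mailboxes` in place and returns an audit log of
    (custodian, message_id_or_dash, action) entries showing why each
    message was deleted or why a mailbox was skipped.
    """
    cutoff = today - timedelta(days=max_age_days)
    log = []
    for custodian, messages in list(mailboxes.items()):
        if custodian in held_custodians:
            # Litigation hold: nothing in this mailbox may be destroyed.
            log.append((custodian, "-", "skipped: litigation hold"))
            continue
        kept = []
        for message_id, received in messages:
            if received < cutoff:
                log.append((custodian, message_id,
                            f"deleted: older than {max_age_days} days under retention policy"))
            else:
                kept.append((message_id, received))
        mailboxes[custodian] = kept
    return log


# Usage with hypothetical custodians and messages.
boxes = {
    "jdoe": [("m1", date(2012, 1, 2)), ("m2", date(2012, 5, 25))],
    "asmith": [("m3", date(2012, 1, 2))],
}
audit = run_auto_delete(boxes, held_custodians={"asmith"}, today=date(2012, 6, 1))
for entry in audit:
    print(entry)
```

The audit log matters as much as the deletion. It is the kind of record that can help show deletions followed the policy rather than an attempt to destroy evidence.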

Depending on the nature of the litigation, usually it will be necessary to involve someone from the substantive area from which the litigation arises, Shea noted. For example, if the plaintiff is a sales representative, the e-discovery team should include at least one member of sales management. This person can describe the various types of ESI that are available and interpret them for HR, the attorneys and IT.

Employees should be notified about the records retention policy and understand that failure to preserve relevant records can result in substantial liability for the company, she said.

It is better to have no retention policy than to have a policy that is not followed, Panich cautioned. Early in almost every significant case, there is a request to produce the employer’s records retention policy. It provides a road map as to what should be discoverable. “From there, spoliation claims are only a few steps away if the employer fails to have the referenced information for the requisite period. Nothing detours a case and builds costs like spoliation claims,” she remarked.

The ROI of Information Management

Information, data, electronically stored information (ESI), records, documents, hard copy files, email, stuff: no matter what you call it, it’s all intellectual property that your organization pays individuals to produce, interpret, use and export to others. After people, it’s a company’s most valuable asset, and it leaves many CIOs, GCs and others responsible for it asking: What’s in that information? Who controls it? Where is it stored?

In simplest terms, I believe that businesses exist to generate and use information to produce revenue and profit.  If you’re willing to go along with me and think of information in this way as a commodity, we must also ask: How much does it cost to generate all that information? And, what’s the return on investment (ROI) for all that information?

The vast majority of information in an organization is not managed, not indexed, not backed up and, as you probably know or could guess, is rarely, if ever, accessed. Consider for a minute all the data in your company that is not centrally managed and not easily available. This data includes backup tapes, share drives, employee hard disks, external disks, USB drives, CDs, DVDs, email attachments sent outside the organization and hardcopy documents hidden away in filing cabinets.

Here’s the bottom line: If your company can’t find information or doesn’t know what it contains, it is of little value. In fact, it’s valueless.

Now consider the amount of money the average company spends on an annual basis for the production, use and storage of information. These expenditures span:

  • Employee salaries. Most employees are in one way or another hired to produce, digest and act on information.
  • Employee training and day-to-day help-desk support.
  • Computers for each employee
  • Software
  • Email boxes
  • Share drives, storage
  • Backup systems
  • IT employees for data infrastructure support

In one way or another, companies exist to create and utilize information. So… do you know where all your information is and what’s in it? What’s your organization’s true ROI on the production and consumption of information across the entire organization? How much higher could it be if you had complete control of it?

As an example, I have approximately 14.5 GB of Word documents, PDFs, PowerPoint files, spreadsheets, and other types of files in different formats that I’ve either created or received from others. Until recently, I had 3.65 GB of emails in my email box both on the Exchange server and mirrored locally on my hard disk. Now that I have a 480 MB mailbox limit imposed on me, 3.45 GB of those emails are now on my local hard disk only.

How much real, valuable information is contained in the collective 18 GB on my laptop? The average number of pages of information contained in 1 GB is conservatively 10,000. So 18 GB of files equals approximately 180,000 pages of information for a single employee that is not easily accessible or searchable by my organization. Now also consider the millions of pages of hardcopy records existing in file cabinets, microfiche and long term storage all around the company.
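The page estimate above can be written out and generalized to a whole organization. Here, S_i is the unmanaged data (in GB) held by employee i, N is the number of employees, and ρ is the conservative 10,000 pages per GB used above. Applying the same ρ to every employee is an assumption, since page density varies widely by file type. Hardcopy records would come on top of this figure.

```latex
\begin{aligned}
P_{\text{org}} &\approx \rho \sum_{i=1}^{N} S_i, \qquad \rho = 10{,}000\ \text{pages/GB} \\
P_{\text{laptop}} &\approx 10{,}000\ \text{pages/GB} \times 18\ \text{GB} = 180{,}000\ \text{pages}
\end{aligned}
```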

The main question is this: What could my organization do with quick and intelligent access to all of its employees’ information?

The more efficient your organization is in managing and using information, the higher the revenue and hopefully profit per employee will be.

Organizations need to be able to “walk the fence” between not impeding the free flow of information generation and sharing, and having a way for the organization as a whole to find and use that information. Intelligent access to all information generated by an organization is key to effective information management.

Organizations spend huge sums of money to generate information…why not get your money’s worth? This future capability is the essence of true information management and much higher ROIs for your organization.


Effective Records Management Greatly Benefits the Legal Dept for eDiscovery

Many (but not all) corporate legal types think of ESI retention management as the legal hold process. That is not a bad thought, but it falls well short of a true corporate definition of the term. To records managers, ESI retention management refers to the systematic retention and disposition of the organization’s electronic business records, whether for the day-to-day running of the business, regulatory compliance or litigation support. In this case, I believe the records managers are right.

ESI retention management, also known as records management, needs to be better understood by corporate legal because the proper management and deletion of electronic business records bears directly on the legal department’s handling of both legal holds and eDiscovery.

A properly managed ESI records management system allows the legal department to quickly find, and place on legal hold, all archived potentially responsive ESI, thereby reducing the risk of spoliation (the destruction of evidence). A centralized ESI management system also acts as an ongoing collection point, so when eDiscovery starts, the collection phase is already complete for ESI under management. Because the archive acts as an ongoing collection point, the legal department can quickly search it for responsive ESI and begin culling and review almost immediately, without spending days or weeks trying to find and collect potentially responsive ESI.


It’s time to address the expectation among records managers that all records are created equal… They’re not.

How many pieces of paper a day does the average employee create or receive? For me it’s zero. Now, how many electronic documents, spreadsheets, emails, attachments, instant messages, etc. does the average employee create or receive per day? In my case it’s hundreds of objects, amounting to 30-100 MB per day.

I work all the time with customers who try to apply a very detailed retention schedule, created for hardcopy documents, to their electronic records. They instruct employees to classify each email, document and so on against the retention schedule, with little thought given to how much time this will take each employee. They throw the policy over the fence and, because few employees openly refuse to use it, assume it is working.

In records management we have a 5-second rule: if it takes an employee more than 5 seconds to classify a document, they will either attach the longest retention period to it or delete it immediately. I worked with a large bank that had a 290-page retention schedule that employees were supposed to consult for every record, including emails. Every employee I interviewed either didn’t know the schedule existed or classified everything with an infinite retention period. Hardly a useful system.

Companies not subject to federal or state regulatory retention requirements need to get a little more realistic about electronic record retention policies. Usually, “high water mark” retention policies are the way to go: assign retention based on department or function, such as “2 years” for everyone in Finance. Many companies spend way too much time and expense trying not to keep the “let’s go to lunch” types of emails, for example. Why not keep them? They take up little room and are not detrimental.
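A “high water mark” policy can be derived mechanically from a detailed schedule. Each department gets the longest period among the record types it handles, and every item that department produces inherits that single period. This means no per-item classification by employees. The schedule and department mapping below are hypothetical, chosen so that Finance comes out at the “2 years” used in the example above.

```python
# Hypothetical detailed schedule: record type -> retention in years.
DETAILED_SCHEDULE = {
    "invoice": 2,
    "expense_report": 1,
    "budget_draft": 1,
    "sales_quote": 1,
    "customer_contract": 6,
}

# Hypothetical mapping of the record types each department produces.
DEPARTMENT_RECORD_TYPES = {
    "finance": ["invoice", "expense_report", "budget_draft"],
    "sales": ["sales_quote", "customer_contract"],
}


def high_water_mark(department: str) -> int:
    """Longest retention (years) among the record types a department handles."""
    return max(DETAILED_SCHEDULE[t] for t in DEPARTMENT_RECORD_TYPES[department])


def retention_for(author_department: str) -> int:
    # Every item from the department, including "let's go to lunch" emails,
    # gets the department's single period: no per-item classification.
    return high_water_mark(author_department)


print(retention_for("finance"))  # 2
print(retention_for("sales"))    # 6
```

The trade-off is deliberate. Some low-value items are kept longer than a detailed schedule would require, in exchange for a policy employees can actually follow.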

Let’s just get more realistic about records management.