Finding the Cure for the Healthcare Unstructured Data Problem


Healthcare information and records continue to grow with the introduction of new devices and expanding regulatory requirements such as the Affordable Care Act, the Health Insurance Portability and Accountability Act (HIPAA), and the Health Information Technology for Economic and Clinical Health Act (HITECH). In the past, healthcare records were made up mostly of paper forms or structured billing data, which were relatively easy to categorize, store, and manage. That has been changing as new technologies enable faster and more convenient ways to share and consume medical data.

According to an April 9, 2013 article on ZDNet.com, by 2015, 80% of new healthcare information will be unstructured: information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages.

Who or what is going to actually manage this growing mountain of unstructured information?

To ensure regulatory compliance and the confidentiality and security of this unstructured information, the healthcare industry will have to 1) hire a lot more professionals to manually categorize and manage it or 2) acquire technology to do it automatically.

Looking at the first solution, having people manually categorize and manage unstructured information would be prohibitively expensive, not to mention slow. It also exposes private patient data to even more individuals. That leaves the second solution: information governance technology. Because of the nature of unstructured information, a technology solution would have to:

  1. Recognize and work with hundreds of data formats
  2. Communicate with the most popular healthcare applications and data repositories
  3. Draw conceptual understanding from “free-form” content so that categorization can be accomplished at an extremely high accuracy rate
  4. Enable proper access security levels based on content
  5. Accurately retain information based on regulatory requirements
  6. Securely and permanently dispose of information when required

An exciting emerging information governance technology that can actually address the above requirements uses the same next generation technology the legal industry has adopted: proactive information governance technology based on conceptual understanding of content, machine learning and iterative “train by example” capabilities.

The lifecycle of information


Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it, so they fall back on relying on individual employees to decide what should be kept, for how long, and what should be disposed of. At the opposite end of the spectrum, a minority of organizations have tried centralized enterprise content management systems and found them difficult to use, so employees find ways around them and end up keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts or on rogue SharePoint sites that are used as “data dumps” with little or no records management or IT supervision. Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate. This information build-up raises the cost of storage as well as the risk associated with eDiscovery.

In reality, as information ages, its probability of re-use, and therefore its value, shrinks quickly. Fred Moore, Founder of Horison Information Strategies, wrote about this concept years ago.

Figure 1 below shows that as data ages, the probability of reuse drops very quickly while the amount of saved data rises. Once data has aged 10 to 15 days, its probability of ever being looked at again approaches 1%, and as it continues to age that probability approaches but never quite reaches zero (figure 1 – red shading).

Contrast that with the possibility that a large part of any organizational data store has little or no business, legal or regulatory value. In fact, the Compliance, Governance and Oversight Council (CGOC) conducted a survey in 2012 that showed that on average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention and 25% has some business value (figure 1 – green shading). This means that approximately 69% of an organization’s data store has no legal, regulatory or business value and could be disposed of without consequences.

The average employee creates, sends, receives and stores conservatively 20 MB of data per business day. This means that at the end of 15 days (about 11 business days), they have accumulated 220 MB of new data; at the end of 90 days (about 63 business days), 1.26 GB; and at the end of three years, 15.12 GB. So how much of this accumulated data needs to be retained? Again referring to figure 1 below, the blue shaded area represents the information that probably has no legal, regulatory or business value according to the 2012 CGOC survey. At the end of three years, the amount of retained data from a single employee that could be disposed of without adverse effects to the organization is 10.43 GB. Now multiply that by the total number of employees and you are looking at some very large data stores.
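The figures above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions that are not in the original figures: 252 business days per year, decimal units (1 GB = 1000 MB), and a reading of the 220 MB and 1.26 GB totals as 11 and 63 business days respectively (roughly 15 and 90 calendar days). The function and constant names are made up for this example.

```python
# Assumptions (not stated in the article): 252 business days per year and
# decimal units (1 GB = 1000 MB). The 20 MB/day and 69% figures are the article's.
MB_PER_BUSINESS_DAY = 20
DISPOSABLE_PERCENT = 69          # 100% - 1% legal hold - 5% regulatory - 25% business value
BUSINESS_DAYS_PER_YEAR = 252     # assumption


def accumulated_mb(business_days: int) -> int:
    """Data one employee accumulates over the given number of business days."""
    return business_days * MB_PER_BUSINESS_DAY


def disposable_mb(total_mb: int) -> float:
    """Portion of the total with no legal, regulatory, or business value."""
    return total_mb * DISPOSABLE_PERCENT / 100


after_11_days = accumulated_mb(11)                           # 220 MB
after_63_days = accumulated_mb(63)                           # 1260 MB, i.e. 1.26 GB
after_3_years = accumulated_mb(3 * BUSINESS_DAYS_PER_YEAR)   # 15120 MB, i.e. 15.12 GB
disposable_3_years = disposable_mb(after_3_years)

print(f"{after_3_years / 1000:.2f} GB accumulated, "
      f"{disposable_3_years / 1000:.2f} GB disposable")
```

Multiplying `disposable_3_years` by a headcount gives a rough estimate of how much an entire organization could dispose of.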

Figure 1: The Lifecycle of data

The above lifecycle of data shows us that employees really don’t need all of the data they squirrel away (because its probability of re-use drops to 1% at around 15 days) and, based on the CGOC survey, approximately 69% of organizational data is not required for legal or regulatory retention and has no business value. The difficult piece of this whole process is how an organization can efficiently determine what data is not needed and dispose of it automatically…

As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only way to get ahead of the data flood. Without accurate automated categorization, the ability to find the data you need, quickly, will never be realized. Even better, if data categorization can be based on the meaning of the content, not just a simple rule or keyword match, highly accurate categorization and therefore information governance is achievable.

Next Generation Technologies Reduce FOIA Bottlenecks


Federal agencies are under more scrutiny to resolve issues with responding to Freedom of Information Act (FOIA) requests.

The Freedom of Information Act provides for the full disclosure of agency records and information to the public unless that information is exempted under clearly delineated statutory language. In conjunction with FOIA, the Privacy Act serves to safeguard public interest in informational privacy by delineating the duties and responsibilities of federal agencies that collect, store, and disseminate personal information about individuals. The procedures established ensure that the Department of Homeland Security fully satisfies its responsibility to the public to disclose departmental information while simultaneously safeguarding individual privacy.

In February of this year, the House Oversight and Government Reform Committee opened a congressional review of executive branch compliance with the Freedom of Information Act.

The committee sent a six-page letter to the Director of Information Policy at the Department of Justice (DOJ), Melanie Ann Pustay. In the letter, the committee questions why, based on a December 2012 survey, 62 of 99 government agencies have not updated their FOIA regulations and processes as required by Attorney General Eric Holder in a 2009 memorandum. In fact, the Attorney General’s own agency has not updated its regulations and processes since 2003.

The committee also pointed out that there were 83,000 FOIA requests still outstanding as of the writing of the letter.

In fairness to the federal agencies, responding to a FOIA request can be time-consuming and expensive if technology and processes are not keeping up with increasing demands. Electronic content can be anywhere including email systems, SharePoint servers, file systems, and individual workstations. Because content is spread around and not usually centrally indexed, enterprise wide searches for content do not turn up all potentially responsive content. This means a much more manual, time consuming process to find relevant content is used.

There must be a better way…

New technology can address the collection problem of searching for relevant content across the many storage locations where electronically stored information (ESI) can reside. For example, an enterprise-wide search capability with “connectors” into every data repository (email, SharePoint, file systems, ECM systems, records management systems) allows all content to be centrally indexed so that an enterprise-wide keyword search will find all instances of content with those keywords present. A more powerful capability to look for is the ability to search on concepts, a far more accurate way to search for specific content. Searching for conceptually comparable content can speed up the collection process and drastically reduce the number of false positives in the results set while finding many more of the keyword-deficient but conceptually responsive records. In conjunction with concept search, automated classification/categorization of data can reduce search time and raise accuracy.
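To make the central indexing idea concrete, here is a minimal sketch of an inverted index built over content from several repositories. Everything in it is hypothetical: the repository names, the sample documents, and the `build_index` and `search` helpers are invented for illustration, and a real product would pull content through its own connectors rather than from a Python dictionary.

```python
from collections import defaultdict

# Hypothetical sample content standing in for what repository connectors would return.
REPOSITORIES = {
    "email": {"msg-1": "Budget request for the border inspection program"},
    "sharepoint": {"doc-7": "Inspection program budget, revised draft"},
    "file_share": {"memo-3": "Holiday schedule for field offices"},
}


def build_index(repositories):
    """Map each lowercase word to the set of (repository, doc_id) pairs containing it."""
    index = defaultdict(set)
    for repo_name, documents in repositories.items():
        for doc_id, text in documents.items():
            for word in text.lower().replace(",", " ").split():
                index[word].add((repo_name, doc_id))
    return index


def search(index, *keywords):
    """Return the documents that contain every keyword, whatever repository they live in."""
    hits = [index.get(keyword.lower(), set()) for keyword in keywords]
    return set.intersection(*hits) if hits else set()


index = build_index(REPOSITORIES)
results = search(index, "inspection", "budget")
print(sorted(results))
```

One query against the central index finds the matching email and the matching SharePoint document together, which is the point of indexing everything in one place. Note that this is still plain keyword matching; it does nothing for the keyword-deficient but conceptually responsive records mentioned above.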

The largest cost in responding to a FOIA request is in the review of all potentially relevant ESI found during collection. Predictive Coding, a technology currently used by attorneys for eDiscovery, can drastically reduce the burden of reviewing thousands, hundreds of thousands or millions of documents for relevancy and privacy.

Predictive Coding is the process of applying machine learning and iterative supervised learning technology to automate document coding and prioritize review. This functionality dramatically expedites the actual review process while improving accuracy and reducing the risk of missing key documents. According to a RAND Institute for Civil Justice report published in 2012, document review cost savings of 80% can be expected using Predictive Coding technology.
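The review prioritization idea can be sketched in a few lines. This is a deliberately simplified stand-in, not how any commercial Predictive Coding product works: it learns a crude weight per word from documents a reviewer has already coded, then ranks the unreviewed documents so the likeliest responsive ones are looked at first. The sample documents and all function names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Learn a weight per word: +1 per use in a responsive doc, -1 per use in a non-responsive one."""
    weights = Counter()
    for text, responsive in labeled_docs:
        for word in tokenize(text):
            weights[word] += 1 if responsive else -1
    return weights


def score(weights, text):
    """Sum the learned weights of the words in a document (unseen words count as 0)."""
    return sum(weights[word] for word in tokenize(text))


def rank_for_review(weights, unreviewed):
    """Order unreviewed documents from most to least likely responsive."""
    return sorted(unreviewed, key=lambda text: score(weights, text), reverse=True)


# Hypothetical seed set coded by a human reviewer (True = responsive).
reviewed = [
    ("contract award for detention facility", True),
    ("detention facility inspection report", True),
    ("cafeteria menu for friday", False),
]
unreviewed = ["friday parking notice", "facility contract renewal", "staff picnic menu"]

weights = train(reviewed)
queue = rank_for_review(weights, unreviewed)

# Iterative step: the reviewer codes the top document, and the model is retrained.
reviewed.append((queue[0], True))
weights = train(reviewed)
```

The loop at the end is the "iterative supervised learning" part: each human decision feeds back into the model, so the ranking of the remaining documents improves as review proceeds.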

With the increasing number of FOIA requests swamping agencies, agencies are hard pressed to catch up to their backlogs. The next generation technologies mentioned above can help agencies reduce their FOIA related costs while decreasing their response time.

Healthcare Information Governance Requires a New Urgency


From safeguarding the privacy of patient medical records to ensuring every staff member can rapidly locate emergency procedures, healthcare organizations have an ethical, legal, and commercial responsibility to protect and manage the information in their care. Inadequate information management processes can result in:

  • A breach of protected health information (PHI) costing millions of dollars and ruined reputations.
  • A situation where accreditation is jeopardized due to a team-member’s inability to demonstrate the location of a critical policy.
  • A premature release of information about a planned merger causing the deal to fail or incurring additional liability.

The benefits of effectively protecting and managing healthcare information are widely recognized but many organizations have struggled to implement effective information governance solutions. Complex technical, organizational, regulatory and cultural challenges have increased implementation risks and costs and have led to relatively high failure rates.  Ultimately, many of these challenges are related to information governance.

In January 2013, the U.S. Department of Health and Human Services published a set of modifications to the HIPAA privacy, security, enforcement and breach notification rules. These included:

  • Making business associates directly liable for data breaches
  • Clarifying and increasing the breach notification process and penalties
  • Strengthening limitations on data usage for marketing
  • Expanding patient rights to the disclosure of data when they pay cash for care

Effective Healthcare Information Governance steps

Inadvertent or just plain sloppy non-compliance with regulatory requirements can cost your healthcare organization millions of dollars in regulatory fines and legal penalties. For those new to the healthcare information governance topic, below are some suggested steps that will help you move toward reduced risk by implementing more effective information governance processes:

  1. Map out all data and data sources within the enterprise
  2. Develop and/or refresh organization-wide information governance policies and processes
  3. Have your legal counsel review and approve all new and changed policies
  4. Educate all employees and partners, at least annually, on their specific responsibilities
  5. Limit data held exclusively by individual employees
  6. Audit all policies to ensure employee compliance
  7. Enforce penalties for non-compliance

Healthcare information is by nature heterogeneous. While administrative information systems are highly structured, some 80% of healthcare information is unstructured or free form.  Securing and managing large amounts of unstructured patient as well as business data is extremely difficult and costly without an information governance capability that allows you to recognize content immediately, classify content accurately, retain content appropriately and dispose of content defensibly.

Ineffective eDiscovery Processes Raise the Cost of Healthcare


Healthcare disputes arise for many reasons.  Healthcare providers challenge payors’ claims policies, practices and actual payments.  Health insurance beneficiaries and healthcare providers dispute coverage decisions by payors.  Patients file malpractice claims when the end result of a medical procedure doesn’t meet their expectations. Healthcare disputes can lead to litigation which also leads to eDiscovery. Healthcare eDiscovery can be complex and burdensome due to the myriad formats used as well as the data security requirements imposed via federal and state regulatory requirements.

New healthcare information management requirements are changing the way healthcare organizations evolve their enterprise infrastructures as new regulatory requirements direct how information is created, stored, shared, referenced and managed. As new information governance technology is adopted and changes how patient and business records are utilized, healthcare providers as well as healthcare payors and suppliers will have to change and adapt how they respond to eDiscovery.

Healthcare eDiscovery Key Requirements and Recent Developments

The 2006 amendments to the Federal Rules of Civil Procedure (FRCP) established that all forms of ESI are potentially discoverable if not deemed privileged or hearsay by the judge, and apply to all legal actions filed in federal courts on or after December 1, 2006. Under the FRCP, any information potentially relevant to the case, whether in paper or electronic format, is subject to an eDiscovery request. Many states have adopted the Federal Rules of Civil Procedure in whole or in part with respect to defining what’s discoverable when it comes to electronic data.

The eDiscovery process for the healthcare industry is the same as for any other industry except that special care has to be taken with patient data. When attorneys do handle protected health information (PHI), they must be aware of state and federal legal ramifications of being exposed to this type of information. Failure to do so could lead to significant fines and damaged reputations stemming from the improper handling of PHI.

Effective Healthcare eDiscovery steps

eDiscovery is a complex process that requires a multidisciplinary approach to successfully implement and manage. Healthcare organizations should consider the following activities to successfully prepare for eDiscovery.

  1. Establish a litigation response team with a designee from the legal, HIM, and IT departments
  2. Review, revise, or develop an organizational information management plan
  3. Identify the data owners or stewards within the organization
  4. Review, revise, or develop an enterprise records retention policy and schedule
  5. Audit compliance with the records retention policy and schedule
  6. Penalize non-compliance with the records retention policy and schedule
  7. Conduct thorough assessment of the storage locations for all data including back-up media
  8. Review, revise, or develop organizational policies related to the eDiscovery process
  9. Establish an organizational program to educate and train/retrain all management and staff on eDiscovery and records retention compliance

The eDiscovery process is equivalent to searching warehouses, waste baskets, file cabinets, home offices, and personal notes to find that “needle in the haystack” that will help prove the other side’s claims. Healthcare organizations are finding it especially difficult to respond to and review the huge amounts of data due to additional healthcare specific data formats and regulatory requirements around patient privacy.

The huge expense of information review during litigation coupled with the high risk of enforcement action by regulatory authorities drives many legal professionals to seek a more proactive, defensible and cost-efficient approach.

Coming to Terms with Defensible Disposal; Part 1


Last week at LegalTech New York 2013 I had the opportunity to moderate a panel titled “Defensible Disposal: If it doesn’t exist, I don’t have to review it…right?” with an impressive roster of panelists: Bennett Borden, Partner, Chair eDiscovery & Information Governance Section, Williams Mullen; Clifton C. Dutton, Senior Vice President, Director of Strategy and eDiscovery, American International Group; John Rosenthal, Chair, eDiscovery and Information Management Practice, Winston & Strawn; and Dean Gonsowski, Associate General Counsel, Recommind Inc.

During the panel session it was agreed that organizations have been over-retaining ESI (which accounts for at least 95% of all data in organizations) even when it’s no longer needed for business or legal reasons. Another factor driving this over-retention of ESI is the fear of inadvertently deleting evidence, otherwise called spoliation. In fact, an ESG survey published in December of 2012 showed that the “fear of the inability to furnish data requested as part of a legal or regulatory matter” was the highest ranked reason organizations chose not to dispose of ESI.

Other reasons cited included not having defined policies for managing and disposing of electronic information and, conversely, having defined retention policies that actually keep all data indefinitely (usually because of the fear of spoliation).

One of the principal information governance gaps most organizations haven’t yet addressed is the difference between “records” and “information”. Many organizations have “records” retention/disposition policies to manage those official company records required to be retained under regulatory or legal requirements. But those documents and files that fall under legal hold and regulatory requirements amount to approximately 6% of an organization’s retained electronic data (1% legal hold and 5% regulatory).

Another interesting survey published by Kahn Consulting in 2012 showed levels of employee understanding of their information governance-related responsibilities. In this survey only 21% of respondents had a good idea of what information needed to be retained or deleted and only 19% knew how information should be retained or disposed of. In that same survey, only 15% of respondents had a general idea of their legal hold and eDiscovery responsibilities.

The above surveys highlight the fact that organizations aren’t disposing of information in a systematic process mainly because they aren’t managing their information, especially their electronic information and therefore don’t know what information to keep and what to dispose of.

An effective defensible disposal process is dependent on an effective information governance process. To know what can be deleted and when, an organization has to know what information needs to be kept and for how long based on regulatory, legal and business value reasons.
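As a rough sketch of that dependency, the decision "can this be deleted?" reduces to checking the three reasons to keep something, in order. The `Item` fields, the `disposition` function and the sample records below are hypothetical names invented for this illustration; in practice the hard part is populating these fields accurately, which is exactly what information governance is for.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    name: str
    on_legal_hold: bool = False
    retain_until: Optional[date] = None   # end of regulatory retention, if any applies
    has_business_value: bool = False


def disposition(item: Item, today: date) -> str:
    """Return why an item must be kept, or 'dispose' if no reason applies."""
    if item.on_legal_hold:
        return "keep: legal hold"
    if item.retain_until is not None and today <= item.retain_until:
        return "keep: regulatory retention"
    if item.has_business_value:
        return "keep: business value"
    return "dispose"


today = date(2013, 2, 15)   # fixed date so the example is repeatable
items = [
    Item("deposition exhibit", on_legal_hold=True),
    Item("billing record", retain_until=date(2019, 12, 31)),
    Item("expired billing record", retain_until=date(2010, 12, 31)),
    Item("active project plan", has_business_value=True),
    Item("lunch invitation"),
]
decisions = {item.name: disposition(item, today) for item in items}
```

Legal hold is checked first on purpose: an item under hold must be kept even if its retention period has expired and it has no remaining business value.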

Over the coming weeks, I will address those defensible disposal questions and responses the LegalTech panel discussed. Stay tuned…

The Dangers of Infobesity at LegalTech


LegalTech just concluded in New York and one of the popular hot buttons many vendors were talking about was the idea that too much corporate information, especially valueless, ungoverned, unstructured information, is both risky and costly to organizations. I agree. The answer to this “infobesity” (the unrestricted saving of ESI because storage is supposedly cheap and saving everything is easier than checking with others to see if it’s OK to delete) is a defensible process to systematically dispose of information that’s not subject to regulatory requirements or litigation hold requirements and no longer has business value. In a 2012 CGOC (Compliance, Governance and Oversight Council) Summit survey, it was found that on average 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that 69% of an organization’s ESI can be disposed of.

Several vendors at LegalTech were highlighting Defensible Disposal solutions, also known as defensible disposition and defensible deletion, as the answer to the problem of infobesity. Defensible Disposal is defined by many as a process (manual, automated or both) of identifying and permanently disposing of unneeded or valueless data in a way that will stand up in court as reasonable and consistent. The key to this process is to be able to identify valueless information (not subject to regulatory retention or legal hold) with enough certainty to be able to actually follow through and delete the data. This may sound easy… it’s not. Many organizations are sitting on huge amounts of data because their legal department doesn’t want to be accused of spoliation, so it has standing orders to “keep everything forever”. Corporate legal has to be convinced that the defensible disposal processes and solutions billed as being the answer to infogluttony can actually tell the difference, accurately and consistently, between information that should be kept and information that’s truly valueless.

To automate this defensible disposal process, the solution needs to be able to understand and differentiate content conceptually, for example that an apple is a fruit as well as a huge high tech company. Automated classification/categorization of content cannot accurately or consistently differentiate the meaning in unstructured content by relying only on keywords or simple rules.
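A toy example shows why the surrounding words matter. The sketch below is not how a conceptual classification engine works internally; it simply picks the sense of "apple" whose hand-written context terms overlap most with the document. The `CONCEPTS` table and the `classify` function are invented for this illustration, whereas a real system would learn such associations from examples.

```python
# Hypothetical, hand-written context terms for two senses of the word "apple".
CONCEPTS = {
    "apple (fruit)": {"orchard", "pie", "juice", "fruit", "harvest"},
    "Apple (company)": {"iphone", "stock", "software", "store", "earnings"},
}


def classify(text):
    """Pick the concept whose context terms overlap most with the text, or None if nothing matches."""
    words = set(text.lower().split())
    best, best_overlap = None, 0
    for concept, context in CONCEPTS.items():
        overlap = len(words & context)
        if overlap > best_overlap:
            best, best_overlap = concept, overlap
    return best


company_doc = "Apple reported record earnings on iPhone sales"
fruit_doc = "apple pie from the orchard harvest"

# A keyword match on "apple" hits both documents and cannot tell them apart.
keyword_hits = [doc for doc in (company_doc, fruit_doc) if "apple" in doc.lower()]

company_label = classify(company_doc)
fruit_label = classify(fruit_doc)
```

Both documents match the keyword, but only the context separates the fruit from the company, and a document with no context at all gets no label rather than a guess.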

An even less consistent approach to categorization is to base it on simple rules such as “delete everything from/to Bill immediately” or “keep everything to/from any accounting employee for 3 years”. This kind of rules based retention/disposition process will quickly have your GC explaining to a Judge why data that should have been retained was “inadvertently” deleted.

To truly automate disposal of valueless information in a consistently defensible manner, categorization applications must meet two requirements. First, they must conceptually understand the meaning in unstructured content so that only content matching your intent, regardless of language, is classified as “of value” to the organization, not because it shares a keyword with other records but because it truly meets your definition of content that needs to be kept. Second, because unstructured data is by definition “free-flowing” (not structured into specific rows and columns), extremely high categorization accuracy and defensibility can only be achieved with defensible disposal solutions that incorporate an iterative training process, including “train by example”, in a human-supervised workflow.