Every information governance program I have worked on has started with the creation of an information inventory, covering structured information, unstructured information, or both, depending on the governance program.
Creating a structured information inventory has traditionally been a simpler task, as many tools have existed to document the data being used in business applications, such as MDM and Business Glossary tools. However, creating an unstructured information inventory has posed a more daunting manual task.
Mid-sized organizations have hundreds of millions of files, while very large ones can run into the tens of billions of documents. This sheer volume makes it impossible to identify and inventory those files using currently accepted methodologies and tools.
An unstructured inventory should address the following questions:
• What types of information do I have?
• Where is the information stored?
• Where within the organization is the information being used (global, cross-domain or local)?
• What is the information being used for?
• Who is the audience for the information?
Solving an age-old problem
We now have a solution to this time-consuming, complex problem. DocAuthority’s Data Evolutional Artificial Intelligence (AI) automates the discovery, collection and categorization of documents into well-defined business categories, eliminating reliance on end-users.
The AI tool reads documents contextually to identify likenesses between them, creates categories, and extracts the file properties that provide the core inventory information. The inventory created using AI delivers the core file information:
• File location
• Security rights
• Created/Modified/Last Accessed dates
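To make these properties concrete, here is a minimal Python sketch that collects the same kind of information from an ordinary file share. It is not DocAuthority's implementation: the names `FileRecord` and `build_inventory` are hypothetical, the permission string stands in for fuller security rights (real ACLs need platform-specific tooling), and the meaning of `st_ctime` differs by platform, as noted in the comments.

```python
import os
import stat
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FileRecord:
    """One row of a hypothetical unstructured-file inventory."""
    location: str            # full path to the file
    permissions: str         # POSIX-style mode string, e.g. "-rw-r--r--"
    created: datetime        # st_ctime: creation time on Windows, metadata-change time on Unix
    modified: datetime       # st_mtime: last content modification
    last_accessed: datetime  # st_atime: last access, if the file system records it


def _utc(timestamp: float) -> datetime:
    """Convert a POSIX timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def build_inventory(root: str) -> list[FileRecord]:
    """Walk `root` recursively and record basic properties for every file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files that vanish or cannot be read
            records.append(FileRecord(
                location=str(path),
                permissions=stat.filemode(info.st_mode),
                created=_utc(info.st_ctime),
                modified=_utc(info.st_mtime),
                last_accessed=_utc(info.st_atime),
            ))
    return records


if __name__ == "__main__":
    # Show the first few records for the current directory.
    for record in build_inventory(".")[:5]:
        print(record.location, record.permissions, record.modified.date())
```

A walk like this only captures file properties. It does nothing to categorize documents by content, which is the part that makes a manual inventory so daunting at scale.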
DocAuthority presents the results of the discovery and categorization in a user interface that aggregates the information for the user to see clearly.
The organization is then able to export file information into Excel, where it can be used for migrations or obsolescence analysis.
Many files are obsolete and offer little value to the organization
Very few information governance initiatives place obsolescence or the management of old and orphan data in the foreground. Some don’t do it at all. However, if you’re busy managing records retention, confidentiality and sensitive information in a business, then surfacing 25% of data that can be quickly and defensibly removed can have far-reaching benefits beyond simply freeing up storage capacity.
At DocAuthority, we treat obsolescence as a by-product of most information management activities. We support, enable and accelerate activities that can quickly identify ROT (redundant, outdated, trivial information). Exposing the file properties provides the user with a view by last accessed or last modified date. This view can be used in determining which obsolete files can be deleted.
The last modified and last accessed dates are the key information needed to make intelligent decisions on reducing the storage footprint.
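As an illustration of how those two dates can drive a cleanup decision, the sketch below flags files that have been neither modified nor accessed within a chosen window. The three-year threshold, the `FileDates` type and the sample paths are assumptions made for the example, not DocAuthority defaults. Treat the output as candidates for review rather than a deletion list, since some file systems do not update last accessed times reliably, and backup or antivirus scans can refresh them.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class FileDates(NamedTuple):
    """Hypothetical inventory row holding only the dates needed here."""
    location: str
    modified: datetime
    last_accessed: datetime


def find_obsolete_candidates(files, now, max_age_days=3 * 365):
    """Return files whose most recent activity is older than the cutoff.

    A file counts as a candidate only if BOTH its last modified and
    last accessed dates fall before `now - max_age_days`.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [f for f in files if max(f.modified, f.last_accessed) < cutoff]


def _d(year, month, day):
    """Shorthand for a UTC date used in the sample data."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Illustrative sample data; the paths and dates are made up.
sample = [
    FileDates("/shares/hr/policy_2012.doc", _d(2012, 3, 1), _d(2013, 6, 15)),
    FileDates("/shares/finance/budget.xlsx", _d(2012, 3, 1), _d(2018, 12, 1)),
]

candidates = find_obsolete_candidates(sample, now=_d(2019, 1, 1))
print([f.location for f in candidates])  # ['/shares/hr/policy_2012.doc']
```

The second file is old by modification date but was opened recently, so it is left alone. That is why both dates are checked rather than just one.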
Storage is cheap - think again
To some extent, storage has become cheaper in the last few years. Organizations tend to add storage instead of cleaning up the storage they have, but taking the time to address data obsolescence will yield benefits beyond cost savings.
The customer reaps the benefits of additional storage capacity and quicker back-ups, as well as reduced risk: just because data is obsolete doesn’t mean it carries no risk. The Compliance, Governance and Oversight Council's latest benchmark found that enterprises spend as much as $50 million to protect 10 petabytes of data, and that $34.5 million of this is spent on protecting data that should be deleted. Reducing the amount of data makes it easier to get to the information you want without wading through the data you don’t need.
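A quick calculation puts the benchmark figures in proportion. The sketch below uses only the two dollar amounts and the data volume quoted above. The variable names are mine, and because the benchmark states the spend as "as much as", the results are upper-bound illustrations.

```python
# Figures quoted from the benchmark above, in millions of US dollars.
total_protection_spend = 50.0    # spend to protect 10 petabytes
spend_on_deletable_data = 34.5   # portion spent on data that should be deleted
data_volume_pb = 10              # petabytes protected

# Share of the protection budget spent on data that should be deleted.
deletable_share = spend_on_deletable_data / total_protection_spend

# Average protection cost per petabyte, in millions of US dollars.
cost_per_petabyte = total_protection_spend / data_volume_pb

print(f"{deletable_share:.0%}")  # 69%
print(cost_per_petabyte)         # 5.0
```

In other words, on these figures roughly two-thirds of the protection budget goes to data the organization should not be keeping.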
File cleanup results in information governance benefits
DocAuthority’s evolutionary AI enables organizations to easily discover and categorize their unstructured data into file groups. The first step is to understand the universe of information, as it's difficult to define policies without knowing the type of information your users are creating, storing and managing.
Starting your information governance program with a cleaned-up set of unstructured data, organized in an effective manner that offers users maximum benefit, will give the program the best chance of success.
Alan is a senior information management leader and AIIM Fellow focusing on helping organizations maximize the value of their information. Alan is a leading expert on multiple aspects of enterprise information management (EIM) including information governance (both data and content governance), enterprise content management, data management, digital rights management, and digital asset management. Get in touch with Alan on LinkedIn and Twitter.
by Alan Weintraub January 2019