Alan Weintraub

November 2019

Data Obsolescence, Cloud Migration, Data Cleansing

Cleaning up the ROT before migrating to the cloud

Successful cloud migrations require planning and file cleanup

Many organizations approach cloud migrations with a simple ‘lift and place’ approach. They see their migration to the cloud as just a replacement for their onsite file shares. Most organizations feel that storage is cheap and that a move to the cloud will reduce their overall operational costs. This may have been true in the early days of cloud file share support, but cloud vendors have since realized that growing storage demands require additional infrastructure to support them. As a result, most cloud vendors now limit the amount of standard storage they offer. Going beyond the standard offering results in significant monthly cost increases.

Many organizations have allowed their users to store files without any storage limits, resulting in a sea of digital hoarding. Organizations tend to add storage instead of cleaning up the storage they have. Users would rather keep everything forever than take the time to clean up on a regular basis, and they justify it by saying they never know when they will need to go back to an old file as part of a work effort.

Data cleansing and removing obsolete, unneeded files reduces the amount of data, making it easier to get to the data you want without wading through the data you don’t need. The Compliance, Governance and Oversight Council reports that an enterprise will spend as much as $50 million to protect 10 petabytes of data, and that $34.5 million of this (69 percent) is spent on protecting data that should be deleted. Another issue with digital hoarding is that files stored in old formats are often no longer retrievable. Migrating unreadable files to the cloud will only add costs for the organization.

The answer to the digital hoarding problem is a file inventory that helps evaluate and choose the files that should be migrated to the cloud. Mid-sized organizations can have hundreds of millions of files, and very large ones can have tens of billions. The sheer size of the problem makes it impossible to identify and inventory those files using the currently accepted methodologies and tools.

We now have a solution to this time-consuming, complex problem. DocAuthority’s Data Evolutional Artificial Intelligence (AI) automates the discovery, collection and categorization of documents into well-defined business categories, eliminating reliance on end-users. The AI tool contextually reads the documents to identify likenesses between them and creates a category for each document collection. The tool also extracts the file properties that provide the core inventory information. The inventory created using AI delivers the core file information that I advise clients to capture when creating an information governance program:

  • File location
  • Owner
  • Security rights
  • Created / Modified / Last Accessed dates
  • Number of duplicate files

DocAuthority presents the results of the discovery and categorization in a user interface that aggregates the information for the user to see.
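To make the inventory fields listed above concrete, here is a minimal sketch of a file inventory built with the Python standard library. It is not DocAuthority’s method: it does no AI categorization, it treats only byte-identical files as duplicates, and the names `build_inventory` and `file_digest` are hypothetical. Creation time is left out because its meaning differs between operating systems.

```python
import hashlib
import os
from collections import Counter
from datetime import datetime, timezone


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Walk `root` and return one inventory record (a dict) per readable file."""
    records = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
                digest = file_digest(path)
            except OSError:
                continue  # unreadable file: skipped in this sketch
            records.append({
                "location": path,
                "owner_id": info.st_uid,  # numeric owner id (always 0 on Windows)
                "permissions": oct(info.st_mode & 0o777),
                "modified": datetime.fromtimestamp(info.st_mtime, timezone.utc),
                "last_accessed": datetime.fromtimestamp(info.st_atime, timezone.utc),
                "sha256": digest,
            })
    # Assumption: "duplicate" means another file with identical content.
    copies = Counter(record["sha256"] for record in records)
    for record in records:
        record["duplicates"] = copies[record["sha256"]] - 1
    return records
```

Hashing every file is slow at the scale described earlier (hundreds of millions of files), so a sketch like this only suits a small share or a sample.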


Another benefit of DocAuthority is the ability to export the file information to Excel, where it can be used for migration or obsolescence analysis.
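As an illustration of what such an export might look like, the sketch below writes inventory records to CSV, a format Excel opens directly. The function name `inventory_to_csv`, the column names and the sample paths are all assumptions made for the example; they do not describe DocAuthority’s actual export.

```python
import csv
import io


def inventory_to_csv(records, fieldnames):
    """Serialize inventory records (dicts) to CSV text that Excel can open."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any keys not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample records, for illustration only.
sample = [
    {"location": "/shares/finance/budget.xlsx", "owner": "jdoe", "duplicates": 2},
    {"location": "/shares/hr/policy.docx", "owner": "asmith", "duplicates": 0},
]
csv_text = inventory_to_csv(sample, ["location", "owner", "duplicates"])
```

Writing `csv_text` to a file with a `.csv` extension gives a sheet that can be sorted and filtered by any column.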


Many of the files are obsolete and have little value to the organization

Very few cloud migration initiatives really put data obsolescence, or the cleansing of old and orphaned data, in the foreground. Some don’t do it at all. However, if you’re busy managing records retention, confidentiality and sensitive information in a business, then surfacing 25% of data that can be quickly and defensibly removed can have far-reaching benefits beyond simply freeing up storage capacity. At DocAuthority we treat obsolescence as a key step in migration preparation. We support, enable and accelerate activities that can quickly identify ROT (redundant, obsolete and trivial data). Exposing the file properties gives the user a view by last accessed or last modified date. This view can be used to determine which obsolete files can be deleted.


The last modified and last accessed dates are the key information needed to make intelligent decisions on reducing the storage footprint.
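A date-based view like this can be approximated with a simple filter. The sketch below is only an illustration: the function name `find_stale`, the record layout and the three-year threshold are assumptions, not DocAuthority settings, and the right cutoff depends on your own retention policy. Last accessed dates should also be treated with care, since some file systems do not update them reliably and backup or antivirus scans can reset them.

```python
from datetime import datetime, timedelta, timezone


def find_stale(records, now, max_idle_days=1095):
    """Return records neither modified nor accessed within `max_idle_days`."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        record for record in records
        # A file counts as stale only if BOTH dates are older than the cutoff.
        if max(record["modified"], record["last_accessed"]) < cutoff
    ]


utc = timezone.utc
# Hypothetical inventory records, for illustration only.
inventory = [
    {"location": "/shares/old/report.doc",
     "modified": datetime(2012, 3, 1, tzinfo=utc),
     "last_accessed": datetime(2013, 6, 1, tzinfo=utc)},
    {"location": "/shares/ref/template.doc",
     "modified": datetime(2012, 3, 1, tzinfo=utc),
     "last_accessed": datetime(2019, 10, 1, tzinfo=utc)},
]
stale = find_stale(inventory, now=datetime(2019, 11, 1, tzinfo=utc))
```

With these sample records, only the first file is flagged; the second was modified long ago but opened recently, so it is kept for review rather than deletion.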

File cleanup leads to a more efficient cloud migration

In summary, DocAuthority’s evolutionary AI provides organizations with the ability to easily discover and categorize their unstructured data into file groups.  Understanding the universe of information is the first step in developing a migration plan. It’s difficult to identify those files that should be migrated without first knowing the type of information your users are creating, storing and managing.  Starting your migration program with a cleaned-up set of unstructured data organized in an effective way will allow users to gain maximum benefit while giving your program the best chance of success.

Is your data playing hide and seek? Tune in to our next webinar and let us help you clean up the mess.

