Why Do We End Up with Dirty Data? [platform agnostic]

One of the biggest themes in operational databases and data warehouses alike is the cleanliness of the data: it is universally recognized but far too often ignored.

From hundreds of meetings with data processing and IS staff, I have identified three consistent themes.

Although these three themes stand out as the biggest problems in corporate data access, the same data processing and IS staff who identify them are usually attacking only the first two.

The three themes can be expressed by the following comments:

  • The data access issue: “We have one of the world’s largest sets of data but we can’t get access to it.”
  • The query tool issue: “I want the system to show me what is important and then I want to ask why.”
  • The data integrity issue: “We know that some of our data isn’t very good. For instance, we don’t have a single, centrally maintained customer list.” or “We know that the source doesn’t hold that info but it’s critical for us to compute it and report on it.”
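To make the data integrity comment concrete, here is a minimal sketch of how the same customer can hide under several spellings when there is no centrally maintained list. The source names, IDs, customer names, and the helpers `normalize_name` and `find_duplicate_customers` are all hypothetical, and the matching rule (lowercasing, stripping punctuation, dropping a few legal suffixes) is a deliberately naive assumption rather than a recommended matching strategy.

```python
import re
from collections import defaultdict

# Hypothetical legal suffixes to ignore when comparing names (an assumption,
# not an exhaustive list).
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def find_duplicate_customers(records):
    """Group (source, customer_id, name) rows by normalized name.

    Returns only the groups holding more than one row, i.e. likely duplicates.
    """
    groups = defaultdict(list)
    for source, customer_id, name in records:
        groups[normalize_name(name)].append((source, customer_id, name))
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Made-up sample rows from three hypothetical systems.
records = [
    ("billing", "B-001", "Acme Corp."),
    ("crm", "C-417", "ACME corp"),
    ("crm", "C-522", "Globex, Inc."),
    ("shipping", "S-093", "Acme  Corp"),
]

duplicates = find_duplicate_customers(records)
for key, rows in duplicates.items():
    print(key, "appears", len(rows), "times:", rows)
```

Real customer matching usually needs more than string normalization (addresses, tax IDs, fuzzy matching), so a check like this is only a first indicator of how fragmented a customer list is.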

Given the universality of these comments, it is strange that entire industries are being organized around the first two issues but the third issue seems to be something that we don’t want to talk about.

The database marketplace has responded to the need for data access with client/server architectures, dedicated data warehouse hardware and software, and whole families of communications schemes to connect users to their data.

The query tool marketplace is an embarrassment of riches. There are dozens of ad hoc query tools, report writers, and application development environments. We are well into a new generation of powerful tools for data warehousing end-user applications, with dimensional OLAP/ROLAP tools and the hot new data mining tools.

Yet the third issue, data integrity, languishes in the backwater of operational data, data migration, and data warehousing. It is talked about briefly and then completely ignored.

There is a definite avoidance of the topic, and very few plans are in place to address data integrity at the same level as data access or query tools.

Enterprises across the globe face these issues mostly due to:

  • Excessive unstructured growth of the IT & Data architecture
  • Mergers & Acquisitions
  • Faulty Enterprise Integration
  • Slim or Missing Master Data Management

All of the issues above can be mitigated with proper access to expert resources who can deliver and maintain an enterprise data architecture. In the long term, the business value gained through a data cleanup exercise is priceless.
