When a company embarks on a data clean-up exercise, the focus is usually on in-depth analysis and profiling of the data.
Sometimes it’s worthwhile to step back from that detailed work to get some perspective.
Start asking questions such as “Why is the data dirty in the first place?”, “Is any of the clean-up we are currently doing reversible?” and “Does this really need to become an ongoing process?”
In most cases, the root cause falls into one of two main categories:
- Poor systems
- Poor practices at the point of data capture
Often, someone such as a salesperson, field adjuster, or courier is stuck with the job of entering critical data. That person may be working with an awkward system that has no built-in support for data integrity.
For example, a courier may simply have a single “Address” field on a hand-held device, into which he is expected to type the delivery address of the package he has just picked up. It is common to end up with 20 variations of “Saint Katharine’s Way, London, UK”.
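To make the address problem concrete, here is a minimal Python sketch of the kind of normalization a downstream cleaning step might apply to free-text addresses. It assumes that case, punctuation, apostrophes and whitespace are the only noise, and it uses a hypothetical one-entry phrase table (`PHRASE_MAP`); a real system would rely on a postal reference list. Note that it still cannot reconcile “St” with “Saint” or fill in a missing country, which is exactly why validation at the point of entry matters.

```python
import re
import unicodedata

# Hypothetical phrase substitutions; a real system would use a postal reference list.
PHRASE_MAP = {"united kingdom": "uk"}

# Illustrative free-text entries, as a courier might type them.
VARIANTS = [
    "Saint Katharine’s Way, London, UK",
    "saint katharine's way london uk",
    "SAINT KATHARINES WAY,  LONDON, UK",
    "Saint Katharine's Way, London, United Kingdom",
    "St Katharine's Way, London",
]


def normalize_address(raw: str) -> str:
    """Reduce a free-text address to a canonical comparison key."""
    text = unicodedata.normalize("NFKC", raw).lower()
    # Drop apostrophes so "katharine's" and "katharines" compare equal.
    text = re.sub(r"[’'`]", "", text)
    # Treat any other punctuation (commas, full stops) as a separator.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    text = " ".join(text.split())
    # Apply whole-word phrase substitutions on the single-spaced text.
    for phrase, replacement in PHRASE_MAP.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", replacement, text)
    return text


if __name__ == "__main__":
    for key in sorted({normalize_address(v) for v in VARIANTS}):
        print(key)
```

Four of the five variants collapse to `saint katharines way london uk`; the “St … London” entry remains a separate key because the information needed to merge it was never captured.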
At the same time, the person responsible for data entry may not be motivated to drive data quality to the 100-percent level.
If your job is to drop data down a pipe and no one ever tells you that your efforts are worthwhile, then you aren’t going to do a perfect job for very long. You probably won’t even know what a perfect job entails.
The answer to the data-entry quality problem is a kind of business re-engineering that involves several steps, such as:
- Providing well-engineered data-entry systems that are easy and compelling for professionals to use and that restrict entry to valid values wherever possible
- Creating a single set of business rules for validating each kind of data
- Providing executive-level support for making data entry a high priority
- Providing explicit and regular feedback to the front-line professionals in the form of newsletters, contests, or rewards for superior data-entry performance
- Tracking and reporting data quality across the enterprise
- Providing a corporate culture that admires and values attention to detail and data quality
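Three of these steps (restricting entry to valid values, keeping a single set of rules, and reporting on quality) fit together naturally if the same rule definitions are used both by the entry form and by the downstream quality report. The Python sketch below illustrates that idea; the field names, the `RULES` table, the simplified UK postcode pattern and the sample records are all hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable

# Simplified, hypothetical UK postcode pattern (real validation is more involved).
POSTCODE_RE = re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}")
KNOWN_COUNTRIES = {"UK", "IE"}  # hypothetical reference list

# One shared rule set: each rule returns True when the record passes.
RULES: dict[str, Callable[[dict], bool]] = {
    "address_present": lambda r: bool((r.get("address") or "").strip()),
    "postcode_format": lambda r: bool(
        POSTCODE_RE.fullmatch((r.get("postcode") or "").strip().upper())
    ),
    "country_known": lambda r: r.get("country") in KNOWN_COUNTRIES,
}


def validate(record: dict) -> list[str]:
    """Used at entry time: return the names of the rules the record fails."""
    return [name for name, rule in RULES.items() if not rule(record)]


def quality_report(records: list[dict]) -> dict[str, float]:
    """Used downstream: the fraction of records passing each rule."""
    if not records:
        return {}
    return {
        name: sum(1 for r in records if rule(r)) / len(records)
        for name, rule in RULES.items()
    }


if __name__ == "__main__":
    sample = [
        {"address": "Saint Katharine's Way, London", "postcode": "E1W 1AA", "country": "UK"},
        {"address": "", "postcode": "E1W1AA", "country": "UK"},
        {"address": "St Katharines Way", "postcode": "12345", "country": "XX"},
    ]
    for rec in sample:
        print(validate(rec))
    print(quality_report(sample))
```

Because the entry form and the report draw on the same `RULES` table, a rule change takes effect in both places at once, and the report measures exactly what front-line staff were asked to get right.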
Data warehouses and data-cleaning tools play a unique role in demonstrating the need for this type of business re-engineering, because the data warehouse is the perfect place to see the value of good data.
Sometimes, paradoxically, the data warehouse must make imperfect data available in order to show the organization how valuable perfect data would be.
Data-cleaning tools alert IS to the exact issues involved in achieving clean data. Ideally, the final architecture for data delivery strikes a balance between data entered as accurately as possible at the source and powerful data-cleaning systems operating downstream in the data extract process, which bring the data up to the shining goal of being 100-percent correct.
What processes do you follow to ensure high data quality and integrity?
Until next time,
keep learning, keep searching, keep succeeding…