When working on the design and implementation of a data warehouse / business intelligence solution, many of the most common questions I receive concern the ODS, or Operational Data Store. In today’s post I’ll explain its purpose and the proper way to interact with it.
The definition of the Operational Data Store
An operational data store (or “ODS”) is a database designed to integrate data from multiple sources for additional operations on the data.
Unlike a master data store, the data is not passed back to the operational systems; it may, however, be passed on for further operations and to the data warehouse for reporting.
What the Operational Data Store – ODS – isn’t
The ODS isn’t a data storage step before the data warehouse, and it must not be used as a convenient staging ground for raw data that is not yet ready for data warehouse consumption. Nor is the ODS inherently unqueryable.
What the Operational Data Store – ODS – is
The ODS must be the bedrock of the data warehouse itself. When data is extracted from the primary legacy systems, you must store it for safekeeping and for querying before performing any summarization. It is this non-summarized, queryable, and long-lasting form of the legacy data that is the ODS.
Implications of the above
When the data is non-summarized, it is a direct image of the original legacy base-level records. The charm of capturing this base-level data is that you can only roll the data upward; it is not possible to descend to a lower level. The data warehouse is “off the hook” if it makes the most granular data available.
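To make the "roll upward only" point concrete, here is a minimal sketch in Python. The record fields and values are illustrative, not from any real system: because the ODS keeps the most granular records, any summary can be derived from them on demand, while the reverse (recovering detail from a summary) is impossible.

```python
from collections import defaultdict

# Hypothetical base-level transactions as kept in the ODS
# (field names and values are invented for illustration).
base_records = [
    {"customer": "C1", "product": "P1", "amount": 100.0},
    {"customer": "C1", "product": "P2", "amount": 50.0},
    {"customer": "C2", "product": "P1", "amount": 75.0},
]

# Rolling the data upward: totals per customer, computed on demand
# from the granular records.
totals = defaultdict(float)
for rec in base_records:
    totals[rec["customer"]] += rec["amount"]

print(dict(totals))  # {'C1': 150.0, 'C2': 75.0}
```

Had we stored only the per-customer totals, the per-product detail would be gone for good; storing the base-level records loses nothing.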
Base-level records from legacy systems take on many different forms. Base-level data generally consists of transactions (operations performed against a “customer”) or a snapshot of the “customer” taken at the end of the reporting interval (usually daily).
In both cases, the base-level record looks remarkably similar. The record consists of key entries and textual and numerical measured values. These base-level records are almost always just one step away from being both dimensional and queryable.
That one step consists simply of cleaning up the key values so that they point to clean dimensions. If a key value refers to a central, common business dimension such as customer, product, or geography, replace it in the base data with a clean, corporate-wide key that points to the respective dimension table recognized by the other data sources in the data warehouse.
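A sketch of that one cleaning step, assuming a simple cross-reference table exists (the legacy codes and surrogate keys below are hypothetical):

```python
# Hypothetical cross-reference from idiosyncratic legacy customer codes
# to clean, corporate-wide surrogate keys in the shared customer dimension.
customer_xref = {"C-0042-X": 1001, "CUST#17": 1002}

def conform_record(legacy_rec, xref):
    """Replace the legacy customer code with the corporate-wide key
    that points at the shared customer dimension table."""
    rec = dict(legacy_rec)
    rec["customer_key"] = xref[rec.pop("cust_code")]
    return rec

raw = {"cust_code": "C-0042-X", "amount": 125.5}
print(conform_record(raw, customer_xref))
# {'amount': 125.5, 'customer_key': 1001}
```

In practice the cross-reference would itself live in the dimension table and handle unmatched codes, but the essence is the same: one key substitution per dimension.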
When base-level legacy records are subjected to this initial cleaning of the primary dimension keys, two wondrous things happen.
- First, the base-level records start to become queryable because they can be tied via a simple star join to the primary dimensions of the surrounding business.
- Second, these records will be much easier to deal with in the future because some of the data processing needed to tie these records to other data in the data warehouse has already been done. In particular, it will not be necessary in the future to remember the funny, idiosyncratic customer, product, and geography codes in these particular records.
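The first point can be sketched as a tiny star join in Python (dimension contents and measures are invented): once the fact records carry clean dimension keys, a business question is just a join to the dimension followed by a grouping.

```python
# Hypothetical customer dimension, keyed by the clean surrogate key.
customer_dim = {1001: {"name": "Acme Corp", "region": "EMEA"}}

# Base-level fact records that already carry the clean customer_key.
facts = [
    {"customer_key": 1001, "amount": 125.5},
    {"customer_key": 1001, "amount": 74.5},
]

# "Total amount per region" -- a star join from facts to the dimension,
# then a roll-up on a dimension attribute.
region_totals = {}
for f in facts:
    region = customer_dim[f["customer_key"]]["region"]
    region_totals[region] = region_totals.get(region, 0.0) + f["amount"]

print(region_totals)  # {'EMEA': 200.0}
```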
If this lightly cleaned base-level data is housed in an actual relational database, rather than being held in an intermediate “suspense file,” then it really is queryable.
With an aggregate navigator, this data can be accessed automatically whenever users ask precise questions that no higher level aggregate can satisfy.
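The routing decision an aggregate navigator makes can be sketched as follows. This is a deliberately simplified illustration, not a real navigator's API; the table names and grains are hypothetical:

```python
# Pre-built summary tables, keyed by the grain (grouping columns) they
# can answer. Names are illustrative.
AGGREGATES = {("region",): "sales_by_region_agg"}

def route_query(group_by_columns):
    """Answer from an aggregate when one matches the requested grain;
    otherwise fall back to the base-level records in the ODS."""
    key = tuple(sorted(group_by_columns))
    return AGGREGATES.get(key, "ods_base_level")

print(route_query(["region"]))              # sales_by_region_agg
print(route_query(["region", "customer"]))  # ods_base_level
```

A real navigator also checks whether a coarser grain can be derived from a finer aggregate, but the key behavior is the fallback: when no aggregate can satisfy the question, the base-level ODS data answers it.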
Although this is especially beneficial in Credit Risk / Financial Domain Reporting implementations, because it makes it easy to drill down into the finer details and retrieve the actual numbers behind a summary, it should be the standard approach for all Data Warehouse and Business Intelligence Reporting implementations.
Until next time,
Keep learning, Keep searching and Keep succeeding!
Let’s connect on : LinkedIn
Follow me on Twitter : @lgleonte