In last week’s article “What an ODS is and how should you use it” I’ve explained the concept of an Operational Data Store and how it fits in the overall data warehouse picture.
Today I’m going get into the specific characteristics of OLTP Databases versus Data Warehouses and how the dimensional model fits into the overall picture in the interesting world of investment banking data warehouse and business intelligence reporting.
OLTP characteristics and responsibilities
By comparison means one major difference between an OLTP system and a data warehouse is the latter’s ability to accurately describe the past.
OLTP systems are usually not so proficient when it comes to correctly represent a business as of a quarter or a year ago especially when we are talking about the complexity and higher degree of change usually present in the investment banking world.
A good OLTP system is always evolving and is this the main characteristic that it should posses because that’s the natural thing to do. The world is changing, and so are the businesses, changing everyday, by developing new traded products, credit cards, mortgage types etc., so your operational system has to be always flexible and scalable enough to meet businesses needs and not the other way around.
Intra-day transactions, costs of all sorts, contracts, credit card invoices are added and, thus, the operational data backlog is constantly changing. Descriptions of products, partners, and customers are constantly being updated, usually by overwriting.
The large volume of data in an OLTP system is typically purged every 90 to 180 days or higher amounts of data are even reduced to end-of-day values. For these reasons, it is difficult for an OLTP system to correctly represent the past especially in a high producing data industry as banking.
You don’t want/need to hold in an OLTP system, the entire history of trading prices for a stock during a day nor you need to store contract statuses, product descriptions or customer descriptions over a multi year period, as that will only complicate things and add unnecessary overhead to the usual operational flow and last but not least it will be extremely costly from an infrastructure perspective.
Data Warehouse characteristics and responsibilities
The data warehouse must accept the responsibility of accurately describing the past. By doing so, the data warehouse simplifies the responsibilities of the OLTP system. Not only does the data warehouse relieve the OLTP system of almost all forms of reporting, but the data warehouse contains special structures that have several ways of tracking historical data.
OLTP systems produce “flash reports” for management, and the people who run OLTP systems are proud of that capability. But beyond these simple daily and weekly summaries and counts, the OLTP environment is a very costly environment in which to do any kind of complex reporting. So that’s why the economics of reporting will always favor the data warehouse.
Dimensional Data Warehouse database
A dimensional data warehouse database consists of a large central fact table with a multi-part key. This fact table is surrounded by a single layer of smaller dimension tables, each containing a single primary key. In a dimensional database, these issues of describing the past mostly involve slowly changing dimensions.
A typical slowly changing dimension is a product dimension in which the detailed description of a given product is occasionally adjusted. For example, a minor change in the description or characteristics a of a product which does not require the creation of new product number (which the data warehouse has been using as the primary key in the product dimension), but nevertheless gives the data warehouse team a revised description of the product.
Another example would be traded products categories that might vary from a bank to another and each would have their approach in categorizing and reporting on each of the categories. Usually categories are changed when the bank has a change of perspective and develops different product packaging strategies to increase profitability. They may decide to create completely new categories or to slightly change some of them to reflect the new position.
The data warehouse team faces a dilemma when this happens. If they want the data warehouse to track both the old and new states, what do they use for the key? And where do they put the two values of the changed attribute?
Techniques
There are three main techniques for handling slowly changing dimensions in a data warehouse:
- Overwriting
- Creating another dimension record
- Creating a current value field or a set of effective date fields.
Each technique handles the problem differently is you the designer that has to consider the implications of each and choose among them bearing in mind that the implementation has to be strongly dependent on the business needs.
Join me next week when I’ll go into further details on each of the techniques outlined above.
Until next time,
keep learning, keep searching and keep succeeding…
Connect with me on : LinkedIn
Follow me on Twitter : @lgleonte