Months after a successful dimensional data mart deployment, I was informed end users were unhappy. The users were apparently dissatisfied with the slowly changing dimension component of the solution. Improper product names continued to appear on reports because those products were named incorrectly when they were sold. End users wanted the products sold to reflect the proper product names rather than some earlier erroneous version of the product names.
Sure the development team discussed the pros and cons of slowly changing dimensions with those same users before we designed the data mart solution. As I recall, those users were excited about the ability to see the costs of products over time, the unit counts of packages over time and even the name of products over time. Now it seems those same users want to see only the latest product name "Super Galaxy X" instead of the two earlier product names "My Boss Is a Dolt" and "I Am So Tired".
By the time I was informed of the unhappy users, the question was posed to me as, "How do we turn off slowly changing dimensions?". I tried to explain that SCD is more like plumbing in your house than a lamp on your bed stand. You don't just turn SCD on and off at your convenience.
Really, turning off SCD means altering your ETL packages at a minimum. It could also mean altering your physical tables and possibly your reports. Code changes and testing mean costs and risks. The return on those costs and risks would actually be a loss of functionality because users would no longer have access to historic values like the costs of products over time. With a little creativity, there is a way to meet both the original and evolving business requirements while also minimizing business costs and risks.
Let's assume we have the following Type-2 slowly changing product dimension and a sales fact tables.
CREATE TABLE DimProduct ( ProductKey INTEGER IDENTITY, ProductID VARCHAR(10), Name VARCHAR(50) NOT NULL, CostAmount DECIMAL(8,2), PriceAmount DECIMAL(8,2), BeginEffectiveDate DATE NOT NULL, EndEffectiveDate DATE);, CREATE UNIQUE INDEX PKDimProduct ON DimProduct (ProductKey); CREATE UNIQUE INDEX AKDimProduct ON DimProduct (ProductID, EndEffectiveDate); CREATE TABLE FactSale ( SaleID INTEGER NOT NULL, DateKey INTEGER NOT NULL, ProductKey INTEGER NOT NULL, SaleQuantity DECIMAL(6,2), SaleAmount DECIMAL(8,2));
Users join FactSale to DimProduct on ProductKey. The proper historic CostAmount shows, but so does that awkward "My Boss Is a Dolt" product name. To allow users access to historic CostAmount values with an option of not perpetually offending their boss, we simply create a new view on the existing DimProduct table.
CREATE VIEW DimCurrentProduct AS SELECT h.ProductKey, h.ProductID, h.Name, h.CostAmount, h.PriceAmount, h.BeginEffectiveDate, h.EndEffectiveDate, c.Name AS CurrentName, c.CostAmount AS CurrentCostAmount, c.PriceAmount AS CurrentPriceAmount FROM DimProduct h INNER JOIN DimProduct c ON c.ProductID = h.ProductID
The DimCurrentProduct view essentially creates a Type-6 slowly changing product dimension with both historic attributes (e.g., Name, CostAmount) and current attributes (e.g., CurrentName, CurrentCostAmount) on a single row. Users join FactSale to DimCurrentProduct on ProductKey. Now the users can easily display CostAmount changes over time with the preferred CurrentName "Super Galaxy X". The new business requirement of showing all sales with current product names is met without altering any ETL packages and without losing any existing functionality.
Although including this sort of self-join in some reporting framework could meet your reporting needs, including it in the database ensures the solution is readily visible and universally available to all authorized database queries.
Yes, joining DimProduct to itself to create DimCurrentProduct incurs a performance cost every time a query selects from DimCurrentProduct. Since your alternate key is indexed on the DimProduct table, joining DimProduct to itself on the alternate key should only cause performance problems if the dimension table is very large or your indexes are not being properly maintained.