Slowly
Changing Dimension (SCD)
This applies to cases where the attribute for a record varies over
time. We give an example below:
Christina is a customer
with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in
the customer lookup table has the following record:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
At a later date, she moved
to Los Angeles, California on January, 2003. How should ABC Inc. now modify its
customer table to reflect this change? This is the "Slowly Changing
Dimension" problem.
There are in general three
ways to solve this type of problem, and they are categorized as follows:
Type 1: The new record replaces the original record. No
trace of the old record exists.
Type 2: A new record is added into the customer dimension
table. Therefore, the customer is treated essentially as two people.
Type 3: The original record is modified to reflect the
change.
We next take a look at each
of the scenarios and how the data model and the data looks like for each of
them. Finally, we compare and contrast among the three alternatives.
Type 1
Slowly Changing Dimension
In Type 1 Slowly Changing Dimension, the new information simply
overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
After Christina moved from Illinois to California, the new
information replaces the new record, and we have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
California
|
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension
problem, since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not
possible to trace back in history. For example, in this case, the company would
not be able to know that Christina lived in Illinois before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not
necessary for the data warehouse to keep track of historical changes.
Type 2
Slowly Changing Dimension
In Type 2 Slowly Changing Dimension, a new record is added to the
table to represent the new information. Therefore, both the original and the
new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
After Christina moved from Illinois to California, we add the new
information as a new row into the table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
1005
|
Christina
|
California
|
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases
where the number of rows for the table is very high to start with, storage and
performance can become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is
necessary for the data warehouse to track historical changes.
Type 3
Slowly Changing Dimension
In Type 3 Slowly Changing Dimension, there will be two columns to
indicate the particular attribute of interest, one indicating the original
value, and one indicating the current value. There will also be a column that
indicates when the current value becomes active.
In our example, recall we originally have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
To accommodate Type 3 Slowly Changing Dimension, we will now have
the following columns:
·
Customer Key
·
Name
·
Original State
·
Current State
·
Effective Date
After Christina moved from Illinois to California, the original
information gets updated, and we have the following table (assuming the
effective date of change is January 15, 2003):
Customer Key
|
Name
|
Original State
|
Current State
|
Effective Date
|
1001
|
Christina
|
Illinois
|
California
|
15-JAN-2003
|
Advantages:
- This does not increase the size of the table, since new
information is updated.
- This allows us to keep some part of history.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute
is changed more than once. For example, if Christina later moves to Texas on
December 15, 2003, the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type III slowly changing dimension should only be used when it is
necessary for the data warehouse to track historical changes, and when such
changes will only occur for a finite number of time.