Surrogate key is
a substitution for the natural primary key. It is just a unique identifier or
number for each row that can be used for the primary key to the table. The only
requirement for a surrogate primary key is that it is unique for each row in
the table.
Data warehouses typically use a surrogate, (also known as artificial
or identity key), key for the dimension tables primary keys. They can use sequence generator, or Oracle
sequence, or SQL
Server Identity values for the surrogate key.
Reasons for
choosing Surrogate key (SK) are:
•If we replace
the Natural Key (NK) with a single Integer, it should be able to save a substantial amount of
storage space. The SKs of different Dimensions would be stored as Foreign Keys
(FK) in the Fact tables to maintain Referential Integrity (RI), and here
instead of storing of those big or huge NKs, storing of concise SKs would
result in less amount of space needed. The UNIQUE indexes built on the SK will
take less space than the UNIQUE index built on the NK which may be
alphanumeric.
•Replacing big,
ugly NKs and composite keys with beautiful, tight integer SKs is bound to
improve join performance, since joining two Integer columns works faster. So,
it provides an extra edge in the ETL performance by fastening data retrieval
and lookup.
•Advantage of a
four-byte integer key is that it can represent more than 2 billion different
values, which would be enough for any dimension and SK would not run out of
values, not even for the Big or Monster Dimension.
•SK is usually
independent of the data contained in the record, we cannot understand anything
about the data in a record simply by seeing only the SK. Hence it provides Data
Abstraction.
Historical Data: SK helps in maintaining historical data. Suppose
there us a department table in OLTP system, which has dept_id and bu_id as NK
(natural primary key or composite key) and lets sat there is department number
as primary key in OLTP.
If we do not
create SK when this department data is populated through ETL into DW and
department number is used in DW also as primary key. Then let’s say if a
particular department is moved from one BU1 to another BU2 in OLTP system then
the records in fact tables in DW will show as they are in department which is
in BU2 even historical records will appear as if they were always in BU2.
No comments:
Post a Comment