Monday, September 16, 2013

Surrogate Key


Surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. 

Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.

Reasons for choosing Surrogate key (SK) are:

•If we replace the Natural Key (NK) with a single Integer, it should be able to save a substantial amount of storage space. The SKs of different Dimensions would be stored as Foreign Keys (FK) in the Fact tables to maintain Referential Integrity (RI), and here instead of storing of those big or huge NKs, storing of concise SKs would result in less amount of space needed. The UNIQUE indexes built on the SK will take less space than the UNIQUE index built on the NK which may be alphanumeric.

•Replacing big, ugly NKs and composite keys with beautiful, tight integer SKs is bound to improve join performance, since joining two Integer columns works faster. So, it provides an extra edge in the ETL performance by fastening data retrieval and lookup.

•Advantage of a four-byte integer key is that it can represent more than 2 billion different values, which would be enough for any dimension and SK would not run out of values, not even for the Big or Monster Dimension.

•SK is usually independent of the data contained in the record, we cannot understand anything about the data in a record simply by seeing only the SK. Hence it provides Data Abstraction.
Historical Data: SK helps in maintaining historical data. Suppose there us a department table in OLTP system, which has dept_id and bu_id as NK (natural primary key or composite key) and lets sat there is department number as primary key in OLTP.

If we do not create SK when this department data is populated through ETL into DW and department number is used in DW also as primary key. Then let’s say if a particular department is moved from one BU1 to another BU2 in OLTP system then the records in fact tables in DW will show as they are in department which is in BU2 even historical records will appear as if they were always in BU2.

But if we maintain a SK in DW then when department belonged to BU1 then leys say the SK assigned was Dept_SK1. And fact table also has Dept_SK1 as foreign key; now when department is assigned to BU2 in OLTP and when this comes to DW this is treated as new department and new SK is created, like Dept_SK2 (for same department and different BU i.e. BU2); And fact table will have new foreign key Dept_SK2. So fact table will have records for Dept_SK1 and Dept_SK2.


No comments:

Post a Comment