What early arriving facts are? (Or late arriving dimension)
An early arriving fact, when a fact record arrives at the data warehouse before its dimension record.
when we lookup the surrogate key in our dimension table using the fact table natural key, the dimension member doesn’t exist (yet).
In many cases when we load our data warehouse, we would assume that dimensions and facts would arrive at the same time, or dimension would arrive before fact record that has reference to the dimension.
The reasons why we have to be able to handle early arriving facts are many
a) multiple source systems and/or
b) different load frequencies seems to be the dominating factors.
We might be loading our dimensions using a full load strategy with one load schedule, while using an incremental load strategy for our fact table with a close to real-time frequency.
An employee availing medical insurance through his employer is eligible for insurance coverage from the first day of employment. But the employer may not provide the medical insurance information to the insurance provider for several weeks. If the employee undergo any medical treatment during this time, his medical claim records will come as fact records with out having the corresponding patient dimension details.
Design Approaches
Depending on the business scenario and the type of dimension in use, we can take different design approaches.
1. Hold the Fact record until Dimension record is available.
2. 'Unknown' or default Dimension record.
1. Hold the Fact record until Dimension record is available
One approach is to place the fact row in a suspense table. The fact row will be held in the suspense table until the associated dimension record has been processed. This solution is relatively easy to implement, but the primary drawback is that the fact row isn’t available for reporting until the associated dimension record has been handled.
This approach is more suitable when your data warehouse is refreshed as a scheduled batch process and a delay in loading fact records until the dimension records are available is acceptable for the business.
2. Unknown' or default Dimension record
Another approach is to simply assign the “Unknown” dimension member to the fact record. On the positive side, this approach does allow the fact record to be recorded during the ETL process. But it won’t be associated with the correct dimension value.
The "Unknown" fact records can also be kept into a suspense table. Eventually, when the Dimension data is processed, the suspense data can be reprocessed and associate with a real, valid Dimension record.