Tuesday, January 12, 2016

Full Extract vs Incremental Extract




There are two approaches to refresh an Extract-> Full Refresh and Incremental Refresh

Full Refresh-> 

As the  name says the Extract will be fully refreshed. All the  data/rows in the Extract will be completely replaced by the new data/rows from the data source.

This is the default refresh.

Depending on the number of records we fetch, it may sometime be expensive on the data source and may take long time.


Incremental Refresh->


Rather than refreshing the entire extract, we can set it up to only add the rows that are new since the last time we extracted data. For example, we may have a data source that is updated daily with new sales transactions. Rather than rebuild the entire extract each day, we can just add the new transactions that occurred that day. (Then once a week you may want to do a full refresh just to be sure you have the most up to date data.)


But before configuring Incremental refresh we need to check following->

a) Does the data always gets inserted and never gets updated / Deleted ?


b) Does the data set have a unique key (A integer serial number which always increase or a date that always increase)?


If answer to both these questions in "Yes" then only we should configure Incremental Refresh or else we have to go for Full Refresh.


Reference:
https://community.tableau.com/thread/204061




Sunday, January 10, 2016

Data Warehouse Design Approaches (Inmon vs Kimball)




When we consider the Data Warehouse approaches, there are primarily two approaches- Top-Down and Bottom-Up

The selected approach will deeply affect the Data Warehouse lifecycle.



Top-down (Inmon’s Approach)

In this approach, first the Data Warehouse is built which is mostly a normalized Data Model and thereafter Data Marts are fed from the Enterprise Data Warehouse. The DMs are specific to subject area or departments.






Advantages of This Approach:

The Enterprise Data Warehouse has the data from most or all departments of the Enterprise. 
We will have integrated Data Warehouse. We can generate analytics for across the Subject Areas or Departments. 

Disadvantages of This Approach:

  • High initial cost.
  • Integrating Master Data from all the departments would be challenging. It may lead to implementation of Master Data Management which itself is substantial undertaking.
  • It will take significant to time to come up with a even small prototype of reporting; and business users may lose interest and trust.
  • Long term implementation plan may be discouraging.



Bottom-Up (Kimball’s Approach)


In this approach, the Data Marts are created first, each specific to a particular subject area or department of the organization. 






And as and when required, the Data Marts can combine into a larger Data Warehouse.
Kimball’s Data Warehouse architecture is also known as Data Warehouse bus (BUS).
Takes lesser time, shorter time for initial setup.

Which approach to choose?

It depends on the business objective of the organization, nature of business, time and cost.

Inmon’s approach is more suitable for stable businesses where business changes are not very often. With changes, they do not change the design instead they accommodate the changes in existing data model.

However, if the focus is on quick win, the Kimball’s approach is more suitable.