Saturday, December 12, 2015

Create Basic Date Dimension with Data



This statement creates a basic Date Dimension table with data starting from 1-Jan-2014 and running for 2,000 days.



CREATE TABLE DATE_DIM AS
  SELECT
       n AS Date_ID, -- (This is just a serial number, not a date)
       1 AS DATE_KEY,
       TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day') AS Full_Date, -- (PK)
       TO_CHAR(TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day'),'DD') AS Days,
       TO_CHAR(TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day'),'Mon') AS Month_Short,
       TO_CHAR(TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day'),'MM') AS Month_Num,
       TO_CHAR(TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day'),'Month') AS Month_Long,
       TO_CHAR(TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day'),'YYYY') AS YEAR
  FROM (
       SELECT LEVEL n
       FROM dual
       CONNECT BY LEVEL <= 2000
  );
   
UPDATE DATE_DIM SET DATE_KEY = YEAR || Month_Num || Days;
-- DATE_KEY is in the format YYYYMMDD
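
As a minor variation (a hedged sketch, not required by the statement above), DATE_KEY can also be computed directly inside the CTAS, which avoids the separate UPDATE; only the relevant columns are shown:

CREATE TABLE DATE_DIM AS
  SELECT
       n AS Date_ID,
       TO_NUMBER(TO_CHAR(TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day'),'YYYYMMDD')) AS DATE_KEY,
       TO_DATE('31/12/2013','DD/MM/YYYY') + NUMTODSINTERVAL(n,'day') AS Full_Date
       -- remaining TO_CHAR columns as in the statement above
  FROM (
       SELECT LEVEL n
       FROM dual
       CONNECT BY LEVEL <= 2000
  );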

Sunday, November 15, 2015

Data Warehouse Hardware Sizing (Disk/RAM/CPU)



While deciding on the hardware sizing required, we are primarily looking at:

1) Disk Space Required for Machine hosting Data Warehouse
2) RAM on the Machine
3) CPU

This is for the ELT method, where the Data Warehouse and the Integration Server would reside on the same machine

(not ETL, where the DW and the Integration Server would be on different machines).



Disk Space: We will need to consider the following:

a) The size of the master and transaction tables that we will pull from the source system into the Data Warehouse to form the Dimensions and Facts (see the query sketch after this list for a rough way to estimate current source table sizes).

b) Space would be needed by the staging area. We may not pull all the columns of a particular table; if we pull only selected columns, the space needed would be less.

c) The tables in the source system will be in normalized form whereas in the DW they are denormalized, so there would be redundancy and hence more space needed for them in the DW.

d) The dimensions that we create in the DW may need to retain history (i.e. an SCD implementation), so the space needed would be more.

e) The indexes created on Dimension and Fact tables would need space.

f) There would be aggregated/summarized tables and/or MVs over and above the leaf-level fact tables, so space would be needed for them. The number of summary-level fact tables depends on the levels in the dimensions: the more levels the dimensions have, the more summary tables are required. Indexes would be needed on the summary tables as well, so they need space too.
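
To get a rough starting number for item (a), one option is to query the source database's data dictionary for the current size of the candidate tables (a hedged sketch, assuming an Oracle source; the schema and table names below are placeholders):

-- Approximate current size (in GB) of candidate source tables
SELECT segment_name,
       segment_type,
       ROUND(SUM(bytes)/1024/1024/1024, 2) AS size_gb
FROM   dba_segments
WHERE  owner = 'SOURCE_SCHEMA'                                          -- placeholder schema
AND    segment_name IN ('CUSTOMERS', 'PRODUCTS', 'SALES_TRANSACTIONS')  -- placeholder tables
GROUP  BY segment_name, segment_type
ORDER  BY size_gb DESC;

Index segments appear under their own segment names in DBA_SEGMENTS, so they can be summed the same way when estimating point (e).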


Wednesday, August 12, 2015

Master Data and Reference Data




Master Data:

Master data is information that an enterprise agrees upon. An enterprise will most likely have several departments such as HR, Marketing, Sales, Procurement, Manufacturing, etc.

Now all these departments may have different systems: the HR department may have an HRMS, while the Marketing and Sales departments may have a Marketing and/or Sales Management system.

You would have products in the Marketing system and in the Sales system too. That product data should be the same in both systems; these are the products that both Marketing and Sales have agreed to use. It should not happen that the two systems have different sets of products.

Another example could be Customer data.

Another example could be Employee data.

Another example could be Supplier data.

Reference Data:

Reference data is data that defines the permissible values other data may use,
for example currency codes, which are largely standardized and rarely change.

Reference data is mostly used for data quality, data validation, and data standardization purposes.

If I am maintaining currency codes, then all the systems (be it HR, Sales, etc.) should use the same currency codes.
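
As a small illustration of reference data used for validation (a hedged sketch; the table and column names are hypothetical), a currency-code reference table can back a foreign-key constraint so that every transactional row carries a valid code:

-- Hypothetical reference table of currency codes
CREATE TABLE CURRENCY_CODES (
    CURRENCY_CODE  CHAR(3)      PRIMARY KEY,   -- e.g. 'USD', 'EUR', 'INR'
    CURRENCY_NAME  VARCHAR2(50) NOT NULL
);

-- Any system storing amounts must reference the shared codes
CREATE TABLE INVOICE (
    INVOICE_ID     NUMBER       PRIMARY KEY,
    AMOUNT         NUMBER(12,2) NOT NULL,
    CURRENCY_CODE  CHAR(3)      NOT NULL
        REFERENCES CURRENCY_CODES (CURRENCY_CODE)
);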


Another example of reference data could be Country Codes or State Codes.

Saturday, August 1, 2015

Tableau Vs OBIEE





The major difference between OBIEE and Tableau is that Tableau doesn't really have a semantic layer (the BMM layer in OBIEE).  With Tableau, the data has to be well manicured (cleaned and shaped) in order to work effectively.  There are differences, but the two can and should co-exist in any organization.

Tableau Merits Over OBIEE

·         Quick and rapid development of reports and dashboards.  Easy enough for business people to figure out with little to no training.  Companies may use Tableau for prototyping and then build the enterprise-wide reports in something like OBIEE.
·         Great desktop reporting tool.
·         Has an excellent recommendation tool for which type of visual analysis to use based on selected data.
·         Almost true WYSIWYG.
·         Positioning of dashboard objects is much more flexible.
·         Good free on-demand training and an extremely active user community.
·         Maps are very easy to use with lots of cool functionality.


OBIEE Merits Over Tableau

·         More advanced dashboarding capabilities such as conditional guided navigation.  Also much better at managing tons of dashboards. Use of sub reports.
·         Stronger enterprise-level security and user management.
·         Shared reporting objects like filters.
·         Better at federating data (connecting to multiple sources) than Tableau.  Tableau can do it and is OK at it, but OBIEE is the best of all reporting tools at it.
·         The best functioning KPI/scorecard reports, an area where OBIEE beats the competition.
·         Great integration with other Oracle products, i.e. it can interface directly with Fusion/EBS/Forms.
·         Has multi-user development support and better environment management/migration.  Tableau does not really support multiple environments or multiple users working on the same reports.
·         Ad-hoc reports can be created through the web interface by any user you grant access to.  That user, however, is limited to what you have defined in your semantic layer.  Tableau requires its desktop client for ad-hoc work; however, the user can then do any kind of logical calculations or other data manipulation they wish.
·         Can cache queries.  Works very well and writes very efficient SQL when directly querying databases.  Tableau has a proprietary data format that basically works as a cube, which can slow down very quickly on large data sets.  It is also pretty lackluster at performing direct queries against the database.


While Tableau continues to add features very quickly, it isn't yet a true enterprise-wide reporting platform.  It is still best suited for departmental use in the hands of a data analyst.

OBIEE requires a lot more effort to create even basic visualizations and usually requires a full development team.


There are pros and cons to both.  

Difference between Data Blending and Data Join




Joining is where you are able to combine data from the SAME data source, for example worksheets in an Excel file or tables in an Oracle database.  You need a common field, also called a key, that tells Tableau how to join the data.  You also need a key to do blending.
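
Conceptually, a join within a single Oracle data source corresponds to an ordinary SQL join on the common key (a minimal sketch; the table and column names are hypothetical):

-- Combining two tables from the SAME Oracle source on a shared key
SELECT o.order_id,
       o.order_date,
       c.customer_name
FROM   orders    o
JOIN   customers c
  ON   c.customer_id = o.customer_id;   -- the common field ("key")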


Blending is where you are able to combine data from DIFFERENT data sources, for example a worksheet in Excel with a table from an Oracle database.  This is very powerful because in the past you would typically need someone with specific technical skills to create a new data set with the combined information.

Wednesday, July 1, 2015

Tableau: Connections you can make with your dataset




We can either connect live to our data set or extract data onto Tableau.

Live: Connecting live to a data set leverages the database's computational processing and storage. New queries go to the database, and new or updated data is reflected in the workbook.

Extract: An extract makes a static snapshot of the data to be used by Tableau's data engine. The snapshot can be refreshed on a recurring schedule, either as a whole or by incrementally appending data. One way to set up these schedules is via Tableau Server.



The benefit of a Tableau extract over a live connection is that the extract can be used anywhere without any connection, and you can build your own visualizations without connecting to the database.


When to use Live Connection:


When you need more up-to-date data, or when the data in the database is updated frequently and you need those updates reflected in the workbook/dashboard.


When the database is fast.




When to use Extract:


When the database is not fast enough for interactive analysis (you change a value in a filter and the workbook takes a long time to refresh).


When you want to take the load off the database.


When you need to be offline (e.g. on an airplane) and want to work with the workbook.


Friday, January 9, 2015

Early Arriving Fact (or Late Arriving Dimension)



What are early arriving facts (or late arriving dimensions)?

An early arriving fact occurs when a fact record arrives at the data warehouse before its dimension record.


When we look up the surrogate key in our dimension table using the fact table's natural key, the dimension member doesn't exist (yet).


In many cases when we load our data warehouse, we assume that dimensions and facts arrive at the same time, or that the dimension arrives before the fact record that references it.


The reasons why we have to be able to handle early arriving facts are many:

a) multiple source systems and/or
b) different load frequencies seem to be the dominating factors.

We might be loading our dimensions using a full load strategy with one load schedule, while using an incremental load strategy for our fact table with a close to real-time frequency.


An employee availing of medical insurance through his employer is eligible for insurance coverage from the first day of employment. But the employer may not provide the medical insurance information to the insurance provider for several weeks. If the employee undergoes any medical treatment during this time, his medical claim records will arrive as fact records without the corresponding patient dimension details.

Design Approaches

Depending on the business scenario and the type of dimension in use, we can take different design approaches.
1. Hold the Fact record until the Dimension record is available.
2. Use an 'Unknown' or default Dimension record.


1. Hold the Fact record until the Dimension record is available
One approach is to place the fact row in a suspense table. The fact row will be held in the suspense table until the associated dimension record has been processed. This solution is relatively easy to implement, but the primary drawback is that the fact row isn’t available for reporting until the associated dimension record has been handled.

This approach is more suitable when your data warehouse is refreshed as a scheduled batch process and a delay in loading fact records until the dimension records are available is acceptable for the business.

2. Use an 'Unknown' or default Dimension record
Another approach is to simply assign the “Unknown” dimension member to the fact record. On the positive side, this approach does allow the fact record to be recorded during the ETL process. But it won’t be associated with the correct dimension value. 

The "Unknown" fact records can also be kept in a suspense table. Eventually, when the dimension data is processed, the suspense data can be reprocessed and associated with a real, valid dimension record.
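
A hedged sketch of approach 2 in Oracle SQL (the table and column names are hypothetical, following the insurance example above): the fact load looks up the patient dimension by natural key and falls back to a pre-seeded 'Unknown' member (surrogate key -1) when the lookup fails:

-- Pre-seeded default member in the patient dimension
INSERT INTO PATIENT_DIM (PATIENT_SK, PATIENT_NK, PATIENT_NAME)
VALUES (-1, 'UNKNOWN', 'Unknown');

-- Fact load: use the real surrogate key if the dimension row exists, else -1
INSERT INTO CLAIM_FACT (CLAIM_ID, PATIENT_SK, CLAIM_AMOUNT)
SELECT s.claim_id,
       COALESCE(d.patient_sk, -1) AS patient_sk,   -- 'Unknown' member when no match
       s.claim_amount
FROM   claim_stage s
LEFT   JOIN patient_dim d
  ON   d.patient_nk = s.patient_nk;

Rows loaded with PATIENT_SK = -1 can later be re-keyed (or reprocessed from a suspense table) once the real dimension record arrives.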

Thursday, November 20, 2014

Enable or disable caching in the system level and table level


Disable Caching for the System

You can disable caching for the whole system by setting the ENABLE parameter to NO in the
NQSConfig.INI file and restarting the Oracle BI Server. Disabling caching stops all new cache entries
and stops any new queries from using the existing cache. Disabling caching allows you to enable it
at a later time without losing any entries already stored in the cache.

Caching and Cache Persistence Timing for Specified Physical Tables


You can set a cacheable attribute for each physical table, allowing you to specify whether queries for that
table will be added to the cache to answer future queries. If you enable caching for a table, any query
involving the table is added to the cache.

[Vikas Arora]:
The NQSConfig.INI parameter ENABLE controls the global-level setting. If this is set to NO, then even if we set "Cache Never Expires" for individual tables, the queries that access those tables are not cached.
If the ENABLE parameter is set to YES and "Cache Never Expires" is checked for an individual table, then queries accessing that table are cached. The location of the cache is specified in NQSConfig.INI in the DATA_STORAGE_PATHS parameter.


If the ENABLE parameter is set to YES but "Cache Never Expires" is not checked for an individual table, then queries accessing that table are not cached.
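
For reference, this is roughly what the relevant [CACHE] section of NQSConfig.INI looks like (a hedged example; the path and sizes shown are placeholder values, not defaults for any specific installation):

[CACHE]
ENABLE = YES;                                         # global on/off switch discussed above
DATA_STORAGE_PATHS = "C:\OracleBIData\cache" 500 MB;  # example cache location and size limit
MAX_CACHE_ENTRIES = 1000;                             # example maximum number of cache entries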

Monday, October 13, 2014

Implementation of Usage Tracking in OBIEE 11G



Step 1: Create S_NQ_ACCT, S_ETL_DAY and S_ETL_TIME_DAY in the bipauth schema of the database.

The create-table script for S_NQ_ACCT is available in the BI_DEVPLATFORM schema.

The create-table scripts and data for S_ETL_DAY and S_ETL_TIME_DAY are available in the path below:


C:\Oracle11g_MiddlewareHome\instances\instance1\bifoundation\OracleBIServerComponent\coreapplication_obis1\sample\usagetracking\SQL_Server_Time

@Oracle_create_nQ_Calendar.sql
@Oracle_create_nQ_Clock.sql
@Oracle_nQ_Calendar.sql
@Oracle_nQ_Clock.sql

Create the following view in the bipauth schema of the database.

create view NQ_LOGIN_GROUP as
select distinct USER_NAME as LOGIN, USER_NAME as RESP
from S_NQ_ACCT;

Step 2: Take a copy of the usage tracking RPD from the path below and paste it into a temporary directory (C:\TEMP).

C:\Oracle11g_MiddlewareHome\instances\instance1\bifoundation\OracleBIServerComponent\coreapplication_obis1\sample\usagetracking





Step 3: Upgrade the Usage Tracking RPD in the temporary directory by following the steps below:

Open the command prompt and run the commands below.


set ORACLE_INSTANCE=C:\Oracle11g_MiddlewareHome\instances\instance1

cd C:\Oracle11g_MiddlewareHome\Oracle_BI1\bifoundation\server\bin

obieerpdmigrateutil.exe -I C:\TEMP\UsageTracking.rpd -O c:\temp\NewUsageTracking.rpd -L c:\temp\ldif -U weblogic


When asked for the encryption password, this means the new repository password required by 11g. Just give it a simple password such as Admin123.






Step 4: We can now see the new Usage Tracking RPD created in the path below.
                       C:\TEMP



Step 5: We now need to add extra fields to the S_NQ_ACCT table in the physical layer of the new Usage Tracking RPD.

Expand the S_NQ_ACCT table in the physical layer and check that all of the following fields exist with the correct data type. 

Rename RUNAS_USER_NAME to IMPERSONATOR_USER_NAME and then add the ID, QUERY_BLOB and QUERY_KEY columns.








Finally, change the data type of all the DOUBLE fields below from INT to DOUBLE. It is important that the Nullable setting is correct on every field.




Field Name              Data Type     Length     Nullable?
CACHE_IND_FLG           CHAR            1           No
COMPILE_TIME_SEC        DOUBLE                      Yes
CUM_DB_TIME_SEC         DOUBLE                      Yes
CUM_NUM_DB_ROW          DOUBLE                      Yes
END_DT                  DATETIME                    Yes
END_HOUR_MIN            CHAR            5           Yes
END_TS                  DATETIME                    Yes
ERROR_TEXT              VARCHAR         250         Yes
ID                      VARCHAR         50          No
IMPERSONATOR_USER_NAME  VARCHAR         128         Yes
NODE_ID                 VARCHAR         15          Yes
NUM_CACHE_HITS          DOUBLE                      Yes
NUM_CACHE_INSERTED      DOUBLE                      Yes
NUM_DB_QUERY            DOUBLE                      Yes
PRESENTATION_NAME       VARCHAR         128         Yes
QUERY_BLOB              LONGVARCHAR     4000        Yes
QUERY_KEY               VARCHAR         128         Yes
QUERY_SRC_CD            VARCHAR         30          Yes
QUERY_TEXT              VARCHAR         1024        Yes
REPOSITORY_NAME         VARCHAR         128         Yes
ROW_COUNT               DOUBLE                      Yes
SAW_DASHBOARD           VARCHAR         150         Yes
SAW_DASHBOARD_PG        VARCHAR         150         Yes
SAW_SRC_PATH            VARCHAR         250         Yes
START_DT                DATETIME                    Yes
START_HOUR_MIN          CHAR            5           Yes
START_TS                DATETIME                    Yes
SUBJECT_AREA_NAME       VARCHAR         128         Yes
SUCCESS_FLG             DOUBLE                      Yes
TOTAL_TIME_SEC          DOUBLE                      Yes
USER_NAME               VARCHAR         128         Yes


Step 6: Now right click on S_NQ_ACCT, select Physical Diagram -> Objects and Direct Joins.
The joins to S_ETL_TIME_DAY and S_ETL_DAY are fine, but the join to NQ_LOGIN_GROUP is broken – the line is red and there is no arrow. Click on it and delete it.




Create a new join from S_NQ_ACCT to NQ_LOGIN_GROUP and change the join condition to: "OBI Usage Tracking"."Catalog"."dbo"."NQ_LOGIN_GROUP"."LOGIN" = "OBI Usage Tracking"."Catalog"."dbo"."S_NQ_ACCT"."USER_NAME"





Then click ok to save and close the diagram.
Now right click on S_NQ_ACCT and select properties. If there is no Key defined, enter ID in Key name and select the ID field from the drop down in the Columns field. This defines the key on the table. Click ok.




Step 7: In the Physical layer right click on ‘OBI Usage Tracking’ database, select properties. In the General tab change the Database to Oracle 11g.




Step 8: Then click on the Features tab and click the ‘Reset to defaults’ button.





Step 9: Right click on the 'Connection Pool' connection pool and select Properties.


Give the DSN name as <database>.WORLD.

User Name: bipauth
Password:*********

Now edit the ‘Usage Tracking Writer Connection Pool’ connection pool and make the same changes as above.




Step 10: Check consistency.


Step 11: Take a backup of DevRpd to merge it with the Usage Tracking RPD.

Step 12: Create a blank repository: select File -> New Repository. Enter a name (Test.rpd) and a location (the temp directory), click the No option against import metadata, then enter the password ********. Finally click Finish.




Step 13: Open the backup RPD of Dev in offline mode and select File -> Merge.

In the Merge Wizard screen select Merge Type ‘Full Repository Merge’

Then select the Blank RPD as the Original Master Repository and enter its password ********

Then select NewUsageTracking Repository as the Modified Repository and enter its password Admin123.

Leave the Save Merged Repository as the default name (usually the same name with (1) at the end); rename it later once the merge is complete. Give the password for this RPD as *********.

Leave Equalize during merge unchecked. Click Next.




From the drop-down in the Decision column choose 'Current'. Then click Finish.



Then we can see the merged RPD and check its consistency.



Step 14: Update the NQSConfig.INI file: locate the Usage Tracking section and update the following information.


[USAGE_TRACKING]

ENABLE = YES; 


#  Parameters used for inserting data into a table (i.e. DIRECT_INSERT = YES).
#
PHYSICAL_TABLE_NAME = "OBI Usage Tracking"."Catalog"."dbo"."S_NQ_ACCT";
CONNECTION_POOL = "OBI Usage Tracking"."Usage Tracking Writer Connection Pool"; 
BUFFER_SIZE = 250 MB; 
BUFFER_TIME_LIMIT_SECONDS = 5; 
NUM_INSERT_THREADS = 5; 
MAX_INSERTS_PER_TRANSACTION = 1; 

Step 15: Make the merged RPD online and restart the services.
Step 16: Now copy and paste UsageTracking.catalog onto the desktop.
Step 17: Log into Analytics and go to Shared Folders in the Catalog.
Step 18: Click on Shared Folders and then click on the Unarchive option, which is in the bottom-left corner.
Step 19: A new window will then open to unarchive the catalog. Select Usage Tracking.webcatalog in the browse dialog, select ACL as 'Inherit' and then click 'OK'.




Step 20: The Usage Tracking dashboard will now appear under Dashboards.
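
To confirm that usage tracking rows are actually being written, a quick check (a sketch using the S_NQ_ACCT columns listed in Step 5) can be run against the bipauth schema after a few dashboards have been viewed:

-- Recent queries captured by usage tracking
SELECT USER_NAME,
       SUBJECT_AREA_NAME,
       SAW_DASHBOARD,
       TOTAL_TIME_SEC,
       START_TS
FROM   S_NQ_ACCT
ORDER  BY START_TS DESC;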