Wednesday, May 2, 2018

DBSync Cloud Replication Tool for Salesforce






The DBSync Cloud Replication for Salesforce tool synchronizes data between Salesforce and a database. Salesforce is in the cloud; the database can be in the cloud or on premises.




Some of the main applications of this tool are
·         Building a data warehouse for reporting or data mining
·         Data migration
·         Offline data cleansing, with optional subsequent synchronization (we can bring the data from Salesforce to an offsite database, cleanse it and/or perform manipulations, and load it back to Salesforce)

DBSync Cloud Replication can be invoked from
·         Web interface
·         Command line
·         Shell
·         Windows Scheduler
·         Any ETL tool




DBSync Features:

The application creates objects in the target database that do not already exist.
Objects can be replicated in batch or in real time.
Objects in Salesforce can be updated from the data warehouse (bi-directional).


Monday, February 5, 2018

Hive


I am a newbie in Hive and Big Data. Here are some basic Hive commands for quick reference-



--================Show Databases===============
hive> show databases;
--================Create Database==============
hive> create database hive_db;
--================Drop Database================
hive> drop database hive_db;

--================Create Table=================
hive> create table emp_hive (empid int, empname string);

hive> show tables;
OK
emp_hive
Time taken: 0.115 seconds, Fetched: 1 row(s)
--================Rename A Table===============

hive> ALTER table emp_hive RENAME to emp_hive_New;

hive> show tables;
OK
emp_hive_New
--================Drop table======================
hive> drop table emp_hive_New;

--================Table types and usage=================
Hive has two types of tables, internal (managed) and external, depending on how the data is loaded and how the schema is designed.

Internal Tables:
- First we create the table, then load the data.
- This is "data on schema".
- Dropping this table removes both the data and the schema.
- The table is stored under /user/hive/warehouse.

When to choose an Internal Table ->
- If the processing data is available in the local file system
- If we want Hive to manage the complete lifecycle of the data, including deletion

Create Internal Table:
hive> CREATE TABLE hive_internal_table (id INT, name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

Load the data into the internal table
hive> LOAD DATA INPATH '/user/arora_hive/data.txt' INTO TABLE hive_internal_table;
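
For reference, data.txt should be tab-separated to match the FIELDS TERMINATED BY '\t' clause above; a couple of made-up rows (columns separated by a tab):

1	Alice
2	Bob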

Display the content of the table

hive> select * from hive_internal_table;

Drop the internal table
hive> DROP TABLE hive_internal_table;

When we drop the table, its data is also deleted by Hive.
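
We can confirm this from the Hive shell itself; the dfs command runs HDFS commands without leaving Hive:

hive> dfs -ls /user/hive/warehouse;
-- hive_internal_table no longer appears in the listing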


External Tables:
- The data is already available in HDFS; the table is created on top of that HDFS data.
- This is "schema on data".
- When we drop the table, we drop only the schema. The data remains in HDFS.

When to Choose an External Table:
- If the processing data is already available in HDFS.
- When the files are also being used outside of Hive.

Create External Table:
hive> CREATE EXTERNAL TABLE hive_external_table (id INT, name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/arora_hive/hive_external_table';

If we do not specify the location at the time of table creation, we can load the data manually-

hive> LOAD DATA INPATH '/user/arora_hive/data.txt' INTO TABLE hive_external_table;

Display the content of the table
hive> select * from hive_external_table;

To drop the external table-
hive> DROP TABLE hive_external_table;
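
Unlike the internal table, the data survives the drop; we can verify with the same dfs trick:

hive> dfs -ls /user/arora_hive/hive_external_table;
-- the data files are still listed in HDFS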

--=========Difference in Internal and External Tables

Feature             Internal                    External
Schema              Data on schema              Schema on data
Storage Location    /user/hive/warehouse        HDFS location given in DDL
Data Availability   Within local file system    Within HDFS
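
To check which type an existing table is, DESCRIBE FORMATTED reports it:

hive> DESCRIBE FORMATTED hive_external_table;
-- look for the "Table Type:" line in the output:
-- MANAGED_TABLE for internal tables, EXTERNAL_TABLE for external tables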



What is the difference between ORDER BY and SORT BY? DISTRIBUTE BY and CLUSTER BY?
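
In short: ORDER BY gives a total ordering through a single reducer, SORT BY sorts within each reducer only, DISTRIBUTE BY controls which reducer a row goes to, and CLUSTER BY is DISTRIBUTE BY plus SORT BY on the same column. A quick sketch, using an emp_hive table like the one created above:

hive> SELECT * FROM emp_hive ORDER BY empid;      -- total order, one reducer
hive> SELECT * FROM emp_hive SORT BY empid;       -- sorted within each reducer
hive> SELECT * FROM emp_hive DISTRIBUTE BY empid; -- same empid -> same reducer
hive> SELECT * FROM emp_hive CLUSTER BY empid;    -- distribute + sort on empid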

Friday, January 13, 2017

Row Level Security in Tableau (Simple and Org Hierarchy Based)



Row Level Security requires setup both in Tableau workbooks and in the database.


To be secure and perform well, it is recommended to use a live connection to the database. Extracts may be too large if we join a security table that is one-to-many with the data table.


Let's consider the Excel sheet that comes with Tableau -> Sample - Superstore Sales (Excel).


The data is in the "Orders" worksheet.


Now let's say we want each user to see only the data pertaining to their own region.

For example, User-A will see data for Nunavut
User-B will see data for Atlantic

Create a worksheet that maps each Region to its respective User. (If we had an RDBMS, we would have a similar mapping table.)



Now open the workbook -> pull in the tables (worksheets, in this case); make a join between Orders and User Mapping.






Open the worksheet where we want to create our report. Create a calculated field as shown below.
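
The calculated field is in the screenshot; as a minimal sketch, assuming the mapping column is named [User], it compares the signed-in user with the mapped user:

// RegionSecurityFilter (name assumed) -- TRUE only for rows whose
// mapped user matches the person signed in to Tableau Server
USERNAME() = [User]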




Place the calculated field on the Filters shelf and select True.




Now when User-A logs in, they will see only the data for the Nunavut region; User-B will see only Atlantic.

--=========================================================---
We can also have the following tables as security tables -

Users Table
Roles Table
Users-Roles-Mapping Table

And then the data table (or fact table) will carry the roles instead of individual users, and that Role column can be joined to the Users-Roles-Mapping table, as sketched below.
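
A rough SQL sketch of that join (table and column names assumed for illustration):

-- each fact row is tagged with a role; a user sees only the rows whose
-- role is mapped to them in the users-roles mapping table
SELECT f.*
FROM fact_table f
JOIN users_roles_mapping urm ON urm.role_id = f.role_id
JOIN users u ON u.user_id = urm.user_id
WHERE u.login_id = :signed_in_user; -- supplied by the BI tool, e.g. USERNAME()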

--===================Row Level Security Based on Organization Hierarchy========


(Refer to workbook here-> 
https://public.tableau.com/profile/vikas.arora.bidw#!/vizhome/RegionalSalesProfit-RowLevelSecurityonHierarchy/Dashboard1

)

A more practical, real-life scenario is implementing row-level security based on an organizational hierarchy, where a boss can see all the records created by themselves or by their subordinates.

Let's consider a scenario->







Let's add an Excel worksheet with the following data (employee details and their org path)-






This is the kind of employee table that might be maintained in an HR database. Notice the Org Path Desc column; that is the column we will use to implement security based on the org hierarchy. The Login ID column holds the LDAP ID.


Now let's look at the Emp ID column that has been added to the Orders worksheet; it indicates which employee created the row.


As per the data in Emp ID, the following is the mapping ->
Atlantic-> All records by Emp ID 1010 (Karen)
Northwest Territories-> All records by Emp ID 1010 (Karen)
Nunavut-> All records by Emp ID 1006  (varsha)
Ontario-> All records by Emp ID 1008  (monika)
Prarie ->  All records by Emp ID 1008 (monika)
Quebec ->All records by Emp ID 1003 (vaurora)
West->  All records by Emp ID 1003 (vaurora)
Yukon->All records by Emp ID 1009  (david)

Let's see the workbook->

Create a Calculated Field -> RowBasedSecurityFilter (sketched below)
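
The field itself is in the screenshot; a minimal sketch, assuming [Org Path Desc] stores the chain of login IDs from the top of the hierarchy down to the record creator (e.g. vaurora/monika/varsha):

// RowBasedSecurityFilter -- TRUE when the signed-in user's login ID
// appears in the record creator's org path, i.e. the user is the
// creator or one of the creator's bosses
CONTAINS([Org Path Desc], USERNAME())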


Place the field on the Filters shelf and select True.


So now records for the different Regions appear based on the org hierarchy.

Monday, January 9, 2017

ODI 12c Standalone Agent Configuration


This involves three steps:

1) Creating a Domain for Standalone Agent
2) Creating Physical Agent in ODI Studio
3) Starting the Agent

1) Creating a Domain for Standalone Agent



Start the FMW Configuration Wizard:
On the Unix machine, navigate to

/u01/Middleware/ODI12c/oracle_common/bin




Execute ./config.sh





If you do not yet have a WLS domain, the first one created will be called base_domain by default. You can rename it to something such as odi_agent or mydomain99; just make sure it remains under $FMW_HOME/user_projects/domains/. If you already have a domain named base_domain, you cannot "create" another one; you would have to name the new one base_domain2 or something else, or update the existing domain.

Click Next




Select Oracle Data Integrator - Standalone Agent. It is standalone in the sense that it does not use WLS.




Select as shown, Click Next.




Select Manual Configuration. Click Next





Enter the parameters as shown.
(Change hostname to the server where you installed the Master and Work Repository.)
(Mention passwords for schemas PROD_STB and PROD_ODI_REPO; these schemas are in the database where you installed the Master and Work Repository)

Click Next.





Click Next.






Click Next.






Leave these at their defaults.
This is the password of the ODI user that we use to connect from ODI Studio.






Note down the PORT and password; we will need these when starting the agent.
Click Next.






Enter the password as odi_agent
Click Next




Click Next.





Click Next.




Click Finish.


2) Create a Physical Agent in ODI Studio

Open ODI Studio







Enter Wallet Password.





Enter ODI Username and Password

In the Topology Manager, right-click Agents and select New Agent

Enter the details as specified during domain configuration



The physical agent OracleDIAgent1 appears under the Agents tree.


Save.


3) Start the Agent

On the Unix machine, navigate to
/u01/Middleware/ODI12c/user_projects/domains/base_domain/bin




Execute ./agent.sh -NAME=OracleDIAgent1 -PORT=20910








The agent is started.
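
As a quick check that the agent is listening (assuming the default agent application name, oraclediagent):

curl http://localhost:20910/oraclediagent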

Reference: 
http://www.oracle.com/webfolder/technetwork/tutorials/obe/fmw/odi/odi_12c/odi12c_agent/odi12c_agent.html#section1s1

Friday, January 6, 2017

Planning your Tableau Deployment



Planning your Tableau deployment->

1) How will you license your installation?
2) How will users authenticate to Tableau Server?
3) How will Tableau Server access data sources?
4) What hardware will you need?

5) How will you provide high availability / failover?

Friday, September 16, 2016

OBIEE 12c Installation on Windows



Pre-requisites

a) Oracle Database installation (version 11.2.0.4 or higher)
b) JDK installation


Steps Involved

a) Install Weblogic Server 12c
b) Install OBIEE 12c
c) Create BI Schemas using RCU
d) Set Environment Variables
e) Create BI Domain






Download the OBIEE 12c installation files (and the WebLogic Server file) from OTN

Unzip the downloaded files




Download WebLogic Server and unzip

Unzip ->


The .jar file below is inside the folder

Install Weblogic Server 12c

Open command prompt as Administrator




Navigate to the location where the above .jar file exists,
i.e. cd C:\vikasarora\Software\OBIEE12C\fmw_12.2.1.0.0_infrastructure_Disk1_1of1

Execute as below:
C:\vikasarora\Software\OBIEE12C\fmw_12.2.1.0.0_infrastructure_Disk1_1of1>"C:\Java\jdk1.8.0_121\bin\java.exe" -jar fmw_12.2.1.0.0_infrastructure.jar

Click Finish.




Install OBIEE 12c

We download two zip files from OTN




Unzip both the zip files and place them in a separate folder-




If you do not place the files in a separate folder you may get the error
"Could not extract the archive since it is corrupted. Error code=1"


Run the application as Administrator

Specify Oracle Home.

Click Install.

Click Finish.

Create BI Schemas using the RCU utility




Right-click RCU.bat and select Run as Administrator.

(I got this warning; my database is version 12c. I ignored it.)



Change the selections as shown above.

Set Environment Variables for BI






Create a BI Domain




Go to the <MW_HOME>\bi\bin folder and run config.cmd

Select Use Existing Schemas