Monday, June 16, 2014

Change Data Capture in Oracle

Publisher/Subscriber Approach

1)     SYS
a.       The database should run in Archive Log mode
b.      Force Logging should be on.
c.       Create table space for three users (Albert, Publ_cdc and Subs_cdc). Separate table space is not mandatory and but is recommended for clean administration.
d.      Create three users – owner, publisher and subscriber
e.       Grant create session/table/sequence/procedure to three users. Grant connect/resource to three users.
f.        Grant execute_catalog_role and select_catalog_role to publisher and subscriber
g.       Grant execute permission on following DBMS packages to publisher and subscriber
                                                               i.      DBMS_CDC_PUBLISH
                                                             ii.      DBMS_CDC_SUBSCRIBE
                                                           iii.      DBMS_LOCK
h.      Make publisher as stream administrator
i.         Make user table(s) as capture enabled tables.
j.         Important SQL to check status on various things
2)     User/Owner
a.       Create table whose changes needs to be captured
b.      Grant select permission on the  table to publisher and subscriber
c.       Ensure all changes are recorded in redo logs
3)     Publisher
a.       Define CHANGE_SET, SOURCE TABLE and Publication
b.      Grant access of Change table to Subscriber
4)     Subscriber
a.       Create subscription to the publication
b.      Create subscribers view
c.       Activate subscription

5)     Test Asynchronous CDC


1)     SYS
a.       The database should run in Archive Log mode
b.      Force Logging should be on (below command should be run while database is in mont state (not in open)
                                                                           i.      alter database force logging;
                                                                         ii.      alter database add supplemental log data;
c.       Create table space for three users (Albert, Publ_cdc and Subs_cdc).
CREATE TABLESPACE TS_CDC DATAFILE 'C:\ORACLE\ORADATA\TEST10G\TS_CDC.DBF' SIZE 50M
EXTENT MANAGEMENT LOCAL AUTOALLOCATE
SEGMENT SPACE MANAGEMENT AUTO
LOGGING
FORCE LOGGING;

d.      Create three users – owner, publisher and subscriber
create user albert identified by albert
default tablespace ts_cdc
temporary tablespace temp ;

create user publ_cdc identified by publ_cdc
default tablespace ts_cdc
temporary tablespace temp ;

create user subs_cdc identified by subs_cdc
default tablespace ts_cdc
temporary tablespace temp;

e.       Grant create session/table/sequence/procedure to three users. And connect/resource also.

GRANT create session TO albert;
GRANT create table TO albert;
GRANT create sequence TO albert;
GRANT create procedure TO albert;
GRANT connect TO albert;
GRANT resource TO albert;

GRANT create session TO publ_cdc;
GRANT create table TO publ_cdc;
GRANT create sequence TO publ_cdc;
GRANT create procedure TO publ_cdc;
GRANT connect TO publ_cdc;
GRANT resource TO publ_cdc;

GRANT create session TO subs_cdc;
GRANT create table TO subs_cdc;
GRANT create sequence TO subs_cdc;
GRANT create procedure TO subs_cdc;
GRANT connect TO subs_cdc;
GRANT resource TO subs_cdc;
f.        Grant execute_catalog_role and select_catalog_role to publisher and subscriber since they need to access the dictionary i.e. views in SYS schema-
GRANT execute_catalog_role TO publ_cdc;
GRANT select_catalog_role TO publ_cdc;

GRANT execute_catalog_role TO subs_cdc;
GRANT select_catalog_role TO subs_cdc;
g.       Grant execute permission on following DBMS packages to publisher and subscriber

GRANT execute ON DBMS_CDC_PUBLISH TO publ_cdc;
GRANT execute ON DBMS_CDC_SUBSCRIBE TO publ_cdc;
GRANT execute ON DBMS_LOCK TO publ_cdc; -- not mandatory

GRANT execute ON DBMS_CDC_PUBLISH TO subs_cdc;
GRANT execute ON DBMS_CDC_SUBSCRIBE TO subs_cdc;
GRANT execute ON DBMS_LOCK TO subs_cdc; -- not mandatory

h.      Make publisher as stream administrator
execute dbms_streams_auth.grant_admin_privilege('publ_cdc');
The last statement makes PUBL_CDC a streams Administrator.
Let's see if that’s really true:

SQL> select * FROM dba_streams_administrator;

USERNAME LOC ACC
------------------------------ --- ---
PUBL_CDC YES YES
i.         Make user table(s) as capture enabled tables
exec dbms_capture_adm.prepare_table_instantiation(table_name => 'ALBERT.PERSOON');
j.         Important SQL to check status on various things
                                                                           i.      Check capture enabled tables
SQL> SELECT * FROM dba_capture_prepared_tables;

                                                                         ii.      Check if change set, source table or publication is defined?
SQL> SELECT * FROM DBA_SOURCE_TABLES;
SQL> SELECT * FROM cdc_change_sets$;
SQL> SELECT * FROM cdc_change_tables$;
                                                                       iii.      Check if change set is enabled?
SQL> SELECT * FROM CHANGE_SETS;
SQL> SELECT * FROM DBA_PUBLISHED_COLUMNS;
                                                                       iv.      Check if subscription exists?
SQL> SELECT * FROM cdc_subscribers$;
SQL> SELECT * FROM user_subscriptions;
SQL> SELECT * FROM DBA_SUBSCRIPTIONS;
2)     User/Owner
a.       Create table whose changes needs to be captured
create table PERSOON
(
userid number,
name varchar(30),
lastname varchar(30),
constraint pk_userid primary key (userid)
);
               SQL> insert into PERSOON  values (1,'john','west');
                               SQL> insert into PERSOON  values (2,'mary','east');

b.      Grant select permission on the  table to publisher and subscriber

GRANT SELECT ON PERSOON TO publ_cdc;
GRANT SELECT ON PERSOON TO subs_cdc;

c.       Ensure all changes are recorded in redo logs

      ALTER TABLE persoon ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

3)     Publisher

a.       Define CHANGE_SET, SOURCE TABLE and Publication

exec DBMS_CDC_PUBLISH.CREATE_CHANGE_SET('CDC_DEMO_SET', 'CDC Demo 2 Change Set', 'HOTLOG_SOURCE', 'Y', NULL, NULL);

b.      Create Change Table

SQL> BEGIN
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE('publ_cdc', 'CDC_PERSOON', 'CDC_DEMO_SET',
 'ALBERT', 'PERSOON', 'userid number, name varchar(30), lastname varchar(30)',
 'BOTH', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'TABLESPACE TS_CDC');
 END;
 /
c.       Grant Select on Change Table to subscriber

SQL> GRANT select ON CDC_PERSOON TO subs_cdc;

d.      Enable Capture on Change Table

SQL> exec DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(change_set_name=>'CDC_DEMO_SET', enable_capture=> 'Y');

4)     Subscriber

a.       Create subscription to the publication

exec DBMS_CDC_SUBSCRIBE.create_subscription('CDC_DEMO_SET', 'cdc_demo subx', 'CDC_DEMO_SUB');

b.      Create subscribers view

SQL> BEGIN
 DBMS_CDC_SUBSCRIBE.subscribe('CDC_DEMO_SUB', 'ALBERT', 'PERSOON',
 'userid, name, lastname', 'CDC_DEMO_SUB_VIEW');
 END;
 /

c.       Activate subscription

exec DBMS_CDC_SUBSCRIBE.activate_subscription('CDC_DEMO_SUB');

5)     Logon to user (Albert) and make an insert –
First let’s see what is there in PRASOON table-
SQL> select * from PERSOON;

 USERID NAME LASTNAME
---------- ------------------------------ --------------
 1 john west
 2 mary east

Two records.

Now make an isert –

SQL> insert into PERSOON values (3,'harry','north');

1 row created.

SQL> commit;

connect subs_cdc/sub_cdc

Now lets see what we have and what we can retrieve from the CDC objects:

SQL> select * from PUBL_CDC.CDC_PERSOON;

OP CSCN$ COMMIT_TI XIDUSN$ XIDSLT$ XIDSEQ$ RSID$ ROW_ID$ USERNAME$
-- ---------- --------- ---------- ---------- ---------- ---------- ------------------ -------------
I 1176467 29-MAR-08 2 19 488 1 AAAMlmAAGAAAAAgAAA ALBERT

Indeed, Albert inserted 1 record, and so that information must be recorded in the Change Table CDC_PERSOON.  But we cannot see anything yet from the Subscribers View:

SQL> select * from CDC_DEMO_SUB_VIEW;

no rows selected

This is correct behavior. The Subscriber needs to use the DBMS_CDC_SUBSCRIBE.extend_window and DBMS_CDC_SUBSCRIBE.purge_window  procedures for populating and purging the view that shows the changes to Albert's PERSOON table.

                                                                           i.      So Subscriber uses following statement to populate the view
SQL> exec dbms_cdc_subscribe.extend_window('CDC_DEMO_SUB');
SQL> select * from CDC_DEMO_SUB_VIEW;

OP CSCN$ COMMIT_TI XIDUSN$ XIDSLT$ XIDSEQ$ RSID$ ROW_ID$ USERNAME$
-- ---------- --------- ---------- ---------- ---------- ---------- ------------------ -------------
I 1176467 29-MAR-08 2 19 488 1 AAAMlmAAGAAAAAgAAA ALBERT
Now here is the trick: The Subscriber should use the two forementioned procedures in a "loop" so to speak:

use DBMS_CDC_SUBSCRIBE.extend_window Be very "careful" and "disciplined"
 in using extend_window and purge_window
 use the information from the view for obvious reasons.

use DBMS_CDC_SUBSCRIBE.purge_window

If the Subscriber now purges the window, by using DBMS_CDC_SUBSCRIBE.purge_window
the view does not show entries anymore.

SQL> exec dbms_cdc_subscribe.purge_window('CDC_DEMO_SUB');

PL/SQL procedure successfully completed.

SQL> select * from CDC_DEMO_SUB_VIEW;

no rows selected
Now Albert connects again and does another insert.

connect albert/albert

SQL> select * from PERSOON;

 USERID NAME LASTNAME
---------- ------------------------------ --------------
 1 john west
 2 mary east
 3 harry north

SQL> insert into PERSOON
 2 values
 3 (4,'betty','south');

1 row created.

SQL> commit;

Commit complete.

connect subs_cdc/sub_cdc

SQL> select * from CDC_DEMO_SUB_VIEW;

no rows selected

There is nothing to view here, because the Subscriber needs to execute DBMS_CDC_SUBSCRIBE.extend_window

SQL> exec dbms_cdc_subscribe.extend_window('CDC_DEMO_SUB');

PL/SQL procedure successfully completed.

SQL> select * from CDC_DEMO_SUB_VIEW;

OP CSCN$ COMMIT_TI XIDUSN$ XIDSLT$ XIDSEQ$ RSID$ ROW_ID$ USERNAME$
-- ---------- --------- ---------- ---------- ---------- ---------- ------------------ -------------
I 1184151 29-MAR-08 10 6 511 10001 AAAMlmAAGAAAAAdAAA ALBERT

Now we are able to view the new change, that is record no 4, inserted by Albert.

Note that the Change Table does not get purged.

SQL> select * from publ_cdc.cdc_PERSOON;

OP CSCN$ COMMIT_TI XIDUSN$ XIDSLT$ XIDSEQ$ RSID$ ROW_ID$ USERNAME$
-- ---------- --------- ---------- ---------- ---------- ---------- ------------------ -------------
I 1176467 29-MAR-08 2 19 488 1 AAAMlmAAGAAAAAgAAA ALBERT
I 1184151 29-MAR-08 10 6 511 10001 AAAMlmAAGAAAAAdAAA ALBERT

Which shows the two changes from Albert since the activation of Async CDC

Now there REALLY is some sort of a "timing issue" with CDC, which you need to be very carefull of.  According to Oracle, you are supposed to use the data from the Subscribers View.
Ofcourse, you could use the information from the Change table(s), because its just another table  which you could query as usual. But that is NOT recommended.  Now, the view, or the Window, gets populated and purged. So, if you use the view(s), for example to populate Data Warehouse tables, you need to be very disciplined in using "extend_window" and "purge_window".
A "good" way of dealing with this potential problem, is in creating functions or procedures that work like

..
BEGIN
dbms_cdc_subscribe.extend_window();
..
YOUR CODE
..
dbms_cdc_subscribe.purge_window();
EXCEPTION
..
END;


Note: Above steps are for Oracle RDBMS as source. What if Source is non-Oracle RDBMS or a flat file??

Below is link implementing CDC using Informatica utilizing CheckSUM.
http://www.disoln.org/2012/12/Change-Data-Capture-CDC-Implementation-Using-CHECKSUM-Number.html