Comparing and Merging Divergent Data in Oracle

By James Koopman on December 21, 2012


Extracting and loading data from one database to another can often leave us wondering how effective our migration attempts are and if the source and target tables are actually in sync after the process. Keeping two tables in sync can be cumbersome for development teams as well as DBAs that are often left with creating and executing a variety of scripts to test the quality of data.

To the rescue, Oracle has deployed a package and set of views to help validate and then re-sync database objects. Oracle’s package DBMS_COMPARISON enables users to compare two different objects, in the same database or across a database link, and then merge the two objects so that the objects are put back in sync. As an example, this post provides a comparison and merge of differences for an EMPLOYEE table that exists in the two different schemas HR_OLD & HR_NEW.

1. Create the comparison; specify a name of the comparison and the local and remote schema objects. Notice that a NULL database link means that both objects are local.

BEGIN
 DBMS_COMPARISON.CREATE_COMPARISON(
 comparison_name              => ‘employees_compare’,
 schema_name                     => ‘hr_new’,
 object_name                         => ‘employees’,
 remote_schema_name      => ‘hr_old’,
 remote_object_name          => ‘employees’,
 dblink_name                         => NULL);
 END;
 /
 PL/SQL procedure successfully completed.

2. Compare the two objects to see if there are any inconsistencies:

SET SERVEROUTPUT ON
 DECLARE
 consistent   BOOLEAN;
 scan_info    DBMS_COMPARISON.COMPARISON_TYPE;
 BEGIN
 consistent := DBMS_COMPARISON.COMPARE(
 comparison_name   => ‘employees_compare’,
 scan_info                    => scan_info,
 perform_row_dif        => TRUE);
 DBMS_OUTPUT.PUT_LINE(‘Scan ID: ‘||scan_info.scan_id);
 IF consistent=TRUE THEN
 DBMS_OUTPUT.PUT_LINE(‘No differences were found.’);
 ELSE
 DBMS_OUTPUT.PUT_LINE(‘Differences were found.’);
 END IF;
 END;
 /
 Scan ID: 24
 Differences were found.
 PL/SQL procedure successfully completed.

3. Query to show how many differences where found:

SELECT s.scan_id,
 c.comparison_name,
 c.schema_name,
 c.object_name,
 c.remote_schema_name,
 c.remote_object_name,
 s.current_dif_count
 FROM dba_comparison c, dba_comparison_scan_summary s
 WHERE c.comparison_name = s.comparison_name
 AND c.owner           = s.owner
 AND s.scan_id         = 24;
 
SCAN_ID  COMPARISON_NAME          SCHEMA   OBJECT_NA     REMOTE    REMOTE_OB   CUR_DIF_CNT
 ————–   —————————————     ————–     ———————    ————–     ———————    ————————
 24        EMPLOYEES_COMPARE    HR_NEW    EMPLOYEES   HR_OLD     EMPLOYEES           107

4. Query to see which rows were different. A ‘Yes’ in the local or remote column signifies that there is a row that exists both in the local and remote objects and a merge will have to occur. If there is a ‘No’ then a converge rule that must be specified in the next step on how to handle these; basically specifying which (local or remote) is the base system and causing a delete in the other.

SELECT c.column_name,
 r.index_value,
 DECODE(r.local_rowid,NULL, ‘No’,'Yes’) local_rowid,
 DECODE(r.remote_rowid,NULL,’No’,'Yes’) remote_rowid
 FROM dba_comparison_columns c,
 dba_comparison_row_dif r,
 dba_comparison_scan s
 WHERE r.scan_id                      = s.scan_id
 AND s.parent_scan_id         = 24
 AND r.status                           = ‘DIF’
 AND c.index_column            = ‘Y’
 AND c.comparison_name  = r.comparison_name
 AND c.owner                          = r.owner
 ORDER BY r.INDEX_VALUE;
 
COLUMN_NAME     INDEX_VALU     LOCAL     REMOTE
 ———————–     ——————      ———-     ———–
 EMPLOYEE_ID                100                   Yes          Yes
 EMPLOYEE_ID                101                   Yes          Yes
 EMPLOYEE_ID                 102                  Yes          Yes
 EMPLOYEE_ID                 103                  Yes          Yes
 EMPLOYEE_ID                 104                  Yes          Yes

5. Perform the merge; notice that the local object wins (replaces data in the remote object) when data is different between the two objects. This also causes rows to be deleted in the remote object if not found in the local object.

SET SERVEROUTPUT ON
 DECLARE
 scan_info    DBMS_COMPARISON.COMPARISON_TYPE;
 BEGIN
 DBMS_COMPARISON.CONVERGE(
 comparison_name  => ‘employees_compare’,
 scan_id          => 24,
 scan_info        => scan_info,
 converge_options => DBMS_COMPARISON.CMP_CONVERGE_LOCAL_WINS);
 DBMS_OUTPUT.PUT_LINE(‘Local Rows Merged :’||scan_info.loc_rows_merged);
 DBMS_OUTPUT.PUT_LINE(‘Remote Rows Merged:’||scan_info.rmt_rows_merged);
 DBMS_OUTPUT.PUT_LINE(‘Local Rows Deleted:’||scan_info.loc_rows_deleted);
 DBMS_OUTPUT.PUT_LINE(‘Remote Rows Deleted:’||scan_info.rmt_rows_deleted);
 END;
 /
 Local Rows Merged :0
 Remote Rows Merged:107
 Local Rows Deleted:0
 Remote Rows Deleted:0
 PL/SQL procedure successfully completed.

The compare and merge process become somewhat interesting when you trace the whole process. Take a look at the following trace file: 200903-01-SWI-db11FS_ora_3605.trc.out for some behind-the-scenes activity. I find it quite interesting, to compare and merge two tables that only had 107 rows of data in them, that:

  1. Nearly 450,000 buffers where retrieved
  2. It took nearly 30 seconds to compare and merge the two tables
  3. 300+ SQL statements were executed
  4. 5,500+ physical disk blocks where read from datafiles

There must be some exhaustive processing going on here. When looking at the tkprof output, you will quickly note that most of this “bad” time and processing was spent querying the ALL_TABLES, ALL_INDEXES, ALL_IND_COLUMNS, ALL_TAB_COLUMNS, and ALL_CONSTRAINTS views. If there is any good news about these atrocious numbers it’s that they probably won’t get worse as the process will always work on only two objects. The real numbers to watch will be those that work against the local and remote objects, EMPLOYEES in this example.

If you plan on using this comparison package I would strongly suggest you trace it a few times to check the overhead. I also fear that, in lieu of standard ETL processing, this package will be used. While the package is handy, depending on the size of the objects to compare, you might be better served to fine tune the processes that are responsible for the initial propagation of data between the two tables.

Related Posts

Leave a Reply