Tuesday, September 11, 2012

Loading Historical Data to any table in Oracle BI Apps
-----------------------------------------------------------------
Hello All,

In this post, I will explain how to perform a historical load on any table in your Oracle BI Apps environment. The methodology applies to both facts and dimensions.

First of all, we need to understand the change capture logic used in the Informatica ETL load mappings. In any data warehouse table you should see columns like CHANGED_ON_DT, AUX1_CHANGED_ON_DT, AUX2_CHANGED_ON_DT, AUX3_CHANGED_ON_DT and AUX4_CHANGED_ON_DT. These columns are mapped directly from the LAST_UPDATE_DATE column of the source tables. For example, in W_SALES_ORDER_LINE_F, the CHANGED_ON_DT column is loaded from OE_ORDER_LINES_ALL.LAST_UPDATE_DATE; if other significant tables are used in the extract mapping, their LAST_UPDATE_DATE values are loaded into AUX1 through AUX4.
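As a rough illustration, the extract populates the change-capture columns along these lines. This is my own simplified SQL paraphrase, not the actual Informatica mapping, and the join to OE_ORDER_HEADERS_ALL is just an example of an auxiliary table:

```sql
-- Simplified sketch of how an SDE extract populates the change-capture
-- columns for W_SALES_ORDER_LINE_F. Illustrative only: the real mapping
-- selects many more columns and sources.
SELECT
    lin.LAST_UPDATE_DATE AS CHANGED_ON_DT,        -- primary source table
    hdr.LAST_UPDATE_DATE AS AUX1_CHANGED_ON_DT    -- an auxiliary source table
FROM OE_ORDER_LINES_ALL   lin
JOIN OE_ORDER_HEADERS_ALL hdr
  ON hdr.HEADER_ID = lin.HEADER_ID;
```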

The ETL logic compares the CHANGED_ON_DT and the auxiliary dates between the warehouse table and the stage table to decide whether to update a record that already exists. Hence, if we need to do a complete historical load on any table, the first step is to set these dates to NULL for all of its records.
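The update test can be paraphrased in SQL roughly as follows. This is a sketch under my own naming assumptions (W_SALES_ORDER_LINE_FS is the corresponding staging table); the real comparison lives inside the Informatica SIL mapping, not in a query like this:

```sql
-- Rows in the stage table that qualify for an UPDATE of the warehouse
-- table. A NULL CHANGED_ON_DT on the warehouse side always qualifies,
-- which is exactly what the historical-load trick relies on.
SELECT stg.*
FROM   W_SALES_ORDER_LINE_FS stg
JOIN   W_SALES_ORDER_LINE_F  tgt
  ON   tgt.INTEGRATION_ID    = stg.INTEGRATION_ID
 AND   tgt.DATASOURCE_NUM_ID = stg.DATASOURCE_NUM_ID
WHERE  tgt.CHANGED_ON_DT IS NULL
   OR  stg.CHANGED_ON_DT      > tgt.CHANGED_ON_DT
   OR  stg.AUX1_CHANGED_ON_DT > tgt.AUX1_CHANGED_ON_DT;  -- likewise AUX2..AUX4
```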

For example, to do this on W_SALES_ORDER_LINE_F, run the statement below on the warehouse schema:

UPDATE W_SALES_ORDER_LINE_F SET CHANGED_ON_DT=NULL;

For a dimension table that is not SCD Type 2, run a statement like:

UPDATE W_PARTY_D SET CHANGED_ON_DT=NULL WHERE ROW_WID>0;

For an SCD Type 2 dimension, make sure to include the CURRENT_FLG='Y' condition in the WHERE clause of the UPDATE. For example, to perform a historical load on W_PRODUCT_D, the statement would look like:

UPDATE W_PRODUCT_D SET CHANGED_ON_DT=NULL WHERE ROW_WID>0 AND CURRENT_FLG='Y';
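As a quick sanity check (my own suggestion, not part of the standard steps), you can confirm the update touched the intended rows before kicking off the load:

```sql
-- All current rows should now have a NULL change date; compare this count
-- with the total number of CURRENT_FLG = 'Y' rows in the table.
SELECT COUNT(*) AS nulled_rows
FROM   W_PRODUCT_D
WHERE  CHANGED_ON_DT IS NULL
AND    CURRENT_FLG = 'Y';
```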

The next step is to set, in DAC, the date from which we need to extract data from the source.

Here we need to understand how DAC behaves in different scenarios with respect to refresh dates.

As we know, DAC maintains a refresh date for each table, per connection, under Setup -> Physical Data Sources -> Refresh Dates.

Note: If you have defined Micro ETL execution plans, DAC tracks the refresh dates for all the tables in that EP separately; you should see the EP name in the Refresh Dates tab for those tables. So if a table is used in a regular EP as well as a micro EP, it will appear twice in the Refresh Dates tab: once with the EP name and once without.

Below are the different scenarios:

1. If no refresh date is available for any of the primary source tables of a task, DAC triggers a FULL command.

2. If the refresh date is NULL for any of the source tables of a task, DAC uses the minimum refresh date among the primary tables.

3. If the same table is present in multiple EPs and no refresh date is available in any of them, DAC triggers a FULL command.

4. If the same table is present in multiple EPs and a refresh date is present in only one of them, DAC uses that date for the other EPs where the refresh date is not available.

5. If the same table is present in multiple EPs, one of which is a micro EP, and the non-micro EP has the later refresh date, DAC uses the non-micro EP's refresh date for the micro EP as well.

6. If the target table's refresh date is earlier than the source table's refresh date, DAC uses the target table's refresh date and extracts source data from that date.
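The resolved refresh date matters because DAC hands it to Informatica through the $$LAST_EXTRACT_DATE parameter, and the SDE extract applies it as a source filter. Very roughly (a sketch of the shape, not the exact generated SQL; the date format is an assumption):

```sql
-- Shape of the incremental filter in an SDE extract. DAC substitutes the
-- resolved refresh date (minus any prune period) for $$LAST_EXTRACT_DATE.
SELECT *
FROM   OE_ORDER_LINES_ALL
WHERE  LAST_UPDATE_DATE >=
       TO_DATE('$$LAST_EXTRACT_DATE', 'MM/DD/YYYY HH24:MI:SS');
```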

Now, suppose we want to do a historical load on W_SALES_ORDER_LINE_F from, say, 01/01/2000. Execute the SQL statement given above in the data warehouse schema to set CHANGED_ON_DT to NULL for all records (the assumption here is that no data in W_SALES_ORDER_LINE_F is older than 01/01/2000).

After that, go to DAC Setup -> Physical Data Sources -> ORA_R1211 (the connection pointing to your relational source) -> Refresh Dates and query for OE_ORDER_LINES_ALL.

We have two options now: either set the refresh date of OE_ORDER_LINES_ALL to 01/01/2000, or delete that table's entry from the Refresh Dates tab. If the table is present in multiple EPs, repeat this step for each of its refresh date entries.

Once this is done, start the execution plan. DAC will use the refresh date you set for the SDE task and extract data from 01/01/2000 onward; since CHANGED_ON_DT is NULL in the warehouse table, all of the extracted records will be updated in W_SALES_ORDER_LINE_F. If you deleted the refresh date instead, DAC will trigger a FULL command for the SDE task, extract the entire data set, and update the warehouse table.
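Once the plan finishes, a simple check (again, my own suggestion) shows whether the historical update reached the rows you nulled out:

```sql
-- After a successful historical load, CHANGED_ON_DT should be repopulated;
-- rows that are still NULL were not covered by the extract window.
SELECT COUNT(*) AS still_null
FROM   W_SALES_ORDER_LINE_F
WHERE  CHANGED_ON_DT IS NULL;
```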

This process should be fairly straightforward once you understand the DAC refresh date concept and the Informatica update logic in the SILOS mappings. If you have any questions, post them in the comments and I'll reply as soon as I can. Thanks.

2 comments:

  1. Hi, I have copied the DAC metadata schema from one DB to another; it includes the DAC load run history. Will it now do an incremental load or a full load?

  2. Reading this in May, 2017. Awesome article! Thanks a lot.
