Skip to Content
avatar image
Former Member

Best way to extract data when source system tables don't have a last modified date

Hi,

I am working on creating a datamart and some of the big tables from oracle source system doesn't have last updated column. I am planning to extract full load to staging area which is on SQL server 2012 . So i will be deleting data from target before re-load in every run. The 4 tables in question from source system have 18.3 , 9.6 ,19.89 and 206 million records. I ran the job today , which has around 23 dataflow consisting of small and huge tables. My staging job is still running , its been more than 3 hours and i am anticipating it will run for 1-2 hours more.

Any suggestion on how to handle this scenario wherein im doing a full refresh of staging tables. I am afraid, if the simple extraction is taking a long time, how the performance will be when i will actually be doing transform and building dimensions and facts.

Thanks!

Vishwas

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

1 Answer

  • Oct 26, 2015 at 04:24 PM

    If you have no way to enable change-data capturing in your source database (triggers? log-based CDC? ...), your only option consists in speeding up the transfer process. But firrst you'll have to identify the bottleneck(s). Next to providing more hardware, look into:

    • source slow > increase parallellism, use partitioning...
    • target slow > disable all indexes
    • network slow > reduce transfer volumne by only copying those columns that are actually needed in target
    Add comment
    10|10000 characters needed characters exceeded

    • Former Member

      Thanks Dirk for the suggestions. I am fairly new to data services and facing few more challenges. Would you please provide inputs on below:

      I am trying to implement a dimension table having both scd type1 and type2 column. Now lets say my type 2 column is changed, it gonna create a new row with current flas as 'Y' and will expire the old row with 'N' and the column will have new values in the new row. Now, if TYPE1 column is changed, its jut updates the latest row with current flag as 'Y" and older row with flas 'N" still has older value for type1 column. How can i update all the type 1columns in older rows as well.

      The effective date for the record is 1/1/1900 and expiration date is 12/31/2050 when i run the load first time.When history is preserved the older record expiration date is set to sysdate and a new record with effective date as sysdate has to be added and the expiration date as 12/31/2050?

      I tried various things but somehow my dates are messed up.