cancel
Showing results for 
Search instead for 
Did you mean: 

How to consume date ranges while taking any Month/Year wise count?

SAPDataMigrate
Participant
0 Kudos

Hi guys,

I am facing a scenario and need logic on how to build this in Data Services. Please help me on this :

There is a HR Master Data table in which I've following columns :

EMP_CODE

EMP_NAME

AREA

START_DATE

END_DATE

Example, records will look like this :

1003     Samuel     California     01 Apr 2011     31 Mar 2013

1003     Samuel     Arizona        01 Apr 2013     30 Nov 2015

1003     Samuel     New York     01 Dec 2015    31 Dec 9999

1007     Nathan      xyz              10 Aug 2013     10 Dec 2015

1009     Samarth    Caracus       30 Jun 1989     31 Mar 2011

1009     Samarth    Delhi            01 Apr 2011     25 Dec 2014

Each employee has it's historical records as well which will indicate change in location based on dates. END_DATE of 31 Dec 9999 indicates that employee is working till date and present in organization.

Now the requirement is to build a dashboard on this data in which KPIs will be showing the headcount of employees Month & Year wise. That means for example, how many employees were present in Dec 1997, Mar 2010, Feb 2016 etc. The data model should be robust enough, so that any month/year headcount information could be pulled out of it based on these two columns i.e. START_DATE & END_DATE. I mean we can add further columns if required to table but as of now we have these two date columns through which we need to build logic.

If we consider END_DATE column to find headcounts, then confusion is how to make system understand that EMP_CODE 1003 was present in all 12 months of year 2012, 2014, 2015 as well when queried from reporting level(Tableau). If we consume the existing data as is, then it will show 1003 was present in Mar 2013, Nov 2015, and Dec 9999. How to show it's presence in rest of months? Do we need to add the data for dates via some logic or how to handle this? Need to make data generic i.e. user can see any month/year headcount from the data I make available for him in table.

Please help on this urgently...Any help would be highly appreciated. Thank you for reading and paying attention.

, : Any inputs Please?

Accepted Solutions (1)

Accepted Solutions (1)

former_member187605
Active Contributor
0 Kudos

There's a very simple solution to this problem.

Use the Date_generation transform to generate a list of dates, far away in the past (start before the first employee joined) and far away in the future (at least till the end of this year).

Join the output from this transform with your table.

Join condition:

Date_Generation.DI_GENERATED_DATE >= STARTDATE and Date_Generation.DI_GENERATED_DATE <= ENDDATE

Group by Date_Generation.DI_GENERATED_DATE.

And map count(*) to 2nd (numeric) column.

Former Member
0 Kudos

Isn't this exactly what I said?

Answers (1)

Answers (1)

former_member198401
Active Contributor
0 Kudos

Where are you getting the HR data from. What is the Source(ERP/ Flat File/Database)?

We had done this kind of reporting in BO Webi using the SAP HR Rapid marts for Oracle which were available in 3.2. This is possible if you are using Oracle ERP. When we deploy the SAP HR rapid mart package, the tables are universe and reports are automatically created and the tables associated with HR are created at the database level(default) like HR_ORG, HR_EMPLOYEE etc

SAPDataMigrate
Participant
0 Kudos

Hi Sasi,

Thanks for paying attention!

I am getting the data from staging tables in MySQL, which are loaded from ECC HR master tables as is.

My task is not limited to source/target database. It's a logic, which we need to design. We need to find out how to consume entire date ranges present in data, instead of what is just entered in column.

As explained above in my query, I need a solution to figure out how to consume all the months and years in the date range of for example 10 Jan 2011 to 30 June 2014. If I try to query straight forwardly, it will be read as Begin date of 10 Jan 2011 and End date of 30 June 2014. Acc. to it employee was present in company on 10 Jan 2011 and 30 Jun 2014. What about the implicit range in between?

Please try to figure out and help.

Thanks

Samarth

Former Member
0 Kudos

Use a date dimension table. Your date dimension can be on the left side of the join and then you can select months from it based on your desired range. Then you can join to your date using the to and from ranges in your table containing the data. This isn't the most efficient query, but will probably work. If you were using MS SQL, then you'd probably want to try a window function.

As an aside, this doesn't seem like a Data Services question, but more of a mySQL query question.

SAPDataMigrate
Participant
0 Kudos

Hi Robin,

Thanks for your inputs.

No MS SQL, so no MSSQL functions. So, what will be the structure of Date dimension table like what all fields will it be holding?

I understand that thinking logic seems to be a SQL query but as we need to build this all in Data Services, so it's legit enough to be asked in BODS forum. Also, at ground level, whatever we do in Data Services or ANY other ETL tool is nothing but inferring SQL queries...right?

Former Member
0 Kudos

Technically you are correct, which is the best kind of correct, but you lose a lot of advanced features of databases if you only live in the ETL layer.

A date dimension table is a data warehousing concept. It is simply a table where the primary key is a date and the columns contain metadata about the date. So, for example, you could store a date, the text value for the day of the week, the starting and ending dates of the month the date is in, if the date is a holiday, etc. You could also calculate these data on the fly, but when you are in the data tier only, it is often easier just to store it. Now that I think about it, in DS you can use the Date_Generation transform since your requirements are pretty simple.

I'm going to assume you will do your calculation on the first day of the month for simplicity. This is a case where the Date_Generation transform will work well for you. You'll need to create a data flow with two parameters, $StartDate and $EndDate, which is the range of dates you are trying to calculate, and they should both be the first day of a month. Set them as the starting and ending dates for the transform and set it to generate dates monthly. Then you can join this to your actual data. The simple case I'm doing here assumes you will be fine with getting the number of employees on the first of the month and not dealing with hires and terminations throughout the month, that one is a little more complex. Your join condition is going to be where $FromDate <= [Date from Date_Generation] AND $ToDate >= [Date from Date_Generation]. This will get you all the records, and if you need a count you can just aggregate on the date and use the count() aggregation function.

Addendum:

One other thing I thought of. If you use the Date_Generation transform, I don't believe you'll get pushdown to the data tier. If you have a large volume of data, there could be performance issues. If this is an issue for you, then maybe the date dimension table would work better because as long as it is in the same database as your data, or in a different database linked in DS, you should have your query pushed down to the data tier.

If you didn't see where I was going with the date dimension table, it would be a table filled with the same dates the Date_Generation transform would create, and in your query you'd select out the range you need using the $StartDate and $EndDate parameters. You'd just put the first date of every month in there for a long enough range of time so you don't need to keep filling it. You'd need to start with the earliest date in your data and maybe go all the way up to the year 3000. You'd be creating a Y3K problem for yourself, but I suspect this code won't still be in use in a thousand years, or at the very least you'll be long gone.