on 02-27-2016 10:23 PM
Hi guys,
I am facing a scenario and need logic on how to build this in Data Services. Please help me on this :
There is a HR Master Data table in which I've following columns :
EMP_CODE
EMP_NAME
AREA
START_DATE
END_DATE
Example, records will look like this :
1003 Samuel California 01 Apr 2011 31 Mar 2013
1003 Samuel Arizona 01 Apr 2013 30 Nov 2015
1003 Samuel New York 01 Dec 2015 31 Dec 9999
1007 Nathan xyz 10 Aug 2013 10 Dec 2015
1009 Samarth Caracus 30 Jun 1989 31 Mar 2011
1009 Samarth Delhi 01 Apr 2011 25 Dec 2014
Each employee has it's historical records as well which will indicate change in location based on dates. END_DATE of 31 Dec 9999 indicates that employee is working till date and present in organization.
Now the requirement is to build a dashboard on this data in which KPIs will be showing the headcount of employees Month & Year wise. That means for example, how many employees were present in Dec 1997, Mar 2010, Feb 2016 etc. The data model should be robust enough, so that any month/year headcount information could be pulled out of it based on these two columns i.e. START_DATE & END_DATE. I mean we can add further columns if required to table but as of now we have these two date columns through which we need to build logic.
If we consider END_DATE column to find headcounts, then confusion is how to make system understand that EMP_CODE 1003 was present in all 12 months of year 2012, 2014, 2015 as well when queried from reporting level(Tableau). If we consume the existing data as is, then it will show 1003 was present in Mar 2013, Nov 2015, and Dec 9999. How to show it's presence in rest of months? Do we need to add the data for dates via some logic or how to handle this? Need to make data generic i.e. user can see any month/year headcount from the data I make available for him in table.
Please help on this urgently...Any help would be highly appreciated. Thank you for reading and paying attention.
There's a very simple solution to this problem.
Use the Date_generation transform to generate a list of dates, far away in the past (start before the first employee joined) and far away in the future (at least till the end of this year).
Join the output from this transform with your table.
Join condition:
Date_Generation.DI_GENERATED_DATE >= STARTDATE and Date_Generation.DI_GENERATED_DATE <= ENDDATE
Group by Date_Generation.DI_GENERATED_DATE.
And map count(*) to 2nd (numeric) column.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Where are you getting the HR data from. What is the Source(ERP/ Flat File/Database)?
We had done this kind of reporting in BO Webi using the SAP HR Rapid marts for Oracle which were available in 3.2. This is possible if you are using Oracle ERP. When we deploy the SAP HR rapid mart package, the tables are universe and reports are automatically created and the tables associated with HR are created at the database level(default) like HR_ORG, HR_EMPLOYEE etc
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Sasi,
Thanks for paying attention!
I am getting the data from staging tables in MySQL, which are loaded from ECC HR master tables as is.
My task is not limited to source/target database. It's a logic, which we need to design. We need to find out how to consume entire date ranges present in data, instead of what is just entered in column.
As explained above in my query, I need a solution to figure out how to consume all the months and years in the date range of for example 10 Jan 2011 to 30 June 2014. If I try to query straight forwardly, it will be read as Begin date of 10 Jan 2011 and End date of 30 June 2014. Acc. to it employee was present in company on 10 Jan 2011 and 30 Jun 2014. What about the implicit range in between?
Please try to figure out and help.
Thanks
Samarth
Use a date dimension table. Your date dimension can be on the left side of the join and then you can select months from it based on your desired range. Then you can join to your date using the to and from ranges in your table containing the data. This isn't the most efficient query, but will probably work. If you were using MS SQL, then you'd probably want to try a window function.
As an aside, this doesn't seem like a Data Services question, but more of a mySQL query question.
Hi Robin,
Thanks for your inputs.
No MS SQL, so no MSSQL functions. So, what will be the structure of Date dimension table like what all fields will it be holding?
I understand that thinking logic seems to be a SQL query but as we need to build this all in Data Services, so it's legit enough to be asked in BODS forum. Also, at ground level, whatever we do in Data Services or ANY other ETL tool is nothing but inferring SQL queries...right?
Technically you are correct, which is the best kind of correct, but you lose a lot of advanced features of databases if you only live in the ETL layer.
A date dimension table is a data warehousing concept. It is simply a table where the primary key is a date and the columns contain metadata about the date. So, for example, you could store a date, the text value for the day of the week, the starting and ending dates of the month the date is in, if the date is a holiday, etc. You could also calculate these data on the fly, but when you are in the data tier only, it is often easier just to store it. Now that I think about it, in DS you can use the Date_Generation transform since your requirements are pretty simple.
I'm going to assume you will do your calculation on the first day of the month for simplicity. This is a case where the Date_Generation transform will work well for you. You'll need to create a data flow with two parameters, $StartDate and $EndDate, which is the range of dates you are trying to calculate, and they should both be the first day of a month. Set them as the starting and ending dates for the transform and set it to generate dates monthly. Then you can join this to your actual data. The simple case I'm doing here assumes you will be fine with getting the number of employees on the first of the month and not dealing with hires and terminations throughout the month, that one is a little more complex. Your join condition is going to be where $FromDate <= [Date from Date_Generation] AND $ToDate >= [Date from Date_Generation]. This will get you all the records, and if you need a count you can just aggregate on the date and use the count() aggregation function.
Addendum:
One other thing I thought of. If you use the Date_Generation transform, I don't believe you'll get pushdown to the data tier. If you have a large volume of data, there could be performance issues. If this is an issue for you, then maybe the date dimension table would work better because as long as it is in the same database as your data, or in a different database linked in DS, you should have your query pushed down to the data tier.
If you didn't see where I was going with the date dimension table, it would be a table filled with the same dates the Date_Generation transform would create, and in your query you'd select out the range you need using the $StartDate and $EndDate parameters. You'd just put the first date of every month in there for a long enough range of time so you don't need to keep filling it. You'd need to start with the earliest date in your data and maybe go all the way up to the year 3000. You'd be creating a Y3K problem for yourself, but I suspect this code won't still be in use in a thousand years, or at the very least you'll be long gone.
User | Count |
---|---|
75 | |
10 | |
8 | |
7 | |
6 | |
5 | |
5 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.