on 05-11-2016 3:41 PM
Hi community,
I'm currently working on a data preparation where i have a dataset that contains 3 columns: CustomerID, Date (yyymm), Invoice ID (see table with example).
ID | Date | Invoice ID |
---|---|---|
1 | 201601 | A |
1 | 201601 | B |
1 | 201602 | C |
2 | 201601 | D |
The aim is to build an aggregate on this table where ID+Date are keys (there is only one row per ID+Date) and a new field that contains the number of invoices this customer has received for the given date.
For the example in the table it should look as follows:
ID | Date | # of Invoices |
---|---|---|
1 | 201601 | 2 |
1 | 201602 | 1 |
2 | 201601 | 1 |
How can this be done in Predictive Analytics? I have tested different approaches with the Data Manager but did not find an easy Aggregate Approach.
Thanks for any hints in advance,
Regards,
Sabrina
Hi,
thanks Marc and Abdel for the description of how to add the Aggregation functionalities to the expression window.
I still struggle with the execution of the aggregation when having two keys. As the number of rows should be reduced through the aggregation i guess that i need to run the aggregation within a filter condition. Can anyone explain how the expression should look like with my two keys plus one new column with the Invoice ID count?
Thanks!
Sabrina
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Sabrina, is your question answered? Thanks & regards Antoine
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Data Manager allows to perform aggregations over time ; those aggregations can then be used as input variable to generate a predictive model. Let's take a typical predictive problem: "We want to target the Customers who are most likely to respond to a marketing campaign". We start in Data Manager and select the CUSTOMER table that is the entity for our predictions. Then we define aggregates on INVOICE_HEADER (the event table). We choose the function COUNT and the column INVOICE_ID to obtain the number of invoices per customer. You need to bring your customer table into Data Manager.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Marc, thanks for the quick reply! As i don't have two tables, i've used a copy of my basis table. I get the number of invoices per customer but not per customer plus date. So only one key is possible but in my table are two keys. Is there a solution in Data Manager that can handle more than one key?
Thanks!
Sabrina
Hi Sabrina,
What you can do is to define a new expression with formula: AggregateCount(invoice_id)
Then uncheck the Visibility check-box for the fields that you don't want to have in the result data set (ex: invoice id)
This function belongs to a family of basic aggregation functions: AggregateAvg() AggregateCount() AggregateMin() AggregateMax()
They are not exposed in the user interface by default. I need to check if there is a setting to make them visible in the function list.
Let us know if the formula works.
Marc
User | Count |
---|---|
87 | |
10 | |
10 | |
9 | |
7 | |
6 | |
6 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.