on 11-11-2012 6:07 PM
I'm using BODS as a data quality tool, so my data flow looks like this:
I'm looking for the easiest way to generate a score for this validation rule. I know I can use the "Count" feature in Audit to store the counts of passed/failed records and then compute the score from them. Is there another way to generate a score for validation rules in BODS?
I found this "Score" check box inside the function/rule in the validation transform. What does this check box mean, and where does it store the score?
You use the score in the following case:
Prerequisite: multiple parameters/bindings in your validation function.
--> For a failed row, you can configure which column the error belongs to. Enable the score field for all columns the error should be counted against.
See the Reference Guide, pp. 581-582.
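To make the column-attribution idea concrete, here is a minimal sketch (plain Python, not the BODS API; the rule structure and names are illustrative) of how failed rows get counted against only the columns flagged for scoring:

```python
from collections import Counter

# Hypothetical rule definitions: each rule binds one or more columns,
# and "score_columns" marks which of them carry the Score flag.
rules = {
    "valid_zip": {"columns": ["ZIP"], "score_columns": ["ZIP"]},
    "name_matches_id": {"columns": ["NAME", "ID"], "score_columns": ["NAME"]},
}

# (rule name, number of failed rows) pairs, e.g. from audit counts.
failed = [("valid_zip", 3), ("name_matches_id", 2)]

# Attribute each failure only to the columns flagged for scoring.
column_fail_counts = Counter()
for rule, n in failed:
    for col in rules[rule]["score_columns"]:
        column_fail_counts[col] += n
```

Here ID is bound to the rule but gets no failures attributed, because its score flag is not set.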
That's going to be a little more complicated. I would do the following:
Create an error_log table where you store all failed rules with the corresponding failed column(s) and an error_id. In a static error table, store the score contribution of each error per error_id.
After the staging process, you can produce some kind of DQ/error reporting based on those two tables.
Cheers, @rogermathis
I mean a manually maintained static error table where you store error messages, scores, etc. I'm thinking of a data model like this:
D_ERROR (static error table)
error_id, error_msg, error_type, error_score, etc.
1, 'data validation failed','data validation error',4
LOG_ERROR (fact data, populated during ETL load)
job_id, error_id, error_column, validation_rule, error_table, dataflow, workflow, error_pk[1-5], error_values[1-5]
1,1,'ZIP','bad zip code','ADDRESS','DF_STG_ADDRESS','WF_STG_ADDRESS',3234234,'44x'
In a report, you join both tables and aggregate the data as you wish...
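As a rough sketch of that reporting join (using an in-memory SQLite database in Python; the columns are simplified and the sample rows are illustrative, mirroring the D_ERROR/LOG_ERROR model above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE D_ERROR (error_id INT, error_msg TEXT, error_type TEXT, error_score INT);
CREATE TABLE LOG_ERROR (job_id INT, error_id INT, error_column TEXT, validation_rule TEXT);
INSERT INTO D_ERROR VALUES (1, 'data validation failed', 'data validation error', 4);
INSERT INTO LOG_ERROR VALUES (1, 1, 'ZIP', 'bad zip code');
INSERT INTO LOG_ERROR VALUES (1, 1, 'CITY', 'missing city');
""")

# Join the fact log against the static error table and aggregate
# failure counts and score contributions per failed column.
report = con.execute("""
    SELECT l.error_column, COUNT(*) AS failures, SUM(d.error_score) AS total_score
    FROM LOG_ERROR l JOIN D_ERROR d ON d.error_id = l.error_id
    GROUP BY l.error_column
""").fetchall()
```

The same join works in any reporting tool pointed at the two tables.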
Actually, I just want to give you an idea of the concept. Feel free to adapt it; it's a matter of your requirements.
Personally, I always implement a generic error, DQ logging and metadata concept surrounding all jobs, workflows and dataflows, so that I can build central, easy-to-use metadata, error and DQ reporting.
Thanks for the idea. Doing this manually would be very hard because I'll have 1000+ validation rules.
I thought BODS did something like this:
If 10 records out of 100 fail, it automatically outputs a score of 9 out of 10 for that validation rule (like Information Steward). It looks like I cannot accomplish that with BODS.
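The Information-Steward-style score described here is straightforward to compute yourself from the pass/fail counts that Audit provides; a minimal sketch (function name and scale are my own, not a BODS feature):

```python
def quality_score(total_rows: int, failed_rows: int, scale: int = 10) -> float:
    """Score on a 0..scale range: the fraction of passing rows times the scale.

    Example: 10 failures out of 100 rows gives 9.0 on a 0-10 scale.
    """
    if total_rows == 0:
        return float(scale)  # treat an empty input as fully passing
    return scale * (total_rows - failed_rows) / total_rows
```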