
Model Statistics/Compare for non-Auto algorithms--"Training" vs "Validation"???

Former Member

The new Model Statistics and Model Compare functionality in version 2.2 is really great. I am wondering, however, how the "Training" and "Validation" datasets are designated in the Model Compare results output. There seems to be very little information in the user guide about how this output is generated.

For the Auto Classification and Auto Regression algorithms, I understand how Model Statistics distinguishes "Train" and "Validation", because splitting and auto-validation are part of those algorithms. For the non-auto algorithms, however, there is no automated splitting of the data into Train and Validate samples; in fact, 100% of the data passing through the predictive algorithm (for example, R-CNR Tree) is used for model training. So what does the KR value represent? Is it prediction consistency over repeated samples of training data? And how are the charts under "Model Representation" generated with "Train" and "Validate" labels? I see differences between the two results for an R-CNR Tree algorithm, but I am not sure how the Model Compare module decides which data is "Train" and which is "Validate"--I think it is all "Train".

Would it be possible to designate Train vs. Validate data ourselves so that those charts are accurate?

Thanks!

Accepted Solutions (1)

Former Member

Hi Hillary,

In the 2.2 release, in the absence of a partition node, the statistics node follows the default cutting strategy (similar to the APL/Auto algorithms), which is "random without test". Under this strategy it splits the dataset into random partitions of 75% for training and 25% for validation, and thus KI comes from the training partition and KR from the validation partition.
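For reference, here is a minimal sketch in Python of what that default cutting strategy amounts to, assuming only what is stated above (a random 75/25 split with no test partition, KI measured on the training rows and KR on the held-out validation rows). The function name, seed, and row count are hypothetical placeholders, not part of the product:

import numpy as np

def random_without_test_split(n_rows, train_frac=0.75, seed=0):
    # Randomly assign each row to train (75%) or validation (25%),
    # mirroring the default "random without test" cutting strategy.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    cut = int(n_rows * train_frac)
    return shuffled[:cut], shuffled[cut:]  # (train indices, validation indices)

# Hypothetical usage: fit the model on the train partition only, then score
# both partitions -- the training-side statistic plays the role of KI and the
# validation-side statistic the role of KR.
train_idx, valid_idx = random_without_test_split(n_rows=1000)
print(len(train_idx), len(valid_idx))  # 750 250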

This was the first step towards model comparison, and it'll become more configurable in coming releases.

Regards,

Jayant

Answers (0)