The new Model Statistics and Model Compare functionality in version 2.2 is really great. I am wondering, however, how the "Training" and "Validation" dataset designations in the Model Compare Results output are determined. The user guide has very little information about how this is generated.
For the Auto Classification and Auto Regression algorithms, I understand how the Model Statistics can distinguish "Train" from "Validation," because splitting and auto-validation are built into those algorithms. For the non-auto algorithms, though, there is no automated splitting of the data into Train and Validate samples; in fact, 100% of the data passing through the predictive algorithm (for example, an R-CNR tree) is used for model training. So what does the KR value represent in that case? Is it prediction consistency over repeated samples of the training data? And how are the charts under "Model Representation" generated with "Train" and "Validate" labels? I see differences between the two results for an R-CNR tree algorithm, but I am not sure how the Model Compare module decides which data is "Train" and which is "Validate"; I think it is all "Train."
Is there a way to designate which records are Train vs. Validate, so that those charts are accurate?
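To make the question concrete, here is a minimal sketch of what "designating" records could look like upstream of modeling: each row gets an explicit Train/Validate flag before it ever reaches the algorithm, so any downstream comparison charts reflect a true holdout. This is purely illustrative; the function and field names are my own assumptions, not part of the Model Compare tool.

```python
import random

def split_train_validate(rows, validate_frac=0.3, seed=42):
    """Randomly flag each row as 'Train' or 'Validate'.

    Illustrative only: a fixed seed makes the assignment
    reproducible, and validate_frac controls the expected
    holdout share. Row structure is hypothetical.
    """
    rng = random.Random(seed)
    labeled = []
    for row in rows:
        label = "Validate" if rng.random() < validate_frac else "Train"
        labeled.append({**row, "sample": label})
    return labeled

# Hypothetical example data: 100 rows with an id and a target.
rows = [{"id": i, "y": i % 2} for i in range(100)]
labeled = split_train_validate(rows)

train = [r for r in labeled if r["sample"] == "Train"]
validate = [r for r in labeled if r["sample"] == "Validate"]
```

With an explicit flag like this, a comparison step could compute its statistics separately for each group instead of treating all incoming records as training data.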