Model Dev & Offline Eval


Model dev & training

Ensembles

Model Evaluation

It is essential to know the baseline you are evaluating the model against.
- Random baseline: if the model predicts at random, what is the expected performance?

[!example] Dataset with 90 negative and 10 positive observations

| Random distribution | Accuracy | F1 |
| --- | --- | --- |
| Uniform random (0.5) | 0.5 | 0.167 |
| Task label distribution | 0.82 | 0.1 |
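
The numbers in this example can be reproduced with a small sketch. It assumes the random classifier predicts the positive class with a fixed probability, independent of the input: 0.5 for uniform random, and 0.1 (the share of positives) for the task label distribution. The function name `expected_random_baseline` is hypothetical, and F1 is computed on the expected confusion counts, which approximates rather than exactly equals the expected F1.

```python
def expected_random_baseline(n_pos, n_neg, p_predict_pos):
    """Expected accuracy and F1 of a classifier that predicts "positive"
    with probability p_predict_pos, regardless of the input.

    Assumption: F1 is computed from the expected confusion counts.
    """
    # Expected confusion-matrix counts.
    tp = n_pos * p_predict_pos
    fn = n_pos * (1 - p_predict_pos)
    fp = n_neg * p_predict_pos
    tn = n_neg * (1 - p_predict_pos)

    accuracy = (tp + tn) / (n_pos + n_neg)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, f1


n_pos, n_neg = 10, 90

# Uniform random: predict positive half of the time.
uniform = expected_random_baseline(n_pos, n_neg, 0.5)

# Task label distribution: predict positive as often as positives occur.
label_dist = expected_random_baseline(n_pos, n_neg, n_pos / (n_pos + n_neg))

print("uniform random:", [round(m, 3) for m in uniform])
print("task label distribution:", [round(m, 3) for m in label_dist])
```

Accuracy rises from 0.5 to 0.82 just by matching the label distribution, while F1 drops from 0.167 to 0.1, so a model on this dataset needs to clear both figures before it can be called better than chance.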

Evaluation Methods