Imagine a box where you put all of your machine learning stuff. Here it is. [WIP] The structure will be updated.
Bias vs Variance
Metrics
Precision
Recall
Accuracy
F1-score
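The four metrics listed above can all be computed from the confusion-matrix counts of a binary classifier. The following is a minimal sketch in plain Python. The function names and the toy labels are made up for illustration and do not come from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, accuracy and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Precision: of everything predicted positive, how much was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Toy labels (hypothetical): 4 positives and 4 negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here TP=3, FP=2, FN=1, TN=2, so precision is 3/5, recall is 3/4 and accuracy is 5/8.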
Cross-Validation
How do you choose which cross-validation technique to use for your project? Think about how your model will be used and how it will interact with the data in a deployed setting. If the dataset is huge, use hold-out, which is basically the 80-20 method: train on 80% of the data and evaluate on the remaining 20%.
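A hold-out split can be sketched with scikit-learn's `train_test_split`. The toy data below is randomly generated for illustration only, and the choice of scikit-learn is an assumption, since these notes do not name a library.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples, 3 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# 80-20 hold-out: 80% for training, 20% held back for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible between runs.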
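K-Fold
Use K-fold if the data points are independent of each other.
If the dataset is imbalanced, use StratifiedKFold, because it takes the class labels into account and keeps the class proportions roughly the same in every fold. If there is very little data, use ShuffleSplit.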
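The difference between the three splitters above can be seen on a small imbalanced dataset. This is a sketch assuming scikit-learn's `KFold`, `StratifiedKFold` and `ShuffleSplit`; the data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, ShuffleSplit

# Toy imbalanced data (hypothetical): 16 negatives, 4 positives.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)

kf = KFold(n_splits=4, shuffle=True, random_state=0)
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
ss = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)

# Number of positive samples that land in each test fold.
kfold_pos = [int(y[test].sum()) for _, test in kf.split(X)]
strat_pos = [int(y[test].sum()) for _, test in skf.split(X, y)]

# ShuffleSplit draws an independent random test set on every iteration,
# so the number of iterations is not tied to the size of the test set.
shuffle_sizes = [len(test) for _, test in ss.split(X)]

print("KFold positives per fold:          ", kfold_pos)
print("StratifiedKFold positives per fold:", strat_pos)
print("ShuffleSplit test sizes:           ", shuffle_sizes)
```

Plain `KFold` ignores the labels, so a fold can end up with few or no positives, while `StratifiedKFold` places one of the four positives in each of the four folds.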
Time split
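If time is influential in how the data is generated, use a time-based split (TimeSeriesSplit in scikit-learn), so that the model is always validated on data that comes after the data it was trained on.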
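A time-based split can be sketched with scikit-learn's `TimeSeriesSplit`, assuming the rows are already sorted by time. The 12-row array is a stand-in for real time-ordered data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data (hypothetical): 12 observations, already in time order.
X = np.arange(12).reshape(12, 1)

tscv = TimeSeriesSplit(n_splits=3)

# Collect (train indices, test indices) for each split.
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train_idx, test_idx in splits:
    # Every test index is later than every training index.
    print("train:", train_idx, "test:", test_idx)
```

The training window grows with each split and the test window always lies after it, so no future information leaks into training.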
GroupKFold
If the data is generated by patients, say n patients each producing x data points, use GroupKFold, so that all data points from one patient stay in the same fold and no patient appears in both the training and the test set. The groups in this case are the participants.
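The patient scenario can be sketched with scikit-learn's `GroupKFold`. The patient IDs and data below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data (hypothetical): 4 patients, 3 data points each.
groups = np.repeat(["p1", "p2", "p3", "p4"], 3)
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

gkf = GroupKFold(n_splits=4)

# For each split, record which patients appear on both sides (should be none).
overlaps = []
for train, test in gkf.split(X, y, groups=groups):
    shared = set(groups[train]) & set(groups[test])
    overlaps.append(shared)
    print("test patients:", sorted(set(groups[test])), "shared:", shared)
```

Because each patient is kept whole, the score reflects how the model does on patients it has never seen.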
StratifiedGroupKFold
If the class distribution is also skewed, use StratifiedGroupKFold, which keeps groups together while also trying to preserve the class proportions in each fold.