

Nov 21, 2020

Contents
Why does missing data matter?
What are the options for missing data imputation?
Missing data imputation using scikit-learn
  (0) Prepare data
  (1) Mean/median
  (2) Mode (most frequent category)
  (3) Arbitrary value
  (4) KNN imputer
  (5) Adding Missing Indicator
What to use?
References

Why does missing data matter?



2. Imputing with the median is more robust than imputing with the mean because the median is far less sensitive to outliers. In practice, though, the two often give comparable imputation results. Neither method takes into account potential dependencies between columns, which may carry relevant information for estimating the missing values.
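The outlier argument can be made concrete with a minimal pure-Python sketch. The `impute` helper and the income values below are hypothetical and exist only for illustration. In scikit-learn, the same two strategies correspond to `SimpleImputer(strategy="mean")` and `SimpleImputer(strategy="median")`. The helper only ever sees a single column, which is also why these methods cannot exploit dependencies between columns.

```python
from statistics import mean, median
from typing import List, Optional


def impute(column: List[Optional[float]], strategy: str = "median") -> List[float]:
    """Replace None entries with the mean or median of the observed values.

    Hypothetical helper: it looks at one column only, so every missing
    entry receives the same fill value regardless of the other columns.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in column]


# Hypothetical income column (in thousands) with one extreme outlier (500.0)
# and one missing value (None).
income = [30.0, 32.0, 35.0, 31.0, 500.0, None]

filled_mean = impute(income, "mean")      # fill value 125.6, dragged up by the outlier
filled_median = impute(income, "median")  # fill value 32.0, unaffected by the outlier
print(filled_mean)
print(filled_median)
```

Here the mean-imputed value (125.6) lies far from every typical observation, while the median (32.0) stays within the bulk of the data. Without the outlier, both statistics would land close to each other, which matches the observation that the two methods often behave similarly in practice.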