This was my first ever machine learning competition, so my goal was simply to produce a functional notebook that trains an accurate model.
The Kaggle link above contains more detailed information about my work on this submission.
I began with exploratory data analysis (EDA), examining the distributions of the individual variables and their correlations with the target variable. I then trained an XGBoost model to establish a baseline performance on the training set. Next, I optimized the dataset with klib, reducing its memory footprint by 62.5%, which allowed for faster training. Using the optimized dataset, I ran Optuna, a Python hyperparameter optimization library, to find the best set of hyperparameters for the XGBoost classifier. Finally, I trained a final model with those parameters and generated predictions for the submission data.
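The klib step boiled down to a single cleaning call. The sketch below shows the general idea; the file name is a placeholder and the exact savings depend on the dataset (my run saw the 62.5% reduction mentioned above).

```python
import klib
import pandas as pd

# "train.csv" is a placeholder path; the competition files may be named differently.
train = pd.read_csv("train.csv")

before = train.memory_usage(deep=True).sum()

# klib.data_cleaning() drops empty and duplicate columns and downcasts
# numeric and object dtypes, which is where the memory savings come from.
train_clean = klib.data_cleaning(train)

after = train_clean.memory_usage(deep=True).sum()
print(f"Memory footprint reduced by {100 * (1 - after / before):.1f}%")
```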
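The tuning step looked roughly like the following sketch. The search space, cross-validation setup, and the `TARGET` column name are illustrative assumptions rather than the exact values from my notebook.

```python
import optuna
import pandas as pd
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

TARGET = "target"  # placeholder for the competition's label column
train_clean = pd.read_csv("train_clean.csv")  # the klib-optimized dataset (assumed path)
X = train_clean.drop(columns=[TARGET])
y = train_clean[TARGET]

def objective(trial):
    # Illustrative search space; the ranges I actually used are in the notebook.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }
    model = XGBClassifier(**params, eval_metric="logloss")
    # Mean cross-validated accuracy is the value Optuna maximizes.
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

# Refit on the full training set with the best parameters before predicting
# on the submission data.
final_model = XGBClassifier(**study.best_params, eval_metric="logloss").fit(X, y)
```

Using the mean cross-validated accuracy as the objective keeps the tuning aligned with the competition metric while guarding against overfitting to a single train/validation split.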
I successfully met each of the requirements I set for myself at the beginning of this competition. I achieved 81% accuracy and learned a great deal about data processing, which I intend to apply to improve my performance in the next competition.