PART 3 – Finding the relevant features and the best model for predicting golf ball carry distance

KNN (k-nearest neighbors) method for feature ranking and error analysis

Now we need to rank the features for predicting golf ball carry distance. To do so, we run a few experiments based on multivariate imputation. Multivariate imputation predicts a missing value from several features in our dataset, such as ball speed, apex, and so on. Because it captures the relationships among variables, it generally gives better estimates than relying on a single variable.
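To make the imputation idea concrete, here is a minimal pure-Python sketch of KNN-based imputation, similar in spirit to (but much simpler than) scikit-learn's `KNNImputer`. The shot values, the column order (ball speed, launch angle, apex, carry) and the function names are hypothetical, and distances are computed on unscaled features; on real data the features would normally be standardized first, otherwise ball speed dominates the distance.

```python
import math

# Hypothetical shots, made up for illustration:
# columns are (ball_speed, launch_angle, apex, carry); None marks a missing value.
shots = [
    [150.0, 12.0, 30.0, 240.0],
    [148.0, 11.5, 29.0, 236.0],
    [120.0, 16.0, 25.0, 180.0],
    [152.0, 12.5, 31.0, None],  # carry distance missing
]

def distance(a, b):
    """Euclidean distance over the columns where both rows have a value (no scaling)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)
                         if x is not None and y is not None))

def knn_impute(rows, k=2):
    """Return a copy of `rows` with each None replaced by the mean of that
    column over the k nearest rows that have the column filled in."""
    filled = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other shots that actually have column c.
            donors = [other for j, other in enumerate(rows)
                      if j != r and other[c] is not None]
            donors.sort(key=lambda other: distance(row, other))
            nearest = donors[:k]
            filled[r][c] = sum(other[c] for other in nearest) / len(nearest)
    return filled

print(knn_impute(shots, k=2)[3])  # → [152.0, 12.5, 31.0, 238.0]
```

The last shot is closest to the first two (carry 240 and 236), so with k=2 its missing carry is filled with their average; with k=1 it would simply copy 240.0 from the single nearest shot.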

We use KNN regression, whose goal is to find the nearest neighbors of the missing data point based on its similarity to the other shots across the available features. It is a supervised learning algorithm: it uses the relationship between the input features and the output (golf ball carry distance) in the dataset to make predictions on new data, that is, on new golf shots. For any shot, it looks up the k most similar swings (its neighbors) and predicts the carry distance by averaging their measured carry distances.

We use the RMSE (root mean squared error) as our accuracy indicator. It measures the model's prediction errors: it is the square root of the average squared difference between predicted and actual carry distances, so a lower RMSE means more accurate predictions.
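The prediction rule and the RMSE metric described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code that produced the table of results: the function names and the training shots are hypothetical, features are not scaled, and in practice a library implementation such as scikit-learn's `KNeighborsRegressor` would typically be used.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors (no scaling applied)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Predict carry as the average carry of the k training shots nearest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical training shots: (ball speed, launch angle) -> carry distance.
train_X = [[150.0, 12.0], [140.0, 13.0], [130.0, 14.0], [120.0, 15.0]]
train_y = [240.0, 225.0, 205.0, 185.0]

new_shot = [148.0, 12.0]
print(knn_predict(train_X, train_y, new_shot, k=1))  # → 240.0 (closest shot only)
print(knn_predict(train_X, train_y, new_shot, k=2))  # → 232.5 (mean of 240 and 225)
print(rmse([240.0, 200.0], [237.0, 203.0]))          # → 3.0
```

Each prediction error in the last line is 3, so the RMSE is sqrt((9 + 9) / 2) = 3.0, in the same unit as the carry distance.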

Features                            RMSE (K=1)   RMSE (K=2)   RMSE (K=3)
Ball speed + Launch angle           4.29         4.41         5.317
Ball speed + Launch angle + Apex    4.382        4.41         4.86
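The section does not state how the RMSE values in the table were obtained. One plausible procedure, assumed here, is leave-one-out evaluation: predict each shot from all the other shots, compute the RMSE over those predictions, and repeat for every feature set and every K. The sketch below uses made-up shots, so its numbers will not reproduce the table; the data layout and the function name `loo_rmse` are hypothetical, and distances are again computed on unscaled features.

```python
import math

# Hypothetical shot data (made-up values): feature columns and measured carry.
shots = {
    "ball_speed":   [150.0, 148.0, 140.0, 135.0, 130.0, 122.0],
    "launch_angle": [12.0, 12.5, 13.0, 14.0, 14.5, 15.0],
    "apex":         [30.0, 29.5, 28.0, 27.0, 26.0, 24.5],
}
carry = [240.0, 236.0, 222.0, 212.0, 204.0, 190.0]

def loo_rmse(feature_names, k):
    """Leave-one-out RMSE of KNN regression using only `feature_names`."""
    rows = list(zip(*(shots[name] for name in feature_names)))
    squared_errors = []
    for i, query in enumerate(rows):
        # Rank every other shot by Euclidean distance to the held-out shot.
        others = sorted((j for j in range(len(rows)) if j != i),
                        key=lambda j: math.dist(rows[j], query))
        prediction = sum(carry[j] for j in others[:k]) / k
        squared_errors.append((carry[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

feature_sets = [
    ("ball_speed", "launch_angle"),
    ("ball_speed", "launch_angle", "apex"),
]
# One RMSE per (feature set, K) cell, mirroring the layout of the results table.
results = {(fs, k): loo_rmse(fs, k) for fs in feature_sets for k in (1, 2, 3)}
for (fs, k), score in results.items():
    print(" + ".join(fs), f"K={k}", round(score, 3))

# The cell with the lowest RMSE gives the preferred feature set and K.
best_features, best_k = min(results, key=results.get)
print("best:", " + ".join(best_features), f"K={best_k}")
```

Reading the grid this way follows the comparison made in the text: find the K with the lowest error, then compare the feature sets at that K.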

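The single-neighbor model (K=1) produces the lowest RMSE, that is, the tightest and most accurate predictions; for both feature sets, the RMSE increases as K increases, so a larger K costs accuracy. Comparing the feature sets at K=1, Ball speed + Launch angle gives a lower RMSE than Ball speed + Launch angle + Apex, so adding Apex does not improve the model. Ball speed and Launch angle are therefore the top two-feature combination, giving the least noisy predictions with this model.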