Random Forest for Loan Performance Prediction

Random forests are among the most powerful tools in predictive analytics. They build on the considerable strengths of decision trees: they handle non-linear relationships, are robust to noisy data and outliers, and estimate predictor importance for you. Unlike single decision trees, however, they don’t need to be pruned, are less prone to overfitting, and aggregate predictions over many trees, which tends to make the results more accurate.
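To make the idea of aggregation concrete, here is a minimal sketch in plain Python. It is not the code from this post: it fits one-split trees ("stumps") to bootstrap resamples of a toy, single-feature dataset and averages their predictions. That is bagging, the core of a random forest; a real random forest also samples a random subset of predictors at each split, which this sketch leaves out. The function names `fit_stump` and `fit_forest` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so both sides of the split are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # Every x in this resample is identical: predict the overall mean.
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Average stumps fit to bootstrap resamples of the data (bagging)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


# Toy data: the outcome jumps from 0 to 1 once x reaches 5.
xs = list(range(10))
ys = [0 if x < 5 else 1 for x in xs]
predict = fit_forest(xs, ys)
```

Each stump sees a slightly different resample, so individual stumps disagree about where the split falls; averaging them smooths out those disagreements, which is the intuition behind the forest being more stable than any single tree.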

This post presents code to prepare data for a random forest, run the analysis, and examine the output.

The specific question I answer with these analyses is: what percentage of the loan principal is predicted to have been repaid by the time the loan reaches maturity? I’m using publicly available 2007-2011 data from Lending Club for these analyses. You can obtain the data here.
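As a rough sketch of the outcome variable, the percentage of principal repaid can be computed as principal received divided by the amount funded. The argument names `funded_amnt` and `total_rec_prncp` are my assumption about the relevant fields in the data, and the helper `pct_principal_repaid` is hypothetical; it is not taken from the analysis code itself.

```python
def pct_principal_repaid(funded_amnt, total_rec_prncp):
    """Return the percentage of funded principal that has been repaid.

    Both arguments are dollar amounts; the names are assumed, not confirmed.
    """
    if funded_amnt <= 0:
        raise ValueError("funded_amnt must be positive")
    return 100.0 * total_rec_prncp / funded_amnt


# Made-up example loans, for illustration only.
fully_repaid = pct_principal_repaid(10000, 10000)
partly_repaid = pct_principal_repaid(10000, 2500)
```

A value of 100 means the principal was repaid in full, and lower values indicate that part of it was never recovered. To match the question above, this would be computed only for loans that have already reached maturity.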
