The lending industry faces some of the greatest day-to-day risk of any business area. Banks, mortgage companies, and other types of lenders face one specific risk many times every day: Are they going to be paid back when they make a loan? Organizations that make their money by lending money must be able to anticipate risk and predict the likelihood that they will be paid back, with interest, or else their business model will fail and they will have to close their doors. In this Assignment, you will use R with two data sets to predict the risk of loan default for a lender, and then report and explain your results.
Assignment Instructions
Complete the following steps:
- Using the university’s online Library and Internet resources, research the lending industry. In a Word document, prepare a risk management plan outline for loan default risk faced by lenders. Include all five parts of risk management planning: Identification, Understanding, Data Preparation, Modeling and Application. Cite all sources used to prepare your risk management plan.
- Download the Loans.csv and Applicants.csv files. Import both of these as data frames into RStudio. Give each a descriptive name. Show this in your Word document.
- Using the Loans.csv file, build a logistic regression model to predict the “Good Risk” dependent variable (use family=binomial() in the glm function in R). In this column, ‘1’ indicates that making the loan is a good risk for the lender; ‘0’ indicates that making the loan is a bad risk. Make sure that you do not use the Applicant ID as an independent variable! The glm function belongs to the stats package, which R loads by default, so no additional package is needed to build your model; load the MASS package by issuing library(MASS) only if you want its helper functions, such as stepAIC. Show the creation of the model in your Word document.
- In your Word document, document your logistic model’s output, and specifically explain which independent variables have the most predictive power and which have the least. Make sure you identify how you know, and explain why it matters.
- Apply your logistic regression model to the data in Applicants.csv to generate predictions of “Good Risk” for each loan applicant. If your glm model is stored in an R object called ‘LoanModel’, for example, and your Applicants.csv data is in a data frame called ‘Appl’, then you would issue a command that looks like this: LoanPredictions <- predict(LoanModel, Appl, type="response"). Note that R requires straight quotation marks around "response"; curly quotes pasted from a word processor will cause a syntax error. Document the application of your model to the Applicants data in your Word document.
- In your Word document, interpret your predictions for the Applicants.csv data. Specifically address the following:
  - How many loans do you predict to be a good risk for the lender?
  - How many are predicted to be a bad risk?
  - What are your highest and lowest post-probability percentages for predictions?
  - How many loans have at least a 75% post-probability percentage, and what does that mean for the lender?
  - How many loans have less than a 25% post-probability percentage, and what does that mean for the lender?
  - Suppose that the lender is willing to accept slightly higher risk and has decided to make loans to applicants who have post-probability percentages between 40% and 65%. List two things the lender could do to mitigate risk when lending to this group, and explain how these will help.
- Make sure that you cite at least five supporting sources beyond the textbook in support of your writing and explanations. Cite correctly in APA format.
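The R steps above (import, model, inspect, predict, count) can be sketched end to end as follows. This is only an illustration under stated assumptions, not the expected solution: the data are simulated stand-ins for Loans.csv and Applicants.csv, the column names Applicant.ID, Credit.Score, Late.Payments, and Good.Risk are hypothetical, and the 0.5 cutoff used to label a prediction as a good risk is an assumption that the Assignment does not specify. Your real files will have their own columns.

```r
# Simulated stand-ins for the two files (hypothetical columns and values).
# With the real files you would instead run something like:
#   LoanData <- read.csv("Loans.csv")
#   Appl     <- read.csv("Applicants.csv")
set.seed(42)
make_data <- function(n) {
  data.frame(
    Applicant.ID  = seq_len(n),
    Credit.Score  = round(rnorm(n, mean = 650, sd = 60)),
    Late.Payments = rpois(n, lambda = 2)
  )
}

LoanData <- make_data(400)
# Simulate the 0/1 outcome so the example has something to learn from.
p <- plogis(-0.5 + 0.02 * (LoanData$Credit.Score - 650) -
              0.6 * (LoanData$Late.Payments - 2))
LoanData$Good.Risk <- rbinom(nrow(LoanData), size = 1, prob = p)

Appl <- make_data(100)

# Logistic regression on every column except the applicant identifier.
LoanModel <- glm(Good.Risk ~ . - Applicant.ID,
                 data = LoanData, family = binomial())

# Coefficient table: estimates, standard errors, z values, and p-values.
print(summary(LoanModel))

# Predicted probability of "good risk" for each applicant.
LoanPredictions <- predict(LoanModel, Appl, type = "response")

# Counts for the interpretation questions (0.5 cutoff is an assumption).
good_risk   <- sum(LoanPredictions >= 0.5)
bad_risk    <- sum(LoanPredictions < 0.5)
prob_range  <- range(LoanPredictions)   # lowest and highest probability
at_least_75 <- sum(LoanPredictions >= 0.75)
below_25    <- sum(LoanPredictions < 0.25)
middle_band <- sum(LoanPredictions >= 0.40 & LoanPredictions <= 0.65)

print(c(good = good_risk, bad = bad_risk,
        at_least_75 = at_least_75, below_25 = below_25,
        middle_band = middle_band))
print(round(100 * prob_range, 1))       # range expressed as percentages
```

In the summary output, the p-value column is one common basis for judging which independent variables contribute most to the model. Because the data here are simulated, the numbers printed by this sketch say nothing about the real Loans.csv or Applicants.csv files.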
Assignment Requirements
Prepare your Assignment submission in Microsoft Word following standard APA formatting guidelines: double-spaced, Times New Roman 12-point font, one-inch margins on all sides. Include a title page, table of contents, and references page. You do not need to write an abstract. Label all tables and figures. Cite sources appropriately both in the text of your writing (parenthetical citations) and on your references page (full APA citation format).
For more information on APA style formatting, refer to the resources in the Academic Tools section of this course.