Exploring Loan Data from Prosper

By Kevin Vo


Based on Prosper website: Prosper is America’s first marketplace lending platform, with over $7 billion in funded loans. Prosper allows people to invest in each other in a way that is financially and socially rewarding. On Prosper, borrowers list loan requests between $2,000 and $35,000 and individual investors invest as little as $25 in each loan listing they select. Prosper handles the servicing of the loan on behalf of the matched borrowers and investors.

Data Set contains information about Prosper loans:

loan %>% dim
## [1] 113937     81
  • There are 113937 different loans which are categorized with 82 different variables.
loan %>% str
## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

Univariate Plots Section

1. Original Loan:

  • LoanOriginalAmount:

Back

loan $LoanOriginalAmount %>% summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000
  • Loan Origination Quarter:

## .
## Q1 2006 Q1 2007 Q1 2008 Q1 2010 Q1 2011 Q1 2012 Q1 2013 Q1 2014 Q2 2006 
##     315    3079    3074    1243    1744    4435    3616   12172    1254 
## Q2 2007 Q2 2008 Q2 2009 Q2 2010 Q2 2011 Q2 2012 Q2 2013 Q3 2006 Q3 2007 
##    3118    4344      13    1539    2478    5061    7099    1934    2671 
## Q3 2008 Q3 2009 Q3 2010 Q3 2011 Q3 2012 Q3 2013 Q4 2005 Q4 2006 Q4 2007 
##    3602     585    1270    3093    5632    9180      22    2403    2592 
## Q4 2008 Q4 2009 Q4 2010 Q4 2011 Q4 2012 Q4 2013 
##     532    1449    1600    3913    4425   14450

Back

  • Loan Term:

The length of the loan expressed in months.

## .
##    12    36    60 
##  1614 87778 24545

Back

  • Loan Status:

The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status will be accompanied by a delinquency bucket.

## .
##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304

Back

2. Interest Rate:

Back

loan[,c("LenderYield", "EstimatedEffectiveYield", "BorrowerAPR","BorrowerRate")] %>% summary
##   LenderYield      EstimatedEffectiveYield  BorrowerAPR     
##  Min.   :-0.0100   Min.   :-0.183          Min.   :0.00653  
##  1st Qu.: 0.1242   1st Qu.: 0.116          1st Qu.:0.15629  
##  Median : 0.1730   Median : 0.162          Median :0.20976  
##  Mean   : 0.1827   Mean   : 0.169          Mean   :0.21883  
##  3rd Qu.: 0.2400   3rd Qu.: 0.224          3rd Qu.:0.28381  
##  Max.   : 0.4925   Max.   : 0.320          Max.   :0.51229  
##                    NA's   :29084           NA's   :25       
##   BorrowerRate   
##  Min.   :0.0000  
##  1st Qu.:0.1340  
##  Median :0.1840  
##  Mean   :0.1928  
##  3rd Qu.:0.2500  
##  Max.   :0.4975  
## 

3. Loan Assessment By Credit & Rating:

  • Credit Grade:

The Credit rating that was assigned at the time the listing went live. Applicable for listings pre-2009 period and will only be populated for those listings.

## .
##           A    AA     B     C     D     E    HR    NC 
## 84984  3315  3509  4389  5649  5153  3289  3508   141

Back

Since Credit Grade Rating were used before 2009, Credit Grade are not applied to 74.6% of current loan. However, it is still useful to tell us the picture of the past.

To generate a more meaningful chart, I have to reorganize the levels of CreditGrade in terms of credit risk rating.

Back

  • Prosper Rating:

The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable for loans originated after July 2009.

loan$ProsperRating..Alpha. %>% table
## .
##           A    AA     B     C     D     E    HR 
## 29084 14551  5372 15581 18345 14274  9795  6935

Back

4. Loan Assessment By Income & Asset:

  • Income Factor:

Back

  • Homeowner Factor:

Back

Univariate Analysis

What is the structure of your dataset?

The dataset contains 113937 observations. Each observation is described by 82 variables.

I choose with 17 features out of 81, and classify them into 4 groups:

  • Original Loan: LoanOriginalAmount,LoanOriginationQuarter, Term, LoanStatus
  • Interest Rate: LenderYield, EstimatedEffectiveYield, BorrowerAPR, BorrowerRate
  • Rating or Loan Assessment: Credit Grade, Prosper Rating, EmploymentStatus, IncomeRange, IsBorrowerHomeowner.
  • Supporting Variables: BorrowerState, Occupation, ListingCategory, EstimatedReturn.

What is/are the main feature(s) of interest in your dataset?

My main interest in this dataset is LenderYield which is the lender yield on the loan because I want to analyze this dataset at a glance of an investor. So there are some questions that I try to answer:

  • Which are factors that influence or have relationship with LenderYield ?
  • Is there any risk of investing money?
  • Which is a good strategy of investing money into Prosper Loan Platform?

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

  • Original Loan:
    • LoanOriginalAmount tells us the distribution of all the amounts of loan. We can see that most of the loans are under $10000. This density plot shows the frequency of the loan amount are peaked at multiple of $5000. Figure1
    • Loan Origination Quarter tells us which quarter has more loan. And how prosper operates each year. Figure2
    • Loan Term indicates that majority of loan terms are 36-months. Only a small fraction of loans are 12-months loans. Figure3
    • Loan Status : Defaulted and Chargedoff are accounted for nearly 15% of the number of loans. It means the risk of investing in the bad loan is pretty high. Also, I believe this variable provide a lot of information about the loan. Especially, I would like to focus on “Chargedoff”, “Defaulted” and “Completed” because it tells you how past loans performed. Figure4
  • Interest Rate: We cannot find the specific interest rate in the dataset. However, I believe Lender Yield, Estimated Effective Yield, Borrower APR and Borrower Rate can represent as the interest rate. Even though they are used for different purpose, their distribution are nigh identical (Based on Figure5 ) => My question is: Are they correlated among each other?

  • Loan Assessment/ Loan Rating:

    1. By Credit & Rating:
    • Since Credit Grade Rating were used before 2009, Credit Grade are not applied to 74.6% of current loan. Figure6 & Figure7
    • We can see that Prosper Rating Numeric and Prosper Rating Alpha are similar. Three most common loans are A, B and C. Figure8
    1. By Income & Asset:
    • Income Factor: Employed and full-time tend to borrow more than the rest. Among those employed people who have income ranging from $25000 to $75000 take out more loans. And most of them have proof for their income. Figure9
    • Half of the loans are made by homeowners. Figure10

Did you create any new variables from existing variables in the dataset?

I did not create any new variable. However, I have to reorder levels of CreditGrade and ProsperRating..Alpha. in term of risk rating.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

  • One of interesting observation is the multi-modal distribtuion of LoanOriginal Amount. It seems that Loans are made in the multiple of $5000.

  • Since I have used alot of bar plots in univariate plot session, I have created a function as called as draw_bar_plot. In this function, it has already clean the data such as removing NA and tidy them by generating factors for categorical variables.

Bivariate Plots Section

1. Loan Status:

Back

Back

2. Interest Rate:

Back

loan[,c("LenderYield", "EstimatedEffectiveYield", "BorrowerAPR","BorrowerRate")] %>%
subset(!is.na(LenderYield) & !is.na(EstimatedEffectiveYield) &
       !is.na(BorrowerAPR) & !is.na(BorrowerRate)) %>% cor
##                         LenderYield EstimatedEffectiveYield BorrowerAPR
## LenderYield               1.0000000               0.8953425   0.9933345
## EstimatedEffectiveYield   0.8953425               1.0000000   0.8956348
## BorrowerAPR               0.9933345               0.8956348   1.0000000
## BorrowerRate              0.9999958               0.8952825   0.9933333
##                         BorrowerRate
## LenderYield                0.9999958
## EstimatedEffectiveYield    0.8952825
## BorrowerAPR                0.9933333
## BorrowerRate               1.0000000

Back

3. Prosper Rating:

Back

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

  • Loan Status: I focus strongly on LoanStatus variable, especially “Chargedoff”, “Defaulted” and “Completed”. It indicates the health status of all the past loans.

    • The proportion of “ChargedOff” loan is higher when Prosper Rating is lower whereas The proportion of “Completed” loan is lower when Prosper Rating is lower => It means the risk of being charged off is higher with lower rating loan. Figure13 However, borrowers who have higher risk take larger loans. Figure14
    • As an investor, it is wise to ignore loans from unemployed persons. It is hard for them to repay the loan ( Based on IncomeVerifiable vs Loan Status chart). So I mainly focus on the loans of the employed. Income are splitted into 5 groups: $0-$24999, $25000-$49999, $50000-$74999, $74999-$100000, $100000+. We also see that proportion of “Chargedoff” loan is higher when the income is lower. However the difference in proportion is not large. Figure13
    • The proportion of “Chargedoff” loan is higher when the term of loan is higher. ==[Figure 14]== However, bigger loans are made under longer loan terms. Figure14
  • Interest Rate:
    • Clearly, LenderYield, EstimatedEffectiveYield, BorrowerAPR, BorrowerRate are strongly positively correlated. It means we only need to analyze the behavior of one of them. And I chose LenderYield to represent as interest rate.Figure15
    • There is a linear relationship between LenderYield and ProspeRating..numeric. And the slope of this relationship is downward. Figure16 => It means higher risk loans tends to yeild more.
  • Prosper Rating: There is a linear trend between Prosper Rating and
    • Credit Score Lower Range
    • Credit Score Upper Range
    • Number of Public Inquires in the last 6 months
    • Percentage of never-deliquent trades.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

I thought Homeowner is one of the factor contributing to the difference in loan status. However, it shows that the distribution of loan status between homeowner and non-homeowner are almost similar. It means the proportion of charged loans are in-different whether the borrowers own a house or not. Figure13

What was the strongest relationship you found?

The strongest relationship is between LenderYield and BorrowerRate. The correlation coefficient is nearly 1.

Multivariate Plots Section

Based on the bivariate analysis, I want to check the linear relationship between ProsperRatingNumeric and predictors such as CreditScoreRangeLower, CreditScoreRangeUpper, TradesNeverDelinquent..percentage., InquiriesLast6Months. Also I add 2 more predictors: PublicRecordsLast12Months, DebtToIncomeRatio to this model.

ProsperRating.lm %>% summary
## 
## Call:
## lm(formula = ProsperRating..numeric. ~ CreditScoreRangeLower + 
##     CreditScoreRangeUpper + TradesNeverDelinquent..percentage. + 
##     InquiriesLast6Months + PublicRecordsLast12Months + DebtToIncomeRatio - 
##     1, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3206 -0.8063  0.1158  0.8861  9.3861 
## 
## Coefficients:
##                                     Estimate Std. Error  t value Pr(>|t|)
## CreditScoreRangeLower               0.499747   0.003779  132.239  < 2e-16
## CreditScoreRangeUpper              -0.481626   0.003688 -130.602  < 2e-16
## TradesNeverDelinquent..percentage.  1.202335   0.040808   29.463  < 2e-16
## InquiriesLast6Months               -0.278500   0.003306  -84.235  < 2e-16
## PublicRecordsLast12Months          -0.154405   0.044775   -3.448 0.000564
## DebtToIncomeRatio                  -0.766687   0.014584  -52.572  < 2e-16
##                                       
## CreditScoreRangeLower              ***
## CreditScoreRangeUpper              ***
## TradesNeverDelinquent..percentage. ***
## InquiriesLast6Months               ***
## PublicRecordsLast12Months          ***
## DebtToIncomeRatio                  ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.283 on 77551 degrees of freedom
##   (36380 observations deleted due to missingness)
## Multiple R-squared:  0.9169, Adjusted R-squared:  0.9169 
## F-statistic: 1.427e+05 on 6 and 77551 DF,  p-value: < 2.2e-16

Back

Back

LenderYield.lm <-
loan %>%
select(LenderYield, LoanStatus, ProsperRating..numeric., Term) %>%
filter(LoanStatus %in% c("Defaulted", "Completed", "Chargedoff")) %>%
lm(LenderYield ~ factor(ProsperRating..numeric.) +  factor(Term)  + factor(LoanStatus), data = .)

LenderYield.lm %>% summary
## 
## Call:
## lm(formula = LenderYield ~ factor(ProsperRating..numeric.) + 
##     factor(Term) + factor(LoanStatus), data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156886 -0.010819  0.000239  0.012114  0.189556 
## 
## Coefficients:
##                                    Estimate Std. Error  t value Pr(>|t|)
## (Intercept)                       0.2714997  0.0007577  358.316   <2e-16
## factor(ProsperRating..numeric.)2 -0.0126750  0.0005205  -24.352   <2e-16
## factor(ProsperRating..numeric.)3 -0.0643999  0.0004772 -134.955   <2e-16
## factor(ProsperRating..numeric.)4 -0.1175471  0.0005344 -219.966   <2e-16
## factor(ProsperRating..numeric.)5 -0.1617421  0.0005506 -293.737   <2e-16
## factor(ProsperRating..numeric.)6 -0.2097962  0.0005376 -390.248   <2e-16
## factor(ProsperRating..numeric.)7 -0.2356068  0.0006597 -357.161   <2e-16
## factor(Term)36                    0.0406862  0.0006017   67.615   <2e-16
## factor(Term)60                    0.0635214  0.0006921   91.775   <2e-16
## factor(LoanStatus)Completed      -0.0047244  0.0003593  -13.149   <2e-16
## factor(LoanStatus)Defaulted       0.0008353  0.0007717    1.082    0.279
##                                     
## (Intercept)                      ***
## factor(ProsperRating..numeric.)2 ***
## factor(ProsperRating..numeric.)3 ***
## factor(ProsperRating..numeric.)4 ***
## factor(ProsperRating..numeric.)5 ***
## factor(ProsperRating..numeric.)6 ***
## factor(ProsperRating..numeric.)7 ***
## factor(Term)36                   ***
## factor(Term)60                   ***
## factor(LoanStatus)Completed      ***
## factor(LoanStatus)Defaulted         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02244 on 25994 degrees of freedom
##   (29079 observations deleted due to missingness)
## Multiple R-squared:  0.929,  Adjusted R-squared:  0.9289 
## F-statistic: 3.4e+04 on 10 and 25994 DF,  p-value: < 2.2e-16
LenderYield.lm %>% anova
## Analysis of Variance Table
## 
## Response: LenderYield
##                                    Df  Sum Sq Mean Sq  F value    Pr(>F)
## factor(ProsperRating..numeric.)     6 166.464 27.7440 55111.56 < 2.2e-16
## factor(Term)                        2   4.578  2.2889  4546.72 < 2.2e-16
## factor(LoanStatus)                  2   0.105  0.0525   104.32 < 2.2e-16
## Residuals                       25994  13.086  0.0005                   
##                                    
## factor(ProsperRating..numeric.) ***
## factor(Term)                    ***
## factor(LoanStatus)              ***
## Residuals                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

  • Based on the coefficient table and QQ-Plot Figure19 , it shows that there is a strong linear relationship and goodness of fit of the model between Prosper Rating and following predictors:
    • Credit Scores Range (Lower and Upper): It means that Prosper carefully review credit score of the borrowers and add credit scores into their rating assessment.
    • Deliquency status: borrowers who take good care of their payment history will be assess as less risk and high rating.
    • The number of public records on the borrower’s credit report. Public records include bankruptcies, liens and judgments.
    • the ratio between borrowers’ debt and income.
  • For each loan term, there is linear relationship between Prosper Rating and LoanStatus.

Were there any interesting or surprising interactions between features?

Since we have the relationship betwen Prosper Rating and other predictors, we can save time upon analyzing loans by looking at Prosper Rating. Prosper Rating is a good measure of risk.


Final Plots and Summary

Plot One

Description One

The above plot shows that higher risk loans yield more. As an investor, I think the yield for Prosper Rating 7 (AA-lowest risk) is too low. If we invest in these loan, the average yielding is about 6% to 7% annually. So we should focus on loans with rating 4, 5(which is C and B) since the risk of being charged off is less than 25% but the yielding is about 15% to 20%. It means instead of investing in a large amount of money into 1 loan, we limit our risk by divesify into different small loans with rating B and C, the expected return rate could be about 16% to 18%.

Plot Two

## .
##  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014 
##    23  6213 11557 11263  2206  5530 11442 19556 35413 10734

Description Two

I have created ListingYear variable to see whether the risk of bad loans are gradually exposure when the lending platform is more mature. And this is proven by the above plot. We can see that in 2009 and 2010, the number of blue dots are still noticable, but the risk of blue dots are dissolved by time when the number of loans increases.

Plot Three

## .
## Q1 2006 Q1 2007 Q1 2008 Q1 2010 Q1 2011 Q1 2012 Q1 2013 Q1 2014 Q2 2006 
##     315    3079    3074    1243    1744    4435    3616   12172    1254 
## Q2 2007 Q2 2008 Q2 2009 Q2 2010 Q2 2011 Q2 2012 Q2 2013 Q3 2006 Q3 2007 
##    3118    4344      13    1539    2478    5061    7099    1934    2671 
## Q3 2008 Q3 2009 Q3 2010 Q3 2011 Q3 2012 Q3 2013 Q4 2005 Q4 2006 Q4 2007 
##    3602     585    1270    3093    5632    9180      22    2403    2592 
## Q4 2008 Q4 2009 Q4 2010 Q4 2011 Q4 2012 Q4 2013 
##     532    1449    1600    3913    4425   14450

Description Three

There are some risks that cannot be accounted for in this dataset are the health of our economy and business management of Prosper:

  • If the economy went bad, the number of bad loans would increase.
  • Because loan terms create some barriers to liquidity of the loans for investors. If Prosper poorly managed their business and cannot compete with other competitors, the risk of its bankcruptcy is still there, but we cannot control it.

Therefore the plot above can be used as an indicator to estimate how Prosper operates. If the proportion of Prosper loans of new quarter drops compared to the same quarter of last year, we can think of stopping our investment into new loans.


Reflection

The dataset of Prosper loan had nearly 114000 loans from Nov 2005 to March 2014. Over the year the number of loans has been increasing. Based on my analysis, I can conclude that Prosper Rating is reliable. It is good indicator of risks and bad loans. Also, we find out that higher risk loans tend to yield more. By March 2014, the lending platform has matured enough to exposure the risk of bad loans. In other words, if we divesify our investments, we can expect higher return with higher risk loans on average. However, there are some unaccountable risks such as economy recession or Prosper’s pooly management. Morever, it is hard to find short term loans as 12-months with higher yeilding rate. Therefore we tend to invest to 36-months or 60-months loans.