Predictive models for P2P lending

Predictive models for loan approval/rejection, loan default and risk calculation, and borrower rating classification, using data from www.prosper.com

jeyaramashok

May 03, 2014

Transcript

  1. Prosper Platform
    • Prosper has a richer dataset than Lending Club.
    • Historical data since its inception (2005).
    • Prosper provides observations across 7 objects:
      Listings, Loans, Groups, Categories, Marketplaces, Members
  2. How it works?
    [Diagram: Borrower, Prosper Platform, Investor. Flow labels: Creates Listing, Places BID, $$$, Investment, Monthly EMI. Note on slide: "Please read the offer documentation carefully".]
  3. Prosper Data
    • 3.2 GB XML
    • 2.3 million records and 70+ variables
    • Subset: 2008 (95k) and 2013 (50k) data
    • ~70 variables
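
A 3.2 GB XML file is usually too large to load into memory in one go, so a streaming parse is one way to pull out a subset. The following is a minimal sketch, assuming each record is a direct child of the root element; the tag names `Listings` and `Listing` and the sample values are hypothetical, not taken from the actual Prosper export, and this is not necessarily how the deck's author processed the data.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per <record_tag> element without loading the whole file."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Flatten the record's direct children into {tag: text}.
            yield {child.tag: (child.text or "").strip() for child in elem}
            root.clear()  # drop processed records so memory use stays flat


# Tiny in-memory stand-in for the real file (hypothetical structure).
SAMPLE = b"""<Listings>
  <Listing><AmountRequested>5000</AmountRequested><ListingStatus>Completed</ListingStatus></Listing>
  <Listing><AmountRequested>1200</AmountRequested><ListingStatus>Cancelled</ListingStatus></Listing>
</Listings>"""

records = list(iter_records(io.BytesIO(SAMPLE), "Listing"))
```

For the real file, `source` would be a file path or an open binary file, and the generator could feed rows straight into a database insert instead of being collected into a list.
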
  4. Variables
    Predictors
      Quantitative: AmountRequested, BidCount, EstimatedLoss, LenderYield, ProsperScore, DebtToIncomeRatio, OnTimeProsperPayments, ProsperPaymentsLessThanOneMonthLate, ProsperPaymentsOneMonthPlusLate, AmountFunded, AmountRemaining, BidMaximumRate, BorrowerMaximumRate, BorrowerRate, Category, Duration, ActiveProsperLoans, TotalProsperLoans, ProsperPrincipalBorrowed, ProsperPrincipalOutstanding, TotalProsperPaymentsBilled, CreditScoreRangeLower, CreditScoreRangeUpper, MonthlyLoanPayment, BankDraftFeeAnnualRate, GroupLeaderRewardRate, PercentFunded, Term
      Categorical: HasVerifiedBankAccount, IsBorrowerHomeowner, FundingOption, City, State, GroupName, GroupRating
    Response
      ProsperRating, ListingStatus, LoanStatus
  5. Loan Default Prediction
    • 2013 data didn’t work out: loan terms are 3 and 5 years.
    • Binary classification
    • Response: LoanStatus (Defaulted, Complete)
    • Random forests
    • 86% prediction accuracy
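
To make the binary setup concrete, the sketch below keeps only loans with a final status, maps them to 0/1, and computes accuracy next to the majority-class baseline, which helps when reading a single accuracy figure. The status strings, the sample loans and the predictions are all illustrative assumptions; no real Prosper data or model output is used.

```python
from collections import Counter


def binary_labels(loans, status_field="LoanStatus",
                  positive="Defaulted", negative="Completed"):
    """Return 1 for defaulted and 0 for completed loans, skipping other statuses."""
    mapping = {positive: 1, negative: 0}
    return [mapping[loan[status_field]] for loan in loans
            if loan[status_field] in mapping]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Hypothetical loans; "Current" has no final outcome yet and is dropped.
loans = [
    {"LoanStatus": "Completed"},
    {"LoanStatus": "Completed"},
    {"LoanStatus": "Current"},
    {"LoanStatus": "Defaulted"},
    {"LoanStatus": "Completed"},
]
y_true = binary_labels(loans)
y_pred = [0, 0, 1, 1]  # made-up classifier output for the four labelled loans

model_accuracy = accuracy(y_true, y_pred)
baseline_accuracy = majority_baseline(y_true)
```

In this toy case the model and the always-predict-completed baseline score the same, which is why a reported accuracy is easier to judge when the class balance is known.
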
  6. Loan Approval Prediction
    • Approved/Rejected?
    • Binary classification
    • Response variable: ListingStatus (Completed, Cancelled)
    • Random forests didn’t work (categorical predictors with more than 32 categories)
    • Naïve Bayes classifier
    • 87% prediction accuracy
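
Naïve Bayes only needs per-class value counts, so a categorical predictor with many levels (City, State or GroupName, for example) poses no structural problem. Below is a from-scratch sketch of a categorical Naïve Bayes classifier with add-one smoothing; it illustrates the technique and is not the implementation used for the slides, which was presumably a library one. The class name `CategoricalNB` and the training rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(label, feature index)] -> Counter of observed values
        self.value_counts = defaultdict(Counter)
        # levels[feature index] -> set of all values seen during training
        self.levels = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(label, j)][value] += 1
                self.levels[j].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalised log posterior score for each class."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)  # log prior
            for j, value in enumerate(row):
                count = self.value_counts[(label, j)][value]
                # The extra +1 level reserves mass for values unseen in training.
                n_levels = len(self.levels[j]) + 1
                score += math.log((count + 1) / (n_label + n_levels))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Hypothetical listings: (State, IsBorrowerHomeowner) -> ListingStatus
rows = [("CA", "True"), ("CA", "False"), ("NY", "True"),
        ("NY", "False"), ("CA", "True"), ("TX", "False")]
labels = ["Completed", "Completed", "Cancelled",
          "Cancelled", "Completed", "Cancelled"]

model = CategoricalNB().fit(rows, labels)
prediction_seen = model.predict(("CA", "True"))
prediction_other = model.predict(("NY", "False"))
prediction_unseen = model.predict(("WA", "True"))  # "WA" never appears in training
```

Working in log space avoids underflow when many predictors are multiplied together, and the smoothing keeps an unseen level from forcing a probability of zero.
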
  7. Borrower Ratings
    • Multiclass classification
    • Response: ProsperRating
    • Completed: Naïve Bayes, 90% accuracy
    • In progress: Gradient Boosted Trees, SVM
  8. Tools
    • Grep, Sed, AWK
    • Libxml (Python)
    • MySQL
    • R: Caret, e1071, gbm, libsvm
    • Rattle
    • Weka
  9. Future Scope
    • Improving prediction accuracy: analyzing individual credit profile data
    • Money flow: analyzing bids placed
    • Impact of social networking (friends, references)
    • Influence of groups and categories