A Quick Classification Example with C5.0 in R

Get the data:

<pre>
#C5.0 for loan credit
german_credit<-read.csv("credit.csv")
str(german_credit)

## 'data.frame':    1000 obs. of  21 variables:
##  $ checking_balance    : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1 4 4 3 4 3 ...
##  $ months_loan_duration: int  6 48 12 42 24 36 24 36 12 30 ...
##  $ credit_history      : Factor w/ 5 levels "critical","delayed",..: 1 5 1 5 2 5 5 5 5 1 ...
##  $ purpose             : Factor w/ 10 levels "business","car (new)",..: 8 8 5 6 2 5 6 3 8 2 ...
##  $ amount              : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ savings_balance     : Factor w/ 5 levels "< 100 DM","> 1000 DM",..: 5 1 1 1 1 5 4 1 2 1 ...
##  $ employment_length   : Factor w/ 5 levels "> 7 yrs","0 - 1 yrs",..: 1 3 4 4 3 3 1 3 4 5 ...
##  $ installment_rate    : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ personal_status     : Factor w/ 4 levels "divorced male",..: 4 2 4 4 4 4 4 4 1 3 ...
##  $ other_debtors       : Factor w/ 3 levels "co-applicant",..: 3 3 3 2 3 3 3 3 3 3 ...
##  $ residence_history   : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ property            : Factor w/ 4 levels "building society savings",..: 3 3 3 1 4 4 1 2 3 2 ...
##  $ age                 : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ installment_plan    : Factor w/ 3 levels "bank","none",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ housing             : Factor w/ 3 levels "for free","own",..: 2 2 2 1 1 1 2 3 2 2 ...
##  $ existing_credits    : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ default             : int  1 2 1 1 2 1 1 1 1 2 ...
##  $ dependents          : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ telephone           : Factor w/ 2 levels "none","yes": 2 1 1 1 1 2 1 2 1 1 ...
##  $ foreign_worker      : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ job                 : Factor w/ 4 levels "mangement self-employed",..: 2 2 4 2 2 4 2 1 4 1 ...

# quick check of some categorical variables and the counts per level
table(german_credit$checking_balance)

##
##     < 0 DM   > 200 DM 1 - 200 DM    unknown
##        274         63        269        394

table(german_credit$savings_balance)

##
##      < 100 DM     > 1000 DM  101 - 500 DM 501 - 1000 DM       unknown
##           603            48           103            63           183

table(german_credit$default)  # 1 = no default, 2 = default; converted to a factor below

##
##   1   2
## 700 300

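# random 900/100 train/test split; no seed is set, so results vary between runs
trainset<-sample(1000,900)
gc_train<-german_credit[trainset,]
gc_test<-german_credit[-trainset,]
library(C50)
# column 17 is default (the class label), so it is dropped from the predictors
gc_model<-C5.0(x=gc_train[,-17],y=as.factor(gc_train$default))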
summary(gc_model)

##
## Call:
## C5.0.default(x = gc_train[, -17], y = as.factor(gc_train$default))
##
##
## C5.0 [Release 2.07 GPL Edition]      Sat Mar  4 11:16:26 2017
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 900 cases (21 attributes) from undefined.data
##
## Decision tree:
##
## checking_balance in {> 200 DM,unknown}: 1 (413/51)
## checking_balance in {< 0 DM,1 - 200 DM}:
## :...credit_history in {fully repaid,fully repaid this bank}:
##     :...housing in {for free,rent}: 2 (29/2)
##     :   housing = own:
##     :   :...savings_balance in {> 1000 DM,501 - 1000 DM}: 1 (4)
##     :       savings_balance = < 100 DM:
##     :       :...other_debtors = guarantor: 1 (1)
##     :       :   other_debtors in {co-applicant,none}: 2 (18/5)
##     :       savings_balance = 101 - 500 DM:
##     :       :...months_loan_duration <= 16: 2 (3)
##     :       :   months_loan_duration > 16: 1 (3)
##     :       savings_balance = unknown:
##     :       :...months_loan_duration <= 27: 2 (2)
##     :           months_loan_duration > 27: 1 (3)
##     credit_history in {critical,delayed,repaid}:
##     :...months_loan_duration <= 22:
##         :...purpose in {business,others,retraining}: 1 (21/1)
##         :   purpose = car (used):
##         :   :...housing = for free: 2 (3/1)
##         :   :   housing in {own,rent}: 1 (7)
##         :   purpose = domestic appliances:
##         :   :...age <= 25: 2 (2)
##         :   :   age > 25: 1 (2)
##         :   purpose = education:
##         :   :...savings_balance in {< 100 DM,> 1000 DM,101 - 500 DM,
##         :   :   :                   501 - 1000 DM}: 2 (6)
##         :   :   savings_balance = unknown: 1 (3)
##         :   purpose = radio/tv:
##         :   :...property in {building society savings,other,
##         :   :   :            real estate}: 1 (63/12)
##         :   :   property = unknown/none: 2 (3)
##         :   purpose = repairs:
##         :   :...installment_plan in {bank,none}: 1 (8/1)
##         :   :   installment_plan = stores: 2 (1)
##         :   purpose = car (new):
##         :   :...installment_plan = stores: 1 (0)
##         :   :   installment_plan = bank: 2 (12/3)
##         :   :   installment_plan = none:
##         :   :   :...dependents > 1: 1 (15/1)
##         :   :       dependents <= 1:
##         :   :       :...savings_balance in {> 1000 DM,
##         :   :           :                   501 - 1000 DM}: 1 (5)
##         :   :           savings_balance = < 100 DM:
##         :   :           :...residence_history <= 2: 2 (9/1)
##         :   :           :   residence_history > 2: 1 (24/7)
##         :   :           savings_balance = 101 - 500 DM:
##         :   :           :...personal_status = female: 2 (1)
##         :   :           :   personal_status in {divorced male,married male,
##         :   :           :                       single male}: 1 (3)
##         :   :           savings_balance = unknown:
##         :   :           :...property in {building society savings,
##         :   :               :            other}: 2 (4/1)
##         :   :               property in {real estate,unknown/none}: 1 (3)
##         :   purpose = furniture:
##         :   :...employment_length = 4 - 7 yrs: 1 (5)
##         :       employment_length = unemployed:
##         :       :...job in {mangement self-employed,unskilled resident}: 1 (4)
##         :       :   job in {skilled employee,unemployed non-resident}: 2 (2)
##         :       employment_length = > 7 yrs:
##         :       :...job = mangement self-employed: 2 (2)
##         :       :   job in {unemployed non-resident,
##         :       :   :       unskilled resident}: 1 (2)
##         :       :   job = skilled employee:
##         :       :   :...savings_balance in {< 100 DM,> 1000 DM,101 - 500 DM,
##         :       :       :                   501 - 1000 DM}: 1 (7/1)
##         :       :       savings_balance = unknown: 2 (1)
##         :       employment_length = 0 - 1 yrs:
##         :       :...personal_status in {divorced male,
##         :       :   :                   married male}: 1 (0)
##         :       :   personal_status = single male: 2 (1)
##         :       :   personal_status = female:
##         :       :   :...installment_plan = bank: 2 (1)
##         :       :       installment_plan in {none,stores}: 1 (9/1)
##         :       employment_length = 1 - 4 yrs:
##         :       :...checking_balance = 1 - 200 DM: 2 (4)
##         :           checking_balance = < 0 DM:
##         :           :...residence_history > 3: 2 (4)
##         :               residence_history <= 3:
##         :               :...months_loan_duration <= 16: 1 (10)
##         :                   months_loan_duration > 16: 2 (3/1)
##         months_loan_duration > 22:
##         :...savings_balance = > 1000 DM: 1 (2)
##             savings_balance = 501 - 1000 DM: 2 (3/1)
##             savings_balance = 101 - 500 DM:
##             :...credit_history in {critical,delayed}:
##             :   :...installment_plan in {bank,none}: 1 (11/1)
##             :   :   installment_plan = stores: 2 (1)
##             :   credit_history = repaid:
##             :   :...installment_plan = bank: 1 (1)
##             :       installment_plan in {none,stores}: 2 (11/1)
##             savings_balance = unknown:
##             :...checking_balance = 1 - 200 DM: 1 (17/1)
##             :   checking_balance = < 0 DM:
##             :   :...telephone = none: 2 (8/1)
##             :       telephone = yes: 1 (4/1)
##             savings_balance = < 100 DM:
##             :...months_loan_duration > 47: 2 (21/2)
##                 months_loan_duration <= 47:
##                 :...purpose = business: 1 (7/3)
##                     purpose in {domestic appliances,repairs,
##                     :           retraining}: 2 (3/1)
##                     purpose = car (used):
##                     :...age <= 27: 2 (3)
##                     :   age > 27: 1 (12)
##                     purpose = education:
##                     :...checking_balance = < 0 DM: 2 (2)
##                     :   checking_balance = 1 - 200 DM: 1 (2)
##                     purpose = others:
##                     :...checking_balance = < 0 DM: 1 (2)
##                     :   checking_balance = 1 - 200 DM: 2 (2)
##                     purpose = car (new):
##                     :...installment_rate > 2: 2 (17/1)
##                     :   installment_rate <= 2:
##                     :   :...age <= 33: 2 (2)
##                     :       age > 33: 1 (3)
##                     purpose = furniture:
##                     :...personal_status in {divorced male,female}: 2 (7/1)
##                     :   personal_status = married male: 1 (2)
##                     :   personal_status = single male:
##                     :   :...checking_balance = 1 - 200 DM: 2 (1)
##                     :       checking_balance = < 0 DM:
##                     :       :...employment_length = > 7 yrs: 2 (2)
##                     :           employment_length in {0 - 1 yrs,1 - 4 yrs,
##                     :                                 4 - 7 yrs,
##                     :                                 unemployed}: 1 (7)
##                     purpose = radio/tv:
##                     :...employment_length = > 7 yrs: 1 (5)
##                         employment_length in {0 - 1 yrs,1 - 4 yrs,4 - 7 yrs,
##                         :                     unemployed}:
##                         :...checking_balance = < 0 DM: 2 (10)
##                             checking_balance = 1 - 200 DM: [S1]
##
## SubTree [S1]
##
## property in {building society savings,real estate}: 2 (3)
## property in {other,unknown/none}: 1 (3)
##
##
## Evaluation on training data (900 cases):
##
##      Decision Tree
##    ----------------
##    Size      Errors
##
##      73  103(11.4%)   <<
##
##
##     (a)   (b)    <-classified as
##    ----  ----
##     612    22    (a): class 1
##      81   185    (b): class 2
##
##
##  Attribute usage:
##
##  100.00% checking_balance
##   54.11% credit_history
##   48.33% months_loan_duration
##   38.33% purpose
##   30.44% savings_balance
##   13.22% installment_plan
##    9.44% employment_length
##    8.78% property
##    8.11% housing
##    7.11% dependents
##    5.56% residence_history
##    3.78% personal_status
##    2.67% age
##    2.44% installment_rate
##    2.11% other_debtors
##    2.00% job
##    1.33% telephone
##
##
## Time: 0.0 secs

predictgcr<-predict(gc_model,gc_test)
table(predictgcr,gc_test$default)

##
## predictgcr  1  2
##          1 56 23
##          2 10 11

errorrate01<-sum(predictgcr!=gc_test$default)/nrow(gc_test)
accurate_rate01<-1-errorrate01
accurate_rate01

## [1] 0.67

# tune the model with boosting: trials = number of boosting iterations
# (the fitted model's trials element is a named vector with Requested,
# an echo of the function call, and Actual, how many the model used)
gc_model_b10<-C5.0(x=gc_train[,-17],y=as.factor(gc_train$default),
                   trials = 10)
predictgcr02<-predict(gc_model_b10,gc_test)
table(predictgcr02,gc_test$default)

##
## predictgcr02  1  2
##            1 59 25
##            2  7  9

errorrate02<-sum(predictgcr02!=gc_test$default)/nrow(gc_test)
accurate_rate02<-1-errorrate02
accurate_rate02

## [1] 0.68
</pre>
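The two test-set tables above can be turned into per-class figures with a few lines of base R. The sketch below re-enters the printed counts by hand (rows are predictions, columns are actual classes, 1 = no default, 2 = default). The helper name `class_metrics` is made up for this illustration and is not part of C50.

```r
# Counts copied from the two test-set tables above.
# Rows = predicted class, columns = actual class (1 = no default, 2 = default).
cm_single  <- matrix(c(56, 10, 23, 11), nrow = 2,
                     dimnames = list(predicted = c("1", "2"),
                                     actual    = c("1", "2")))
cm_boosted <- matrix(c(59, 7, 25, 9), nrow = 2,
                     dimnames = list(predicted = c("1", "2"),
                                     actual    = c("1", "2")))

# Hypothetical helper: overall accuracy plus the share of actual
# defaulters (class 2) that the model flagged as defaulters.
class_metrics <- function(cm) {
  c(accuracy          = sum(diag(cm)) / sum(cm),
    defaulters_caught = cm["2", "2"] / sum(cm[, "2"]))
}

round(rbind(single  = class_metrics(cm_single),
            boosted = class_metrics(cm_boosted)), 2)
```

On this particular split, boosting raises accuracy from 0.67 to 0.68, while the number of actual defaulters caught falls from 11 of 34 to 9 of 34.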

 

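For a bank, granting a loan to a customer who might default is risky.
To make this mistake less frequent, we can penalize it more heavily than the opposite mistake with a cost matrix like this one, where predicting "no" for a customer who actually defaults costs four times as much as wrongly flagging a good customer:

           actual
predicted  no  yes
no          0    4
yes         1    0

The cost matrix is passed through the costs argument of C5.0 (many other algorithms offer a similar option), though the final accuracy might decrease.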
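As a rough sketch of how such a matrix could be built and used: the class labels in this post are "1" (no default) and "2" (default), so the matrix below uses those names instead of no/yes. The names `error_cost`, `total_cost` and `gc_model_cost` are illustrative. The C5.0 call only runs if the C50 package and the `gc_train` / `gc_test` objects from the code above are available. C50 expects the row and column names to match the class levels; the rows-are-predictions orientation follows the table above and is worth checking against the documentation of the installed C50 version.

```r
# Cost matrix: rows = predicted class, columns = actual class.
# Levels follow this post's coding: "1" = no default, "2" = default.
error_cost <- matrix(c(0, 1, 4, 0), nrow = 2,
                     dimnames = list(predicted = c("1", "2"),
                                     actual    = c("1", "2")))

# Illustrative helper: total cost of a confusion table under a cost matrix.
total_cost <- function(cm, costs) sum(cm * costs)

# Test-set counts printed earlier (rows = predicted, columns = actual).
cm_single  <- matrix(c(56, 10, 23, 11), nrow = 2,
                     dimnames = dimnames(error_cost))
cm_boosted <- matrix(c(59, 7, 25, 9), nrow = 2,
                     dimnames = dimnames(error_cost))
total_cost(cm_single, error_cost)   # 23 * 4 + 10 * 1
total_cost(cm_boosted, error_cost)  # 25 * 4 + 7 * 1

# Refit with the cost matrix, only if C50 and the earlier objects exist.
if (requireNamespace("C50", quietly = TRUE) &&
    exists("gc_train") && exists("gc_test")) {
  gc_model_cost <- C50::C5.0(x = gc_train[, -17],
                             y = as.factor(gc_train$default),
                             costs = error_cost)
  print(table(predict(gc_model_cost, gc_test), gc_test$default))
}
```

Under these costs the single tree's test-set errors add up to 102 and the boosted model's to 107, so the slightly more accurate model is the more expensive one on this split.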

We can also use OneR and JRip from the RWeka package to do the job.
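A minimal sketch of what that could look like, assuming the RWeka package (which needs a working Java installation) is installed and that `gc_train` and `gc_test` from the code above are still in the workspace. The helper name `fit_rule_learners` is invented here; `OneR()` and `JRip()` are RWeka's formula interfaces to the corresponding Weka learners. No results are shown because none were part of the original run.

```r
# Hypothetical helper: fit OneR (a rule on a single attribute) and JRip
# (RIPPER rule induction) and return the test-set accuracy of each.
fit_rule_learners <- function(train, test, label = "default") {
  # Both learners need the class column to be a factor.
  train[[label]] <- as.factor(train[[label]])
  test[[label]]  <- factor(test[[label]], levels = levels(train[[label]]))
  f <- stats::reformulate(".", response = label)  # e.g. default ~ .
  models <- list(OneR = RWeka::OneR(f, data = train),
                 JRip = RWeka::JRip(f, data = train))
  sapply(models, function(m) {
    mean(predict(m, newdata = test) == test[[label]])
  })
}

# Run only when RWeka and the earlier train/test objects are available.
if (requireNamespace("RWeka", quietly = TRUE) &&
    exists("gc_train") && exists("gc_test")) {
  print(fit_rule_learners(gc_train, gc_test))
} else {
  message("RWeka or the gc_train/gc_test objects are not available; skipping.")
}
```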
