Tune Multi-layer Perceptron (MLP) in R with MNIST

Googled MLP and so many "My Little Pony" results popped up. LOL. 🙂

Generally speaking, a deep learning model is a neural network with more than one hidden layer.

Whether a deep learning model succeeds depends largely on how its parameters are tuned.

The Multi-layer Perceptron (MLP) provided by the R package "RSNNS" exposes multiple arguments for tuning, including the size of the hidden layers, the maximum number of learning iterations, the learning function to use, the learning function's parameters, and so on.

In this case, we try to use as little training data as possible from the MNIST data set while tuning the MLP model to achieve the best result. The KPIs here are the figure of merit (FOM) and accuracy, where FOM is computed below as half the fraction of the 60,000 training images used plus the test error rate.
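As a quick sketch of the metric (the formula is taken from the FOM calculation later in this post; `fom` is a hypothetical helper name, not part of any package):

```r
# Figure of merit (FOM): lower is better. It rewards using fewer of the
# 60,000 MNIST training images and penalizes test error, matching the
# formula used later in this post.
fom <- function(error_rate, sample_n, total_n = 60000) {
  (sample_n / total_n) / 2 + error_rate
}

fom(0.2108, 5000)  # 0.2524667, the untuned MLP's FOM below
```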

# Read an MNIST IDX image file into a list: n (image count) and x (pixel matrix)
load_image_file <- function(filename) {
  ret = list()
  f = file(filename,'rb')
  readBin(f,'integer',n=1,size=4,endian='big')          # magic number (discarded)
  ret$n = readBin(f,'integer',n=1,size=4,endian='big')  # number of images
  nrow = readBin(f,'integer',n=1,size=4,endian='big')   # rows per image
  ncol = readBin(f,'integer',n=1,size=4,endian='big')   # columns per image
  x = readBin(f,'integer',n=ret$n*nrow*ncol,size=1,signed=F)
  ret$x = matrix(x, ncol=nrow*ncol, byrow=T)            # one image per row
  close(f)
  ret
}

# Read an MNIST IDX label file into an integer vector of digit labels
load_label_file <- function(filename) { 
  f = file(filename,'rb')
  readBin(f,'integer',n=1,size=4,endian='big')      # magic number (discarded)
  n = readBin(f,'integer',n=1,size=4,endian='big')  # number of labels
  y = readBin(f,'integer',n=n,size=1,signed=F)
  close(f)
  y
}
# Save the image data as data frames and convert the labels into
# categorical values (factors)
imagetraining<-as.data.frame(load_image_file("train-images-idx3-ubyte"))
imagetest<-as.data.frame(load_image_file("t10k-images-idx3-ubyte"))
labeltraining<-as.factor(load_label_file("train-labels-idx1-ubyte"))
labeltest<-as.factor(load_label_file("t10k-labels-idx1-ubyte"))

# Put the labels into the first column (overwriting the image count n),
# then rebuild the Training and Test sets; models below read labels from $n.
imagetraining[,1]<-labeltraining
imagetest[,1]<-labeltest
Training<-imagetraining
Test<-imagetest 

######################################################
####change the size of data used for training here####
######################################################

sample_n<-5000
training<-Training[1:sample_n,]

#################################################
####  1.0 train a single MLP without tuning  ####
#################################################
library(RSNNS)
## Loading required package: Rcpp
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
## 
## Attaching package: 'caret'
## The following objects are masked from 'package:RSNNS':
## 
##     confusionMatrix, train
set.seed(2019)
digits_mlp<-mlp(as.matrix(training[,-1]),
                decodeClassLabels(training$n),
                size=40,learnFunc="Rprop",
                shufflePatterns = F,
                maxit=60)

mlp_p<-predict(digits_mlp,as.matrix(Test[,-1]))
d_mlp_p<-encodeClassLabels(mlp_p,method="WTA",l=0,h=0)-1
caret::confusionMatrix(xtabs(~d_mlp_p+Test$n))
## Confusion Matrix and Statistics
## 
##        Test$n
## d_mlp_p    0    1    2    3    4    5    6    7    8    9
##       0  922    0   15    6   12   34   41    6   19   21
##       1    0 1035   15   11   27    1    5   26   30   13
##       2    7   29  764   72   11   35   47   19   35    2
##       3    2    3   43  763    1  104    1   14   28   12
##       4    1   10   16    3  756   16   10   12   42  109
##       5   28    6   10   73   36  595   33    1  100   10
##       6   14   17   51    3   29   22  808    3   16   11
##       7    3    0    8   17    5   10    0  829   13   46
##       8    3   28  108   41   12   47   12   24  660   25
##       9    0    7    2   21   93   28    1   94   31  760
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7892          
##                  95% CI : (0.7811, 0.7972)
##     No Information Rate : 0.1135          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7657          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9408   0.9119   0.7403   0.7554   0.7699   0.6670
## Specificity            0.9829   0.9856   0.9713   0.9769   0.9757   0.9674
## Pos Pred Value         0.8569   0.8899   0.7483   0.7858   0.7754   0.6670
## Neg Pred Value         0.9935   0.9887   0.9702   0.9726   0.9750   0.9674
## Prevalence             0.0980   0.1135   0.1032   0.1010   0.0982   0.0892
## Detection Rate         0.0922   0.1035   0.0764   0.0763   0.0756   0.0595
## Detection Prevalence   0.1076   0.1163   0.1021   0.0971   0.0975   0.0892
## Balanced Accuracy      0.9619   0.9487   0.8558   0.8662   0.8728   0.8172
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.8434   0.8064   0.6776   0.7532
## Specificity            0.9816   0.9886   0.9668   0.9692
## Pos Pred Value         0.8296   0.8904   0.6875   0.7329
## Neg Pred Value         0.9834   0.9781   0.9653   0.9722
## Prevalence             0.0958   0.1028   0.0974   0.1009
## Detection Rate         0.0808   0.0829   0.0660   0.0760
## Detection Prevalence   0.0974   0.0931   0.0960   0.1037
## Balanced Accuracy      0.9125   0.8975   0.8222   0.8612
error_rate_mlp<-sum(d_mlp_p!=Test$n)/nrow(Test)
paste0("error rate of MLP: ",error_rate_mlp)
## [1] "error rate of MLP: 0.2108"
FOM_MLP<-(sample_n/60000)/2+error_rate_mlp
paste0("FOM of MLP: ",FOM_MLP)
## [1] "FOM of MLP: 0.252466666666667"
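A note on the `-1` in the prediction step above: `decodeClassLabels()` one-hot encodes the ten digit levels into columns 1 through 10, and `encodeClassLabels()` returns the index of the winning column, so subtracting 1 maps back to the digits 0 through 9. A minimal illustration:

```r
library(RSNNS)

y <- factor(0:9)                        # the ten digit classes
onehot <- decodeClassLabels(y)          # 10 x 10 one-hot matrix
digits <- encodeClassLabels(onehot) - 1 # winning column index, shifted back
digits                                  # 0 1 2 3 4 5 6 7 8 9
```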

#####################################################
#######    1.1 tune Multiple MLP Models       #######
#####################################################

#####1.1.1 set up tune parameters#####
library(foreach)
library(parallel)
library(doSNOW)
## Loading required package: iterators
## Loading required package: snow
## 
## Attaching package: 'snow'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, clusterSplit, makeCluster,
##     parApply, parCapply, parLapply, parRapply, parSapply,
##     splitIndices, stopCluster
tuning<-list(size=c(40,20,20,50,50),
             maxit=c(60,100,100,100,100),
             shuffle=c(F,F,T,F,F),
             params=list(F,F,F,F,c(0.1,20,3)))

#####1.1.2 set up parallel computing environment######
cl <- makeCluster(detectCores())
clusterEvalQ(cl,{
  library(RSNNS)
})
## [[1]]
##  [1] "RSNNS"     "Rcpp"      "snow"      "methods"   "stats"    
##  [6] "graphics"  "grDevices" "utils"     "datasets"  "base"     
## 
## [[2]]
##  [1] "RSNNS"     "Rcpp"      "snow"      "methods"   "stats"    
##  [6] "graphics"  "grDevices" "utils"     "datasets"  "base"     
## 
## [[3]]
##  [1] "RSNNS"     "Rcpp"      "snow"      "methods"   "stats"    
##  [6] "graphics"  "grDevices" "utils"     "datasets"  "base"     
## 
## [[4]]
##  [1] "RSNNS"     "Rcpp"      "snow"      "methods"   "stats"    
##  [6] "graphics"  "grDevices" "utils"     "datasets"  "base"
registerDoSNOW(cl)

####1.1.3 train multiple MLP models######
set.seed(2018)
mt_mlp<-foreach(i=1:5,.combine = "c")%dopar%{
  if(tuning$params[[i]][1]){
    set.seed(2017)
    list(m_mlp=mlp(
      as.matrix(training[,-1]),decodeClassLabels(training$n),
      size=tuning$size[[i]],learnFunc="Rprop",
      shufflePatterns=tuning$shuffle[[i]],
      learnFuncParams=tuning$params[[i]],
      maxit=tuning$maxit[[i]]
    ))
  } else {
    set.seed(2017)
    list(m_mlp=mlp(
      as.matrix(training[,-1]),decodeClassLabels(training$n),
      size=tuning$size[[i]],learnFunc="Rprop",
      shufflePatterns=tuning$shuffle[[i]],
      maxit=tuning$maxit[[i]]
    ))
  }
}
clusterExport(cl,"mt_mlp")

####1.1.4 store the models training results and predicted results####
mt_mlp_results<-foreach(i=1:5,.combine=c)%dopar%{
  list(list(
    Insample=encodeClassLabels(fitted.values(mt_mlp[[i]]))-1,
    Outsample=encodeClassLabels(predict(mt_mlp[[i]],
                                        newdata=as.matrix(Test[,-1])))-1
  ))
}

###1.1.5 get the training results of multiple models###
mlp_insample<-cbind(Y=training$n,do.call(
  cbind.data.frame,lapply(mt_mlp_results,"[[","Insample")
))
colnames(mlp_insample)<-c("Y",paste0("Results",1:5))

mlp_training_results<-do.call(rbind,lapply(1:5,function(n){
  f_in<-substitute(~Y+x,list(x=as.name(paste0("Results",n))))
  mlp_dat_in<-mlp_insample[mlp_insample[,paste0("Results",n)]!=0,]
  mlp_dat_in$Y<-factor(mlp_dat_in$Y,levels=1:6)
  mlp_dat_in[,paste0("Results",n)]<-factor(mlp_dat_in[,
                                                      paste0("Results",n)],levels=1:6)
  res_in<-caret::confusionMatrix(xtabs(f_in,data=mlp_dat_in))
  cbind(Size = tuning$size[[n]],
        Maxit = tuning$maxit[[n]],
        Shuffle = tuning$shuffle[[n]],
        learnFuncParams=tuning$params[[n]],
        as.data.frame(t(res_in$overall[
          c("AccuracyNull", "Accuracy", "AccuracyLower", "AccuracyUpper")])))
}))

###1.1.6 get the predicted results of multiple models###
mlp_outsample<-cbind(Y=Test$n,
                     do.call(cbind.data.frame,lapply(
                       mt_mlp_results,"[[","Outsample")))
colnames(mlp_outsample)<-c("Y",paste0("Results",1:5))

mlp_predict_results<-do.call(rbind,lapply(1:5,function(n){
  f_out<-substitute(~Y+x,list(x=as.name(paste0("Results",n))))
  mlp_dat_out<-mlp_outsample[mlp_outsample[,paste0("Results",n)]!=0,]
  mlp_dat_out$Y<-factor(mlp_dat_out$Y,levels=1:6)
  mlp_dat_out[,paste0("Results",n)]<-factor(mlp_dat_out[,
                                                        paste0("Results",n)],levels=1:6)
  res_out<-caret::confusionMatrix(xtabs(f_out,data=mlp_dat_out))
  cbind(Size = tuning$size[[n]],
        Maxit = tuning$maxit[[n]],
        Shuffle = tuning$shuffle[[n]],
        learnFuncParams=tuning$params[[n]],
        as.data.frame(t(res_out$overall[
          c("AccuracyNull", "Accuracy", "AccuracyLower", "AccuracyUpper")])))
}))
options(width=80)
mlp_training_results[,-5]
##   Size Maxit Shuffle learnFuncParams  Accuracy AccuracyLower AccuracyUpper
## 1   40    60   FALSE             0.0 0.9216444     0.9111560     0.9312556
## 2   20   100   FALSE             0.0 0.8616514     0.8481160     0.8744046
## 3   20   100    TRUE             0.0 0.8616514     0.8481160     0.8744046
## 4   50   100   FALSE             0.0 0.9312133     0.9213741     0.9401688
## 5   50   100   FALSE             0.1 0.9718549     0.9652281     0.9775216
## 6   50   100   FALSE            20.0 0.9718549     0.9652281     0.9775216
## 7   50   100   FALSE             3.0 0.9718549     0.9652281     0.9775216
mlp_predict_results[,-5]
##   Size Maxit Shuffle learnFuncParams  Accuracy AccuracyLower AccuracyUpper
## 1   40    60   FALSE             0.0 0.9208605     0.9098175     0.9309454
## 2   20   100   FALSE             0.0 0.8578947     0.8434986     0.8714364
## 3   20   100    TRUE             0.0 0.8578947     0.8434986     0.8714364
## 4   50   100   FALSE             0.0 0.9294920     0.9190484     0.9389697
## 5   50   100   FALSE             0.1 0.9725667     0.9656290     0.9784364
## 6   50   100   FALSE            20.0 0.9725667     0.9656290     0.9784364
## 7   50   100   FALSE             3.0 0.9725667     0.9656290     0.9784364
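One quirk in the two tables above: five models produce seven rows, because model 5's `learnFuncParams` is the length-3 vector `c(0.1, 20, 3)` and `cbind()` recycles the length-1 columns against it, emitting one row per parameter value (which is why rows 5 through 7 share the same accuracy):

```r
# cbind() recycles length-1 columns to match the longest column, so a
# single model with a length-3 parameter vector yields three rows.
cbind(Size = 50, Maxit = 100, learnFuncParams = c(0.1, 20, 3))
```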
#1.1.7 close parallel computing#
stopCluster(cl)

###1.1.8 calculate the error rate and FOM###
error_rate_tuned_mlp<-1-max(mlp_predict_results[,6])
paste0("error rate of MLP: ",error_rate_tuned_mlp)
## [1] "error rate of MLP: 0.0274332957534761"
FOM_Tuned_MLP<-(sample_n/60000)/2+error_rate_tuned_mlp
paste0("FOM of Tuned_MLP: ",FOM_Tuned_MLP)
## [1] "FOM of Tuned_MLP: 0.0690999624201427"