Regularization in Neural Networks, with MNIST and deepnet in R

Several regularization methods help reduce overfitting in neural network models.
1. L1 penalty, also known as the Least Absolute Shrinkage and Selection Operator (lasso). The penalty term uses the sum of the absolute weights, so every weight is penalized at the same rate regardless of its magnitude, which tends to drive small weights exactly to zero. People are more familiar with its application in Lasso regression.

2. Weight decay is the L2 penalty in neural networks. Instead of a penalty based on the sum of the absolute weights, the penalty is based on the sum of the squared weights, so large weights are penalized proportionally more. People are more familiar with its application in Ridge regression (both penalties are sketched with glmnet after this list).

https://charleshsliao.wordpress.com/2017/02/28/example-of-ridge-and-lasso-regression/

3. Ensemble and average models
Train several models and average their predictions; the averaging smooths out the idiosyncratic errors of the individual models. This is quite straightforward.

4. Dropout
Dropout forces the model to be more robust to perturbations. During training, units (inputs, hidden neurons, and so on) are probabilistically dropped, along with all connections to and from them (sketched below).

5. Other methods, such as auto-encoders, can also be used for regularization.
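
As a quick refresher on the two penalties, here is a minimal sketch using the glmnet package (assumed to be installed; it is not used in the rest of this post). alpha = 1 gives the L1 (lasso) penalty and alpha = 0 gives the L2 (ridge) penalty:

library(glmnet)
set.seed(2011)
x <- matrix(rnorm(100 * 20), 100, 20)  # 100 observations, 20 predictors
y <- x[, 1] - 2 * x[, 2] + rnorm(100)  # only two predictors actually matter
lasso <- glmnet(x, y, alpha = 1)       # L1: drives many weights exactly to 0
ridge <- glmnet(x, y, alpha = 0)       # L2: shrinks all weights toward 0
coef(lasso, s = 0.1)                   # sparse coefficients at lambda = 0.1
coef(ridge, s = 0.1)                   # small but mostly nonzero coefficients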

We will use the dropout method here.
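
Before turning to deepnet, a minimal sketch of the idea (my own illustration, not deepnet's internal implementation): each unit is kept with probability 1 - p during training, and at test time every unit is kept but its activation is scaled by 1 - p.

set.seed(2011)
p <- 0.5                                           # dropout rate
h <- runif(10)                                     # activations of 10 hidden units
mask <- rbinom(length(h), size = 1, prob = 1 - p)  # 1 = keep, 0 = drop
h_train <- h * mask                                # training: dropped units output 0
h_test <- h * (1 - p)                              # testing: scale instead of dropping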

### skip the L1 penalty (lasso) regularization here ###
### the L2 penalty corresponds to weight decay (e.g., the decay argument in nnet) ###
### load the MNIST IDX files ###
load_image_file <- function(filename) {
  # read an MNIST image file in IDX3 format
  ret = list()
  f = file(filename,'rb')
  readBin(f,'integer',n=1,size=4,endian='big')          # magic number (discarded)
  ret$n = readBin(f,'integer',n=1,size=4,endian='big')  # number of images
  nrow = readBin(f,'integer',n=1,size=4,endian='big')   # rows per image (28)
  ncol = readBin(f,'integer',n=1,size=4,endian='big')   # columns per image (28)
  x = readBin(f,'integer',n=ret$n*nrow*ncol,size=1,signed=F)  # raw pixel bytes
  ret$x = matrix(x, ncol=nrow*ncol, byrow=T)            # one image per row
  close(f)
  ret
}

load_label_file <- function(filename) {
  # read an MNIST label file in IDX1 format
  f = file(filename,'rb')
  readBin(f,'integer',n=1,size=4,endian='big')      # magic number (discarded)
  n = readBin(f,'integer',n=1,size=4,endian='big')  # number of labels
  y = readBin(f,'integer',n=n,size=1,signed=F)      # label bytes (0-9)
  close(f)
  y
}

# save the data as data frames and convert the labels into categorical (factor) values;
# the first column of each image data frame (the record count, named "n") is
# reused to hold the labels
imagetraining<-as.data.frame(load_image_file("train-images-idx3-ubyte"))
imagetest<-as.data.frame(load_image_file("t10k-images-idx3-ubyte"))
labeltraining<-as.factor(load_label_file("train-labels-idx1-ubyte"))
labeltest<-as.factor(load_label_file("t10k-labels-idx1-ubyte"))
imagetraining[,1]<-labeltraining  # column 1 (named "n") now holds the labels
imagetest[,1]<-labeltest
Training<-imagetraining
Test<-imagetest
sample_n<-5000
training<-Training[sample(60000,sample_n),]  # random subsample to keep training fast
digits_x<-training[,-1]  # pixel features
digits_y<-training$n     # labels (stored in the column named "n")
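
A quick, optional sanity check on the sampled data:

dim(digits_x)    # 5000 rows, 784 pixel columns (28 x 28)
table(digits_y)  # counts of each digit 0-9 in the sample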
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(parallel)
library(doSNOW)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: snow
## 
## Attaching package: 'snow'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, clusterSplit, makeCluster,
##     parApply, parCapply, parLapply, parRapply, parSapply,
##     splitIndices, stopCluster
library(foreach)
library(deepnet)
library(RSNNS)
## Loading required package: Rcpp
## 
## Attaching package: 'RSNNS'
## The following objects are masked from 'package:caret':
## 
##     confusionMatrix, train
cl <- makeCluster(detectCores())  # one worker per core
registerDoSNOW(cl)
# train six nets in parallel: 40 or 80 hidden units, with no dropout,
# hidden/visible dropout of 0.5/0.2, or hidden/visible dropout of 0.2/0.5
nn_digits_models<-foreach(i=1:6,.combine = "c")%dopar%{
  set.seed(2011)
  list(deepnet::nn.train(
    x=as.matrix(digits_x),
    y=model.matrix(~0+digits_y),  # one-hot encode the 10 classes
    hidden=c(40,80,40,80,40,80)[i],
    activationfun = "tanh",
    learningrate=0.7,
    momentum=0.5,
    numepochs = 150,
    output = "softmax",
    hidden_dropout = c(0,0,0.5,0.5,0.2,0.2)[i],
    visible_dropout = c(0,0,0.2,0.2,0.5,0.5)[i]))
}

###get predictions from the six models on the test set###
mt_nn_results_test<-lapply(nn_digits_models,function(obj){
  encodeClassLabels(nn.predict(obj,as.matrix(Test[,-1])))  # class indices 1-10
})
nn_results_test<-do.call(cbind,lapply(mt_nn_results_test,function(results){
  # shift indices back to digits 0-9, then tabulate against the true labels
  caret::confusionMatrix(xtabs(~I(results-1)+Test$n))$overall
}))
colnames(nn_results_test)<-c("N40","N80","N40.5.2","N80.5.2","N40.2.5","N80.2.5")  # hidden units . hidden dropout . visible dropout
stopCluster(cl)
###print them out###
options(digits=4)
nn_results_test
##                   N40    N80 N40.5.2 N80.5.2 N40.2.5 N80.2.5
## Accuracy       0.8770 0.8815  0.8965  0.9066  0.8956  0.8883
## Kappa          0.8633 0.8683  0.8850  0.8962  0.8839  0.8758
## AccuracyLower  0.8704 0.8750  0.8904  0.9007  0.8894  0.8820
## AccuracyUpper  0.8834 0.8878  0.9024  0.9122  0.9015  0.8944
## AccuracyNull   0.1135 0.1135  0.1135  0.1135  0.1135  0.1135
## AccuracyPValue 0.0000 0.0000  0.0000  0.0000  0.0000  0.0000
## McnemarPValue     NaN    NaN     NaN     NaN     NaN     NaN
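
All four dropout configurations beat their no-dropout baselines (N40 and N80), with the best result (accuracy 0.9066) coming from 80 hidden units with 0.5 hidden dropout and 0.2 visible dropout.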
