SVM (e1071 of R) Tuning with MNIST


Handwriting recognition is a well-studied subject in computer vision and has found wide application in daily life (such as USPS mail sorting). In this project, we will explore various machine learning techniques for recognizing handwritten digits. The dataset you will be using is the well-known MNIST dataset.

(1) The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.

Below is an example of some digits from the MNIST dataset:


The goal of this project is to build a 10-class classifier that recognizes these handwritten digits as accurately as you can. Though deep learning has been widely used for this dataset, in this project you should NOT use any deep neural nets (DNN) to do the recognition. Rather, you need to use the techniques we have learned so far in the class (such as logistic regression, SVM, etc.) plus some other reasonable non-DNN machine learning techniques (such as random forest, decision tree, etc., though we have not covered those subjects in the class yet) to do the work.

Build a classifier using all pixels as features for handwriting recognition.

After loading the dataset into R, we have a training set and a test set.

Now we try to conduct classification and produce a predictive model based on SVM. This is the original code in R with default parameters:

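# SVM with default settings. This takes a very long time on the full training set, so I stopped it. DO NOT TRY THIS!
library(e1071)
pt <- proc.time() # start timing
# Use n ~ . rather than Training$n ~ . so the label column n is not also used as a predictor.
# method="class" is not an svm() argument; type = "C-classification" requests classification.
svmmodel <- svm(n ~ ., data = Training, type = "C-classification")
proc.time() - pt # elapsed time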
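Because the full fit is so slow, one way to get a feel for the running time is to fit on a small random subsample first and time that. The following is only a minimal sketch: the data frame toy is a hypothetical stand-in shaped like Training (a label column n plus numeric feature columns), used so the snippet runs on its own. With the real data, the same idea would be applied to a row sample of Training.

```r
library(e1071)

set.seed(1)
# Hypothetical stand-in for Training: 300 rows, 20 feature columns, label n in 0..2
n_rows <- 300
toy <- as.data.frame(matrix(runif(n_rows * 20), nrow = n_rows))
toy$n <- factor(sample(0:2, n_rows, replace = TRUE))
# Shift one feature by the class value so there is something to learn
toy$V1 <- toy$V1 + as.integer(as.character(toy$n))

# Fit on a random subsample of rows and time the fit
idx <- sample(nrow(toy), 100)
pt <- proc.time()
fit_small <- svm(n ~ ., data = toy[idx, ], type = "C-classification")
elapsed <- (proc.time() - pt)[["elapsed"]]
cat("rows:", length(idx), "seconds:", elapsed, "\n")
```

Repeating this with a few increasing subsample sizes gives a rough idea of how the fitting time grows before committing to all 60,000 rows.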

Typical arguments of the svm function in the e1071 package of R include:

formula, data, x, y, scale, kernel, degree, gamma, cost


Usually, the decision is whether to use a linear or an RBF (aka Gaussian) kernel. There are two main factors to consider:

Solving the optimisation problem for a linear kernel is much faster, see e.g. LIBLINEAR.

Typically, the best possible predictive performance is better for a nonlinear kernel (or at least as good as the linear one).
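The trade-off between the two kernels can be checked empirically by fitting both on the same data and comparing fitting time and hold-out accuracy. This is a sketch under stated assumptions: the data frame toy is hypothetical and deliberately nonlinear (two classes separated by a circle), and compare_kernel is an illustrative helper, not part of e1071. On MNIST the size of the gap and the timings would be different.

```r
library(e1071)

set.seed(2)
# Hypothetical stand-in data: class 1 lies inside a circle, class 0 outside it
n_rows <- 400
x1 <- runif(n_rows, -1, 1)
x2 <- runif(n_rows, -1, 1)
toy <- data.frame(x1 = x1, x2 = x2,
                  n = factor(as.integer(x1^2 + x2^2 < 0.4)))

# Simple hold-out split
train_idx <- sample(n_rows, 300)
train <- toy[train_idx, ]
valid <- toy[-train_idx, ]

# Illustrative helper: fit one kernel, return fitting time and hold-out accuracy
compare_kernel <- function(kernel) {
  pt <- proc.time()
  fit <- svm(n ~ ., data = train, type = "C-classification", kernel = kernel)
  secs <- (proc.time() - pt)[["elapsed"]]
  acc <- mean(predict(fit, valid) == valid$n)
  c(seconds = secs, accuracy = acc)
}

results <- sapply(c(linear = "linear", radial = "radial"), compare_kernel)
print(results)
```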



gamma: parameter needed for all kernels except linear (default: 1/(data dimension))
Intuitively, the C parameter (the cost argument in e1071) trades off misclassification of training examples against simplicity of the decision surface. A low C tends to make the decision surface smooth, while a high C aims to classify all training examples correctly by giving the model freedom to select more samples as support vectors.
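Rather than picking gamma and cost by hand, e1071 provides tune.svm, which runs a cross-validated grid search over candidate values. The sketch below is an illustration only: the data frame toy is a hypothetical stand-in, and the grid values and the 5-fold setting are arbitrary choices, not recommendations for MNIST.

```r
library(e1071)

set.seed(3)
# Hypothetical stand-in data: class 1 lies inside a circle, class 0 outside it
n_rows <- 200
x1 <- runif(n_rows, -1, 1)
x2 <- runif(n_rows, -1, 1)
toy <- data.frame(x1 = x1, x2 = x2,
                  n = factor(as.integer(x1^2 + x2^2 < 0.4)))

# Cross-validated grid search over gamma and cost for the RBF kernel
tuned <- tune.svm(n ~ ., data = toy, kernel = "radial",
                  gamma = 10^(-2:1), cost = 10^(-1:2),
                  tunecontrol = tune.control(cross = 5))

print(tuned$best.parameters)   # gamma/cost pair with the lowest cross-validation error
print(tuned$best.performance)  # the corresponding cross-validated error rate
```

Every grid point is fitted once per fold, so on a dataset the size of MNIST it would be sensible to run the search on a subsample.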


Tuned code

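pt <- proc.time() # start timing
# Linear kernel, no scaling, cost = 10. n ~ . keeps the label column out of the predictors;
# method="class" is not an svm() argument, so type = "C-classification" is used instead.
svmmodel <- svm(n ~ ., data = Training, type = "C-classification", kernel = "linear", scale = FALSE, cost = 10)
proc.time() - pt # elapsed time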
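Once a model is fitted, it can be checked against the held-out test set with predict and a confusion matrix. The following is a minimal sketch, not the evaluation used in this post: toy_train and toy_test are hypothetical stand-ins for the training and test sets, and make_toy is an illustrative helper that builds them.

```r
library(e1071)

set.seed(4)
# Illustrative helper: build a stand-in data set with label n and 10 feature columns
make_toy <- function(rows) {
  d <- as.data.frame(matrix(runif(rows * 10), nrow = rows))
  d$n <- factor(sample(0:2, rows, replace = TRUE), levels = 0:2)
  d$V1 <- d$V1 + 2 * as.integer(as.character(d$n))  # class-dependent shift
  d
}
toy_train <- make_toy(300)
toy_test <- make_toy(100)

# Same settings as the tuned call: linear kernel, no scaling, cost = 10
fit <- svm(n ~ ., data = toy_train, type = "C-classification",
           kernel = "linear", scale = FALSE, cost = 10)

# Predict on the held-out rows, then tabulate predictions against true labels
pred <- predict(fit, toy_test)
confusion <- table(predicted = pred, actual = toy_test$n)
accuracy <- mean(pred == toy_test$n)
print(confusion)
cat("test accuracy:", accuracy, "\n")
```

The off-diagonal cells of the confusion matrix show which classes are mistaken for each other, which is often more informative than the single accuracy figure.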
