Data and background:

https://charleshsliao.wordpress.com/2017/02/24/svm-tuning-based-on-mnist/

In this project we used several classifiers to examine the dataset. The algorithms explored in our experiments are:

- K-Nearest Neighbors algorithm (KNN)
- Support Vector Machine algorithm (SVM)
- Fast Nearest Neighbor algorithm (FNN)
- Naive Bayes algorithm (NBs)
- Recursive partitioning (rpart), a decision-tree classifier

We compared the results for each algorithm and discussed their advantages and disadvantages in terms of accuracy and time consumed.

Pros and cons of the different classifiers:

KNN.

Pros:

- Simple and effective; complex concepts can be learned by local approximation using simple procedures
- Makes no strong assumptions about the underlying data distribution
- Relatively fast training; we can preprocess the training examples into fast data structures and compute only approximate distances. In addition, we can remove redundant data (condensing) to speed up training.

Cons:

- Returns only predictions instead of a model; the model cannot be interpreted (there is no description of the learned concepts)
- Selecting K is an art; there is no simple rule for choosing it. When K is too small, the classification of a row can be very sensitive to noise; when K is too large, the neighborhood includes too many points from other classes
- Non-numeric values need preprocessing
- It is computationally expensive to find the k nearest neighbours when the dataset is very large
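Since there is no simple rule for K, a small validation sweep is the usual workaround before committing to the expensive full run. A minimal sketch, using R's built-in iris data as a stand-in for the MNIST frames used below (the `class` package ships with base R):

```r
library(class)  # ships with base R; provides knn()

set.seed(1)
# Illustrative only: iris stands in for the MNIST Training/Test frames
idx   <- sample(nrow(iris), 100)
tr    <- iris[idx, 1:4];  trlab <- iris$Species[idx]
va    <- iris[-idx, 1:4]; valab <- iris$Species[-idx]

ks       <- c(1, 3, 5, 9)
acc_by_k <- sapply(ks, function(k) mean(knn(tr, va, trlab, k = k) == valab))
print(data.frame(k = ks, accuracy = acc_by_k))
# Pick the K with the best validation accuracy, then train on the full set
```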

FNN.

Just like KNN, but with a cover_tree search structure applied here to speed up the neighbour lookup.

NBs. Pros:

- Simple and Fast
- Robust to noise; it is not sensitive to irrelevant features
- Works well with both large and small data sets
- Easy to use for prediction

Cons:

- Naive Bayes makes a very strong assumption about the shape of the data distribution: features are treated as equally important and conditionally independent
- If the dataset is large and some of the features are highly correlated, the model performs poorly because the independence assumption does not hold
- Not well suited to numeric data sets
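The correlated-features problem can be seen in a few lines of base R: if the same evidence enters the model twice (here, an exactly duplicated feature), Gaussian naive Bayes multiplies the same likelihood in twice and the posterior becomes over-confident. A toy sketch with hand-picked class means (no package needed; the numbers are illustrative, not from our MNIST experiment):

```r
# Gaussian class-conditional likelihood for a single feature value
gauss_ll <- function(x, mu, s) dnorm(x, mu, s)

mu_a <- 0; mu_b <- 2; s <- 1   # hypothetical per-class feature means/sd
x <- 1.4                       # one observed feature value

# Posterior P(class b | x) with the feature counted once vs. twice
post_one <- gauss_ll(x, mu_b, s) /
            (gauss_ll(x, mu_a, s) + gauss_ll(x, mu_b, s))
post_dup <- gauss_ll(x, mu_b, s)^2 /
            (gauss_ll(x, mu_a, s)^2 + gauss_ll(x, mu_b, s)^2)

cat("P(b | one copy of the feature):", round(post_one, 3), "\n")
cat("P(b | feature counted twice): ", round(post_dup, 3), "\n")
# The duplicated feature pushes the posterior further from 0.5,
# i.e. the classifier is more confident than the evidence warrants
```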

Rpart. Pros:

- Combines the strengths of decision trees and applies them to numeric data
- Automatic feature selection
- Easy to interpret

Cons:

- Needs a large training data set
- Can be trickier to understand than a plain decision tree

SVM.

Pros:

- Flexible: can be used for both classification and numeric prediction
- Has a regularization parameter, which makes the user think about avoiding over-fitting
- High classification accuracy

Cons:

- Multiple trials are needed to find the best model
- Slow to train
- Difficult to interpret; the theory only covers determining the parameters for a given value of the regularization and kernel parameters and a given choice of kernel
- The problem must be formulated as a two-class classification: SVM is inherently a binary classifier. For multi-class classification a decomposition must be used instead, e.g. pairwise classification (one class against another, for all pairs) or one-against-all
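In practice the `e1071::svm` function used in section 3.4 handles this decomposition for us: it trains one binary machine per pair of classes (one-against-one) and predicts by voting. A minimal sketch on the built-in iris data, assuming the e1071 package is installed as in section 3.4:

```r
library(e1071)  # third-party package, same one used in sections 3.3/3.4

set.seed(1)
# 3 classes -> choose(3, 2) = 3 pairwise binary SVMs trained internally
m <- svm(Species ~ ., data = iris, kernel = "linear", cost = 10)
p <- predict(m, iris)
cat("Training accuracy:", mean(p == iris$Species), "\n")
```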

# 3.1 KNN algorithm. Read through before you try.
Trainingk <- Training[, -1]
Testk     <- Test[, -1]
Kfactork  <- Training[, 1]
Tfactork  <- Test[, 1]
pt <- proc.time()                  # check time consumed
knnmodel <- knn(Trainingk, Testk, Kfactork, k = 1)
proc.time() - pt
# user    system  elapsed
# 489.851 2.320   495.509
table(knnmodel, Tfactork)
knnconfusion <- as.data.frame(table(knnmodel, Tfactork))  # build the confusion matrix
write.csv(knnconfusion, file = "knnconfusion.csv")        # export confusion matrix for report writing
knnerror <- sum(Test$n != knnmodel) / nrow(Test)
print(paste0("Accuracy (Precision) of KNN: ", 1 - knnerror))
# [1] "Accuracy (Precision) of KNN: 0.9691"
# Changing the K value might increase accuracy. We cannot show an individual
# prediction example here, since knn in the class (or FNN) package only
# delivers the final predictions as a factor.

# 3.2 FNN algorithm
Trainingf <- Training[, -1]
Testf     <- Test[, -1]
Kfactorf  <- Training[, 1]
Tfactorf  <- Test[, 1]
pt <- proc.time()
fnnmodel <- FNN::knn(Trainingf, Testf, Kfactorf, k = 3, algorithm = "cover_tree")
# knn in the FNN package, with cover_tree to speed up the original knn. k = 3.
proc.time() - pt
write.csv(fnnmodel, file = "fnnmodel.csv")
# user     system  elapsed
# 1387.097 7.671   1399.589
# It took almost 1400s...
table(fnnmodel, Tfactorf)
fnnconfusion <- as.data.frame(table(fnnmodel, Tfactorf))
write.csv(fnnconfusion, file = "fnnconfusion.csv")
fnnerror <- sum(Test$n != fnnmodel) / nrow(Test)
print(paste0("Accuracy (Precision) of FNN: ", 1 - fnnerror))
# [1] "Accuracy (Precision) of FNN: 0.9705"
# Predict with fnnmodel. Take the value of the nth row label of Test.
row <- 999
fnnp <- fnnmodel[row]
print(paste0("Current Digit: ", as.character(Test$n[row])))
# [1] "Current Digit: 8"
print(paste0("Predicted Digit: ", fnnp))
# [1] "Predicted Digit: 8"
# Visualize the digit to see what it really looks like.
show_number(matrixtest$x[row, ])

# 3.3 NBs (Naive Bayes); the e1071 package needs R 3.3.2
pt <- proc.time()
nbsmodel <- naiveBayes(Training$n ~ ., data = Training)
proc.time() - pt
# user  system  elapsed
# 3.164 1.897   5.384
summary(nbsmodel)
# Confusion matrix
nbsp <- predict(nbsmodel, newdata = Test, type = "class")
nbstable <- table("Actual Value" = Test$n, "Predicted Value" = nbsp)
nbstable
nbsconfusion <- as.data.frame(nbstable)
write.csv(nbsconfusion, file = "nbsconfusion.csv")
nbserror <- sum(Test$n != nbsp) / nrow(Test)
print(paste0("Accuracy (Precision) of NBs: ", 1 - nbserror))
# [1] "Accuracy (Precision) of NBs: 0.5352"
# Predict with NBs
row <- 123
nbspe <- predict(nbsmodel, newdata = Test[row, ], type = "class")
print(paste0("Actual Digit: ", as.character(Test$n[row])))
# [1] "Actual Digit: 7"
print(paste0("Predicted Digit: ", nbspe))
# [1] "Predicted Digit: 9"
# Visualize the digit to see what it really looks like.
show_number(matrixtest$x[row, ])

# 3.4 SVM. It takes a lot of time on the full training set --
# DO NOT TRY THIS without sampling first!
samplenumber <- 20000              # change sample size here
vec <- seq(from = 1, to = 60000, by = 1)
mysample <- sample(vec, samplenumber)
mysampleTraining <- Training[mysample, ]
pt <- proc.time()
svmmodel <- svm(formula = mysampleTraining$n ~ ., data = mysampleTraining,
                method = "class", kernel = "linear", scale = FALSE, cost = 10)
proc.time() - pt
# user    system  elapsed
# 199.228 1.867   202.425
svmp <- predict(svmmodel, newdata = Test, type = "class")
svmtable <- table("Actual Value" = Test$n, "Predicted Value" = svmp)
svmtable
svmconfusion <- as.data.frame(svmtable)
write.csv(svmconfusion, file = "svmconfusion.csv")
svmerror <- sum(Test$n != svmp) / nrow(Test)
print(paste0("Accuracy (Precision) of SVM: ", 1 - svmerror))
# [1] "Accuracy (Precision) of SVM: 0.9099"
# Predict with svmmodel
row <- 666
svmpe <- predict(svmmodel, newdata = Test[row, ], type = "class")
print(paste0("Actual Digit: ", as.character(Test$n[row])))
# [1] "Actual Digit: 6"
print(paste0("Predicted Digit: ", svmpe))
# [1] "Predicted Digit: 6"
# Visualize the digit to see what it really looks like.
show_number(matrixtest$x[row, ])

# 3.5 rpart (recursive partitioning / decision tree)
pt <- proc.time()
rpartmodel <- rpart(Training$n ~ ., method = "class", data = Training)
proc.time() - pt
# user    system  elapsed
# 109.692 1.975   112.673
printcp(rpartmodel)                # summary of rpartmodel
# Plot the tree structure
plot(rpartmodel, uniform = TRUE,
     main = "Classification (RPART). Tree of Handwritten Digit Recognition")
text(rpartmodel, all = TRUE, cex = 0.75)
# Draw the tree
draw.tree(rpartmodel, cex = 0.5, nodeinfo = TRUE, col = gray(0:8 / 8))
rpartp <- predict(rpartmodel, newdata = Test, type = "class")
rparttable <- table("Actual Value" = Test$n, "Predicted Value" = rpartp)
rparttable
rpartconfusion <- as.data.frame(rparttable)
write.csv(rpartconfusion, file = "rpartconfusion.csv")
rparterror <- sum(Test$n != rpartp) / nrow(Test)
print(paste0("Accuracy (Precision) of Rpart: ", 1 - rparterror))
# [1] "Accuracy (Precision) of Rpart: 0.6196"
# Predict with rpart
row <- 2017
rpartpe <- predict(rpartmodel, newdata = Test[row, ], type = "class")
print(paste0("Actual Digit: ", as.character(Test$n[row])))
# [1] "Actual Digit: 7"
print(paste0("Predicted Digit: ", rpartpe))
# [1] "Predicted Digit: 2"
# Visualize the digit to see what it really looks like.
show_number(matrixtest$x[row, ])