Auto-Encoder to Detect Anomalous Cases in Smartphone Actimetry Data

We use a deep auto-encoder model to analyze actimetry data from smartphones. The data come from the UCI Machine Learning Repository:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

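Why should we do this? A classifier assigns every case to one of the known activities, even when the case looks nothing like the training data. An auto-encoder lets us exclude unknown or unusual activities rather than classify them incorrectly. Here we also examine whether some activities tend to produce more anomalous values than others.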

We can do this by flagging cases as anomalous when their reconstruction error is in the top 1%, then extracting the activities of those cases and plotting them.
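The flag-and-exclude idea can be sketched in base R before touching the real data. Everything here is made up for illustration: the simulated errors, the three new cases, and the predicted labels are assumptions, not values from the data set. In the analysis below the errors come from `h2o.anomaly()`.

```r
# Hypothetical reconstruction errors for 1000 training cases.
set.seed(1)
train_err <- rexp(1000, rate = 100)

# Hypothetical errors and classifier output for three new cases.
new_err  <- c(0.002, 0.500, 0.010)
new_pred <- c("WALKING", "SITTING", "LAYING")

# Cutoff: the 99th percentile of the training errors.
cutoff <- unname(quantile(train_err, probs = 0.99))

# Training cases in the top 1% of reconstruction error.
train_flag <- train_err >= cutoff

# Report "UNKNOWN" for new cases above the cutoff instead of trusting the label.
final_pred <- ifelse(new_err >= cutoff, "UNKNOWN", new_pred)
```

The cutoff is estimated on the training errors only and then reused for new cases, so roughly 1% of cases that resemble the training data would be rejected as well. Whether 1% is the right trade-off is a judgment call.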

library(jsonlite)
library(h2o)
## 
## ----------------------------------------------------------------------
## 
## Your next step is to start H2O:
##     > h2o.init()
## 
## For H2O package documentation, ask for help:
##     > ??h2o
## 
## After starting H2O, you can use the Web UI at http://localhost:54321
## For more information visit http://docs.h2o.ai
## 
## ----------------------------------------------------------------------
## 
## Attaching package: 'h2o'
## The following objects are masked from 'package:stats':
## 
##     cor, sd, var
## The following objects are masked from 'package:base':
## 
##     &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
##     colnames<-, ifelse, is.character, is.factor, is.numeric, log,
##     log10, log1p, log2, round, signif, trunc
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:h2o':
## 
##     hour, month, week, year
options(width=70,digits=8)

train.x <- read.table("DATA_UCISMM/train/X_train.txt") 
train.y <- read.table("DATA_UCISMM/train/y_train.txt")[[1]] 
test.x <- read.table("DATA_UCISMM/test/X_test.txt") 
test.y <- read.table("DATA_UCISMM/test/y_test.txt")[[1]] 
labels <- read.table("DATA_UCISMM/activity_labels.txt") 
# Start (or connect to) a local H2O cluster; cl holds the connection
cl <- h2o.init(max_mem_size = "20G", nthreads = 10)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 hours 22 minutes 
##     H2O cluster version:        3.10.3.6 
##     H2O cluster version age:    1 month and 22 days  
##     H2O cluster name:           H2O_started_from_R_Charles_eyy930 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   17.56 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     R Version:                  R version 3.3.2 (2016-10-31)
h2o_act_train<-as.h2o(train.x, destination_frame ="h2o_act_train" )
h2o_act_test<-as.h2o(test.x, destination_frame ="h2o_act_test" )
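# Train a deep auto-encoder (561 inputs -> 100 -> 100 -> 561 outputs)
# with dropout, L1/L2 and sparsity penalties all set to zero.
encoder_act <- h2o.deeplearning(
  x = colnames(h2o_act_train),
  training_frame = h2o_act_train,
  validation_frame = h2o_act_test,
  activation = "TanhWithDropout",
  autoencoder = TRUE,
  hidden = c(100, 100), epochs = 30,
  sparsity_beta = 0, input_dropout_ratio = 0,
  hidden_dropout_ratios = c(0, 0),
  l1 = 0, l2 = 0
)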
encoder_act
## Model Details:
## ==============
## 
## H2OAutoEncoderModel: deeplearning
## Model ID:  DeepLearning_model_R_1492105732028_3 
## Status of Neuron Layers: auto-encoder, gaussian distribution, Quadratic loss, 122,961 weights/biases, 1.5 MB, 224,556 training samples, mini-batch size 1
##   layer units        type dropout       l1       l2 mean_rate
## 1     1   561       Input  0.00 %                            
## 2     2   100 TanhDropout  0.00 % 0.000000 0.000000  0.052854
## 3     3   100 TanhDropout  0.00 % 0.000000 0.000000  0.011127
## 4     4   561        Tanh         0.000000 0.000000  0.027136
##   rate_rms momentum mean_weight weight_rms mean_bias bias_rms
## 1                                                            
## 2 0.035932 0.000000    0.000654   0.091953 -0.010681 0.146476
## 3 0.003311 0.000000   -0.000452   0.059122 -0.007862 0.122332
## 4 0.012497 0.000000   -0.001589   0.053305  0.040287 0.046176
## 
## 
## H2OAutoEncoderMetrics: deeplearning
## ** Reported on training data. **
## 
## Training Set Metrics: 
## =====================
## 
## MSE: (Extract with `h2o.mse`) 0.00094088652
## RMSE: (Extract with `h2o.rmse`) 0.030673874
## 
## H2OAutoEncoderMetrics: deeplearning
## ** Reported on validation data. **
## 
## Validation Set Metrics: 
## =====================
## 
## MSE: (Extract with `h2o.mse`) 0.0010248366
## RMSE: (Extract with `h2o.rmse`) 0.032013069
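# Per-case reconstruction error (MSE) on the training data
error_ea <- as.data.frame(h2o.anomaly(encoder_act, h2o_act_train))
library(ggplot2)
# Histogram of the errors; the dashed line marks the 99th percentile
puea <- ggplot(error_ea, aes(Reconstruction.MSE)) +
  geom_histogram(binwidth = .001, fill = "grey50") +
  geom_vline(xintercept = quantile(error_ea[[1]], probs = .99), linetype = 2) +
  theme_bw()
print(puea)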

[Figure: histogram of reconstruction MSE for the training cases, with a dashed vertical line at the 99th percentile]
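To make the quantity on the x-axis concrete, here is a minimal base R sketch of a per-case reconstruction MSE: the mean squared difference between each input row and its reconstruction. The toy matrix `x` and the reconstructions `x_hat` are invented for illustration. H2O may compute the error on internally rescaled inputs, so its `Reconstruction.MSE` need not match a hand calculation on the raw columns.

```r
# Toy data: 4 cases, 3 features (the real data has 561 features).
x <- matrix(c(0.1, 0.2, 0.3,
              0.0, 0.5, 0.1,
              0.9, 0.8, 0.7,
              0.4, 0.4, 0.4), nrow = 4, byrow = TRUE)

# Hypothetical reconstructions: exact for the first three cases,
# poor for the fourth.
x_hat <- x
x_hat[4, ] <- c(0.9, 0.0, 0.9)

# Per-case reconstruction MSE: mean squared difference across features.
recon_mse <- rowMeans((x - x_hat)^2)
```

A case the auto-encoder reproduces badly, such as the fourth row here, ends up in the right tail of the histogram.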

# Flag cases whose reconstruction error is in the top 1%
act.ano <- error_ea$Reconstruction.MSE >= quantile(error_ea[[1]], probs = .99)
# Bar chart of the activities among the flagged cases
pu.ano <- ggplot(as.data.frame(table(labels$V2[train.y[act.ano]])), aes(Var1, Freq)) +
  geom_bar(stat = "identity") + xlab("") + ylab("Frequency") +
  theme_classic() + theme(axis.text.x = element_text(angle = 35, hjust = 1, vjust = 1))
plot(pu.ano)

[Figure: bar chart of the frequency of each activity among the anomalous cases]
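Raw counts can be misleading when some activities have more cases than others, so it may help to look at the share of cases flagged within each activity as well. The sketch below uses small invented vectors, `activity` and `is_anomalous`, as stand-ins for the activity labels and the anomaly flags.

```r
# Hypothetical stand-ins for the activity labels and the anomaly flags.
activity <- factor(c("WALKING", "WALKING", "WALKING", "WALKING",
                     "SITTING", "SITTING", "LAYING", "LAYING"))
is_anomalous <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Share of cases flagged within each activity (the mean of a logical
# vector is the proportion of TRUE values).
anomaly_rate <- tapply(is_anomalous, activity, mean)
```

With the objects from this post, the equivalent call would presumably be `tapply(act.ano, labels$V2[train.y], mean)`. In the toy data, WALKING and SITTING each have one flagged case, but the rate for SITTING is twice as high.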
