Neural Networks for Concrete Strength Analysis in R, with Interpretation

Neural networks work best when the input data are scaled to a narrow range around zero, and here we see values ranging from zero up to over a thousand.

Example – Modeling the strength of concrete with ANNs

Data source: UCI Machine Learning Repository (http://archive.ics.uci.edu/ml)

First, we load the data and scale it.

 

concrete <- read.csv("concrete.csv")
str(concrete)

## 'data.frame':    1030 obs. of  9 variables:
##  $ cement      : num  540 540 332 332 199 ...
##  $ slag        : num  0 0 142 142 132 ...
##  $ ash         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ water       : num  162 162 228 228 192 228 228 228 228 228 ...
##  $ superplastic: num  2.5 2.5 0 0 0 0 0 0 0 0 ...
##  $ coarseagg   : num  1040 1055 932 932 978 ...
##  $ fineagg     : num  676 676 594 594 826 ...
##  $ age         : int  28 28 270 365 360 90 365 28 28 28 ...
##  $ strength    : num  80 61.9 40.3 41 44.3 ...

# scale the data, since neural networks work best with normalized inputs
cnorm <- as.data.frame(scale(concrete))
summary(cnorm$strength)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -2.00500 -0.72480 -0.08218  0.00000  0.61760  2.80000

summary(concrete$strength)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    2.33   23.71   34.44   35.82   46.14   82.60
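Standardization with scale() is one option; another common choice for neural networks is min-max normalization to [0, 1], which also satisfies the narrow-range requirement. A minimal sketch (the normalize helper below is our own, not part of base R):

normalize <- function(x) {
  # rescale a numeric vector to the [0, 1] range
  (x - min(x)) / (max(x) - min(x))
}
cnorm01 <- as.data.frame(lapply(concrete, normalize))
summary(cnorm01$strength)  # min is now 0, max is 1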

# first 800 rows for training, remaining 230 for testing
cttrain <- cnorm[1:800, ]
cttest <- cnorm[801:1030, ]
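# NOTE: the sequential split above assumes the rows are already in random
# order; if they are not, a random split is safer. A sketch (illustrative
# seed, same 800/230 proportions):
# set.seed(12345)
# idx <- sample(nrow(cnorm), 800)
# cttrain <- cnorm[idx, ]
# cttest  <- cnorm[-idx, ]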
# neuralnet package by Stefan Fritsch and Frauke Guenther provides
# a standard and easy-to-use implementation of such networks. It also
# offers a function to plot the network topology.
library(neuralnet)
# list every predictor in the formula explicitly
cnn <- neuralnet(strength ~ cement + slag + ash + water + superplastic +
                 coarseagg + fineagg + age, data = cttrain)
cnnresults <- compute(cnn, cttest[, 1:8])
cnnpr <- cnnresults$net.result
cor(cnnpr, cttest$strength)
##              [,1]
## [1,] 0.7105475408
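A correlation around 0.71 indicates a reasonably strong linear relationship between predictions and true values. Note that the predictions are on the standardized scale; to report errors in the original units, we can undo the z-score transform. A minimal sketch, reusing cnnpr from above:

# undo the z-score transform using the mean and sd of the raw strength
str_mean <- mean(concrete$strength)
str_sd <- sd(concrete$strength)
pred_orig <- cnnpr * str_sd + str_mean
actual_orig <- concrete$strength[801:1030]
mean(abs(pred_orig - actual_orig))  # mean absolute error, original units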

If we plot the model with plot(cnn), we have:
[Figure: plot(cnn) output — network topology with weights, bias nodes, and the reported training steps and SSE]

The weights for each of the connections are also depicted, as are the bias terms (indicated by the nodes labeled with the number 1). The bias terms are numeric constants that allow the value at the indicated nodes to be shifted upward or downward, much like the intercept in a linear equation. The weights between the input nodes and the hidden node are similar to regression coefficients, and the weight on the bias term is similar to the intercept.

At the bottom of the figure, R reports the number of training steps and an error measure called the Sum of Squared Errors (SSE), which, as you might expect, is the sum of the squared differences between the predicted and actual values. A lower SSE implies better predictive performance on the training data, but it tells us little about how the model will perform on unseen data.
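To make the SSE concrete, it can be computed directly from the fitted values the model stores for the training data (the error neuralnet reports may differ by a constant factor, depending on err.fct). A minimal sketch, assuming the default single repetition so net.result is a one-element list:

train_pred <- cnn$net.result[[1]]  # fitted values on the training data
sse <- sum((cttrain$strength - train_pred)^2)
sse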

More detailed interpretation comes later. For now, we tune the model with more hidden nodes (hidden = 5 adds nodes to the single hidden layer, not more layers).

# Tune the model with five hidden nodes
# (training takes considerably longer)
cnn2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic +
                  coarseagg + fineagg + age, data = cttrain,
                  hidden = 5, stepmax = 1e6)
cnnresults2 <- compute(cnn2, cttest[, 1:8])
cnnpr2 <- cnnresults2$net.result
cor(cnnpr2, cttest$strength)

##              [,1]
## [1,] 0.7690961938

Again, we plot the model with plot(cnn2):
[Figure: plot(cnn2) output — network topology with five hidden nodes]

More detailed interpretation:

To interpret the plot, it is best to think of the model in terms of images (something neural networks are very good at).

1. The left-most nodes (i.e. input nodes) are raw data variables.

2. The arrows in black (and their associated numbers) are the weights, which we can think of as how much each variable contributes to the next node. The blue lines are the bias weights: they let us shift the entire activation curve to the left or right by some amount. With a sigmoid activation, the bias changes the node's output to sig(w0*x + w1*1.0), as shown in the sketch after this list.

3. The middle nodes (anything between the input and output nodes) are the hidden nodes. This is where the image analogy helps: each of these nodes constitutes a component that the network is learning to recognize, for example a nose, mouth, or eye. This is far more abstract and not easily determined when we are dealing with non-image data.

4. The far-right node (the output node) is the final output of the neural network. Note that this description omits the activation function that is applied at each layer of the network.
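To make point 2 concrete, here is a small sketch of how a bias weight shifts the sigmoid curve (w0 and w1 are illustrative values, not weights taken from the model above):

sig <- function(z) 1 / (1 + exp(-z))  # logistic activation
x <- seq(-6, 6, by = 0.1)
w0 <- 1   # input weight
w1 <- -2  # bias weight, applied to the constant input 1.0
plot(x, sig(w0 * x), type = "l", ylab = "activation")
lines(x, sig(w0 * x + w1 * 1.0), lty = 2)  # same curve, shifted to the right
legend("topleft", c("sig(w0*x)", "sig(w0*x + w1*1.0)"), lty = 1:2)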

More information on interpreting neural networks:

http://labs.eeb.utoronto.ca/jackson/ecol.%20modelling%20ANN.pdf



We can also use the nnet package in R to do the job, but the visualization is a little trickier.
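A minimal sketch with nnet (size sets the number of hidden nodes; linout = TRUE requests a linear output unit for regression; the NeuralNetTools package, assumed here, offers one way to plot the result):

library(nnet)
set.seed(123)  # nnet starts from random weights
cnn_nnet <- nnet(strength ~ ., data = cttrain, size = 5, linout = TRUE)
pred_nnet <- predict(cnn_nnet, cttest[, 1:8])
cor(pred_nnet, cttest$strength)
# library(NeuralNetTools)
# plotnet(cnn_nnet)  # topology plot, similar in spirit to plot(cnn)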

 
