Neural networks work best when the input data are scaled to a narrow range around zero, and here, we see values ranging anywhere from zero up to over a thousand.

Example – Modeling the strength of concrete with ANNs

Data Source: (http://archive.ics.uci.edu/ml)

First, we load the data and standardize it.

```r
concrete <- read.csv("concrete.csv")
str(concrete)
## 'data.frame': 1030 obs. of 9 variables:
##  $ cement      : num 540 540 332 332 199 ...
##  $ slag        : num 0 0 142 142 132 ...
##  $ ash         : num 0 0 0 0 0 0 0 0 0 0 ...
##  $ water       : num 162 162 228 228 192 228 228 228 228 228 ...
##  $ superplastic: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
##  $ coarseagg   : num 1040 1055 932 932 978 ...
##  $ fineagg     : num 676 676 594 594 826 ...
##  $ age         : int 28 28 270 365 360 90 365 28 28 28 ...
##  $ strength    : num 80 61.9 40.3 41 44.3 ...

# scale the data, since neural networks work best with normalized input
cnorm <- as.data.frame(scale(concrete))
summary(cnorm$strength)
##      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -2.00500 -0.72480 -0.08218  0.00000  0.61760  2.80000
summary(concrete$strength)
##  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  2.33   23.71   34.44   35.82   46.14   82.60

cttrain <- cnorm[1:800, ]
cttest  <- cnorm[801:1030, ]

# the neuralnet package by Stefan Fritsch and Frauke Guenther provides
# a standard and easy-to-use implementation of such networks; it also
# offers a function to plot the network topology
library(neuralnet)

# manually list every predictor
cnn <- neuralnet(strength ~ cement + slag + ash + water + superplastic +
                   coarseagg + fineagg + age, data = cttrain)
cnnresults <- compute(cnn, cttest[, 1:8])
cnnpr <- cnnresults$net.result
cor(cnnpr, cttest$strength)
##              [,1]
## [1,] 0.7105475408
```

If we plot the model with plot(cnn), we have:

The weights for each of the connections are also depicted, as are the **bias terms** (indicated by the nodes labeled with the number **1**). The bias terms are numeric constants that allow the value at the indicated nodes to be shifted upward or downward, much like the intercept in a linear equation. The weight between each input node and the hidden node is similar to a regression coefficient, and the weight for the bias term is similar to the intercept.
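The regression analogy can be made concrete with a toy calculation (the weights and inputs below are made up for illustration, not taken from the fitted model): a node's value is the weighted sum of its inputs plus the bias, passed through the activation function.

```r
# toy illustration of one hidden node with three inputs
sigmoid <- function(z) 1 / (1 + exp(-z))

x <- c(0.5, -1.2, 0.3)    # scaled input values (made up)
w <- c(0.8,  0.1, -0.4)   # connection weights, like regression coefficients
b <- 0.2                  # bias weight, like the intercept

node_output <- sigmoid(sum(w * x) + b)
node_output
```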

At the bottom of the figure, R reports the number of training steps and an error measure called the **Sum of Squared Errors** (**SSE**), which, as you might expect, is the sum of the squared differences between the predicted and actual values. A lower SSE implies better predictive performance on the training data, but it tells us little about how the model will perform on unseen data.
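The SSE definition is a one-liner in R (the vectors here are made-up toy values, just to show the computation):

```r
# SSE on a toy pair of prediction/actual vectors
actual    <- c(3.0, 1.5, 2.2)
predicted <- c(2.8, 1.9, 2.0)
sse <- sum((predicted - actual)^2)
sse  # 0.04 + 0.16 + 0.04 = 0.24
```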

More detailed interpretation comes later. For now, we tune the model with more hidden nodes.

```r
# tune the model with more hidden nodes -- this takes longer to train
cnn2 <- neuralnet(strength ~ cement + slag + ash + water + superplastic +
                    coarseagg + fineagg + age,
                  data = cttrain, hidden = 5, stepmax = 1e6)
cnnresults2 <- compute(cnn2, cttest[, 1:8])
cnnpr2 <- cnnresults2$net.result
cor(cnnpr2, cttest$strength)
##              [,1]
## [1,] 0.7690961938
```
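Note that `hidden = 5` adds a single layer of five hidden nodes, not five layers. One way to see why training slows down: count the weights the optimizer must fit. With 8 inputs, 5 hidden nodes, 1 output, and a bias feeding every non-input node:

```r
# number of weights in an 8-5-1 network: each hidden node gets
# 8 input weights plus 1 bias, and the output node gets
# 5 hidden weights plus 1 bias
n_inputs <- 8; n_hidden <- 5; n_output <- 1
n_weights <- n_hidden * (n_inputs + 1) + n_output * (n_hidden + 1)
n_weights  # 51, versus 11 for the default single-hidden-node model
```

To get genuinely deeper networks, neuralnet accepts a vector, e.g. `hidden = c(5, 3)` for two hidden layers of five and three nodes.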

Again, we plot the model with plot(cnn2):

More detailed interpretation:

To interpret the plot, it is best to think of the model in terms of images (something neural networks are very good at).

1. The left-most nodes (the input nodes) are the raw data variables.

2. The black arrows (and the numbers beside them) are the **weights**, which we can think of as **how much each variable contributes to the next node.** The blue lines are the bias weights: they let us shift the entire activation curve to the left or right by some amount. In the sigmoid formulation, the bias changes the output of the node to sig(w_{0}*x + w_{1}*1.0).

3. The middle nodes (anything between the input and output nodes) are the hidden nodes. This is where the image analogy helps: **each of these nodes constitutes a component that the network is learning to recognize,** for example a nose, mouth, or eye. This is not easily determined, and it is far more abstract when we are dealing with non-image data.

4. The far-right node (the output node) is the final output of the neural network. Note that all of this omits the activation function that is applied at each layer of the network.
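The shifting effect of the bias described in point 2 is easy to check numerically (the weight and bias values below are arbitrary, chosen just to show the shift):

```r
# the bias shifts where the sigmoid curve sits along the x-axis
sigmoid <- function(z) 1 / (1 + exp(-z))
x <- 0
sigmoid(1.0 * x)        # no bias: output is 0.5 at x = 0
sigmoid(1.0 * x + 2.0)  # bias of 2 shifts the curve: ~0.88 at x = 0
```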

More information about interpretation of NN:

http://labs.eeb.utoronto.ca/jackson/ecol.%20modelling%20ANN.pdf

We can also use the nnet package in R to do the job, but the visualization is a little trickier.
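A minimal stand-alone sketch with nnet (the data here are synthetic so the snippet runs on its own, not the concrete data): the main differences from neuralnet are the `size` argument for the number of hidden nodes and `linout = TRUE` for a linear output unit, needed for regression rather than classification.

```r
library(nnet)

# synthetic regression data, made up for this sketch
set.seed(1)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- 0.6 * d$x1 - 0.3 * d$x2 + rnorm(200, sd = 0.05)

# one hidden layer of 5 nodes; linout = TRUE for a numeric target
fit <- nnet(y ~ x1 + x2, data = d, size = 5, linout = TRUE, trace = FALSE)
pred <- predict(fit, d)
cor(pred, d$y)
```

For plotting, packages such as NeuralNetTools offer a plotnet() function that can visualize fitted nnet models.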