
Friday, 22 December 2017

What is Mean, Variance & Standard Deviation?



The heights (at the shoulders) of five dogs are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
Find the Mean, the Variance, and the Standard Deviation.
Your first step is to find the Mean:

Answer:

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
so the mean (average) height is 394 mm. Let's plot this on the chart:

Now we calculate each dog's difference from the Mean:

206, 76, −224, 36, −94

To calculate the Variance, take each difference, square it, and then average the result:

Variance: σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5
            = (42,436 + 5,776 + 50,176 + 1,296 + 8,836) / 5
            = 108,520 / 5
So the Variance is 21,704.
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation:

σ = √21,704
  = 147.32...
  = 147 (to the nearest mm)
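The same numbers can be reproduced in MATLAB as a quick check (note that var divides by N−1 by default, so the second argument 1 is passed to get the population variance used above):

%% check the dog-height example
heights = [600 470 170 430 300];
mu = mean(heights)        % 394
v  = var(heights, 1)      % population variance: 21704
s  = sqrt(v)              % 147.32...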
And the good thing about the Standard Deviation is that it is useful. Now we can show which heights are within one Standard Deviation (147mm) of the Mean:


So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small.
Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!

Relation Between Variance and Standard Deviation


Analogous to the discrete case, we can define the expected value, variance, and standard deviation of a continuous random variable. These quantities have the same interpretation as in the discrete setting. The expectation of a random variable is a measure of the centre of the distribution, its mean value. The variance and standard deviation are measures of the horizontal spread or dispersion of the random variable.

Definition: Expected Value, Variance, and Standard Deviation of a Continuous Random Variable

The expected value of a continuous random variable X, with probability density function f(x), is the number given by

E(X) = μ = ∫ x·f(x) dx,   integrated over −∞ < x < ∞.

The variance of X is:

Var(X) = σ² = ∫ (x − μ)²·f(x) dx

As in the discrete case, the standard deviation, σ, is the positive square root of the variance:

σ = √Var(X)

Simple Example

The random variable X is given by the PDF f(x) = 2(1 − x) for 0 ≤ x ≤ 1 (and f(x) = 0 otherwise). Check that this is a valid PDF and calculate the standard deviation of X.

Solution

Part 1

To verify that f(x) is a valid PDF, we must check that it is everywhere nonnegative and that it integrates to 1.
We see that 2(1 − x) = 2 − 2x ≥ 0 precisely when x ≤ 1; thus f(x) is nonnegative on its support [0, 1] (and zero elsewhere).
To check that f(x) has unit area under its graph, we calculate

∫₀¹ 2(1 − x) dx = [2x − x²]₀¹ = 2 − 1 = 1

So f(x) is indeed a valid PDF.

Part 2

To calculate the standard deviation of X, we must first find its variance, which in turn requires its expected value:

E(X) = ∫₀¹ x·2(1 − x) dx = ∫₀¹ (2x − 2x²) dx = [x² − (2/3)x³]₀¹ = 1 − 2/3 = 1/3

Using this value, we compute the variance of X as follows:

Var(X) = ∫₀¹ (x − 1/3)²·2(1 − x) dx = 1/18

Therefore, the standard deviation of X is

σ = √(1/18) = 1/(3√2) ≈ 0.2357
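These results can also be checked numerically in MATLAB with integral (a minimal sketch; f is the PDF above, restricted to its support [0, 1]):

%% numerical check of the PDF example
f     = @(x) 2*(1 - x);                         % PDF on [0,1]
area  = integral(f, 0, 1)                       % 1, so f is a valid PDF
mu    = integral(@(x) x.*f(x), 0, 1)            % E(X) = 1/3
v     = integral(@(x) (x - mu).^2.*f(x), 0, 1)  % Var(X) = 1/18 ≈ 0.0556
sigma = sqrt(v)                                 % 0.2357...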

An Alternative Formula for Variance

There is an alternative formula for the variance of a random variable that is less tedious than the above definition.
Alternate Formula for the Variance of a Continuous Random Variable
The variance of a continuous random variable X with PDF f(x) and mean μ is the number given by

Var(X) = ∫ x²·f(x) dx − μ² = E(X²) − [E(X)]²
The derivation of this formula is a simple exercise and has been relegated to the exercises. We should note that a completely analogous formula holds for the variance of a discrete random variable, with the integral signs replaced by sums.

Simple Example Revisited

We can use this alternate formula for variance to find the standard deviation of the random variable X defined above.
Remembering that E(X) was found to be 1/3, we compute the variance of X as follows:

E(X²) = ∫₀¹ x²·2(1 − x) dx = ∫₀¹ (2x² − 2x³) dx = 2/3 − 1/2 = 1/6

Var(X) = E(X²) − [E(X)]² = 1/6 − (1/3)² = 1/6 − 1/9 = 1/18

which matches the earlier result, so σ = √(1/18) ≈ 0.2357.
In the exercises, you will compute the expectations, variances and standard deviations of many of the random variables we have introduced in this chapter, as well as those of many new ones.
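The alternate formula is just as easy to check numerically (a sketch reusing the same PDF):

%% alternate variance formula: Var(X) = E(X^2) - [E(X)]^2
f   = @(x) 2*(1 - x);
EX  = integral(@(x) x.*f(x), 0, 1);     % 1/3
EX2 = integral(@(x) x.^2.*f(x), 0, 1);  % 1/6
v   = EX2 - EX^2                        % 1/18, as before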

Thursday, 21 December 2017

Data Normalization

The word “normalization” is used informally in statistics, and so the term normalized data can have multiple meanings. In most cases, when you normalize data you eliminate the units of measurement for data, enabling you to more easily compare data from different places. Some of the more common ways to normalize data include:

Transforming data using a z-score or t-score. This is usually called standardization. In the vast majority of cases, if a statistics textbook is talking about normalizing data, this is the definition of "normalization" it is using (see the sketch after this list).

Rescaling data to have values between 0 and 1. This is usually called feature scaling. One possible formula to achieve this is:

X_new = (X − X_min) / (X_max − X_min)

Standardizing residuals: ratios used in regression analysis can force residuals into the shape of a normal distribution.

Normalizing Moments using the formula μ/σ.

Normalizing vectors (in linear algebra) to a norm of one. Normalization in this sense means to transform a vector so that it has a length of one.
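As a minimal MATLAB sketch of the first two transforms in this list (any numeric column vector x will do):

%% standardization vs. feature scaling (sketch)
x = [600; 470; 170; 430; 300];          % sample data column
z = (x - mean(x)) ./ std(x);            % z-score standardization
s = (x - min(x)) ./ (max(x) - min(x));  % feature scaling to [0,1]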


%% MATLAB code for normalization using the iris dataset

Download the UCI iris dataset and save it in .xlsx format.

clc; clear; close all

irisdata = xlsread('Dataset.xlsx');
X = irisdata(:,1:4);               % 150 rows x 4 feature columns

Y = zeros(size(X));                % preallocate the output
for ii = 1:size(X,1)
    % rescale row ii to [0,1], using the column-wise min and max
    Y(ii,:) = (X(ii,:) - min(X)) ./ (max(X) - min(X));
end

Let's consider the following example (the first ten rows of X):

X =     5.1000    3.5000    1.4000    0.2000
    4.9000    3.0000    1.4000    0.2000
    4.7000    3.2000    1.3000    0.2000
    4.6000    3.1000    1.5000    0.2000
    5.0000    3.6000    1.4000    0.2000
    5.4000    3.9000    1.7000    0.4000
    4.6000    3.4000    1.4000    0.3000
    5.0000    3.4000    1.5000    0.2000
    4.4000    2.9000    1.4000    0.2000
    4.9000    3.1000    1.5000    0.1000

Find the minimum and maximum of each column:

Xmin = 4.4000    2.9000    1.3000    0.1000
Xmax = 5.4000    3.9000    1.7000    0.4000

Substituting into the equation (subtract Xmin from each value and divide by Xmax − Xmin) gives:

Xnew = 
    0.7000    0.6000    0.2500    0.3333
    0.5000    0.1000    0.2500    0.3333
    0.3000    0.3000         0    0.3333
    0.2000    0.2000    0.5000    0.3333
    0.6000    0.7000    0.2500    0.3333
    1.0000    1.0000    1.0000    1.0000
    0.2000    0.5000    0.2500    0.6667
    0.6000    0.5000    0.5000    0.3333
         0         0    0.2500    0.3333
    0.5000    0.2000    0.5000         0
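For reference, the loop above can be replaced by a single vectorized line (implicit expansion assumes MATLAB R2016b or later; on older versions use bsxfun):

%% vectorized min-max normalization
Y = (X - min(X)) ./ (max(X) - min(X));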



Tuesday, 19 December 2017

3D Visualization of a Multi-Layer Perceptron (MLP)

http://scs.ryerson.ca/~aharley/vis/fc/
Adam Harley has created a 3D visualization of a Multi-Layer Perceptron that has already been trained (using backpropagation) on the MNIST database of handwritten digits.
The network takes 784 numeric pixel values as inputs from a 28 x 28 image of a handwritten digit (it has 784 nodes in the input layer, one per pixel). It has 300 nodes in the first hidden layer, 100 nodes in the second hidden layer, and 10 nodes in the output layer (corresponding to the 10 digits).
Although the network described here is much larger (uses more hidden layers and nodes) compared to the one we discussed in the previous section, all computations in the forward propagation step and backpropagation step are done in the same way (at each node) as discussed before.
A node which has a higher output value than others is represented by a brighter color. In the Input layer, the bright nodes are those which receive higher numerical pixel values as input. Notice how in the output layer, the only bright node corresponds to the digit 5 (it has an output probability of 1, which is higher than the other nine nodes which have an output probability of 0). This indicates that the MLP has correctly classified the input digit. I highly recommend playing around with this visualization and observing connections between nodes of different layers.
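To make the layer sizes concrete, here is a shape-only MATLAB sketch of the forward pass through a network of these dimensions (random weights and sigmoid activations are assumed purely for illustration; the actual visualization uses trained weights):

%% forward-pass shapes for a 784-300-100-10 MLP (illustrative sketch)
x   = rand(784, 1);                  % flattened 28 x 28 input image
W1  = randn(300, 784); b1 = zeros(300, 1);
W2  = randn(100, 300); b2 = zeros(100, 1);
W3  = randn(10, 100);  b3 = zeros(10, 1);
sig = @(z) 1./(1 + exp(-z));
h1  = sig(W1*x + b1);                % first hidden layer, 300 nodes
h2  = sig(W2*h1 + b2);               % second hidden layer, 100 nodes
y   = sig(W3*h2 + b3);               % output layer, one node per digit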
Check the following example: draw the digit 5 in the input area.

Numeric Digit 5

Numeric Digit 5 with Hidden layer weight


How It Works: Backpropagation in a Neural Network with a Mathematical Example

Backpropagation is an algorithm used to compute the error of a neural network and to adjust its weights so as to minimize that error and approach the target output.
Let us consider the following example of achieving the target output using the backpropagation algorithm.


GIVEN VALUES:

At input layer:
Inputs: i1=0.05, i2=0.10
Weights: w1=0.15, w2=0.20, w3=0.25, w4=0.30

At hidden layer:
Bias: b1=0.35

w5= 0.40, w6= 0.45, w7= 0.50, w8= 0.55

At output layer:
Bias: b2=0.60

Learning rate: η = 0.5 (used in the weight updates below)

Target output: o1= 0.01, o2= 0.99

To find:
  The updated weights that reduce the error and bring the outputs toward the targets, using the backpropagation algorithm.
Solution:
To find the hidden layer output:
net h1 = {(w1*i1) + (w2*i2)} + (b1*1)
           = {(0.15*0.05) + (0.20*0.10)} + (0.35*1)
net h1 = 0.3775
net h2 = {(w3*i1) + (w4*i2)} + (b1*1)
           = {(0.25*0.05) + (0.30*0.10)} + (0.35*1)
net h2 = 0.3925

Here the sigmoid function is used as the activation function in the hidden layer, converting the linear combination into a nonlinear output.

out h1 = 1/(1+e^(-net h1))
           = 1/(1+e^(-0.3775))
out h1 = 0.593269992
out h2 = 1/(1+e^(-net h2))
           = 1/(1+e^(-0.3925))
out h2 = 0.5968843782

To find the output at the output layer:

net o1 = {(w5*out h1)+(w6*out h2)}+(b2*1)
   = {(0.40*0.593269992) + (0.45*0.5968843782)} + (0.60*1)
net o1 = 1.105905967
net o2 = {(w7*out h1)+(w8*out h2)}+(b2*1)
           = {(0.50*0.593269992) + (0.55*0.5968843782)} + (0.60*1)
net o2 = 1.224921404

Again the sigmoid function is used in the output layer:
out o1 = 1/(1+e^(-net o1))
           = 1/(1+e^(-1.105905967))
out o1 = 0.7513650695

out o2 = 1/(1+e^(-net o2))
           = 1/(1+e^(-1.224921404))
out o2 = 0.7729284653

To find the total error at the output layer:

E total = ∑ 1/2(target − out)²
E out o1 = 1/2(target o1 − out o1)²
         = 1/2(0.01 − 0.7513650695)²
E out o1 = 0.274811773
E out o2 = 1/2(target o2 − out o2)²
         = 1/2(0.99 − 0.7729284653)²
E out o2 = 0.0235600256

E total = E out o1 + E out o2
        = 0.274811773 + 0.0235600256
E total = 0.2983717986
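The forward pass and total error can be reproduced in a few lines of MATLAB (a sketch in which W1 collects w1..w4 and W2 collects w5..w8 as 2x2 matrices):

%% forward pass of the worked example (sketch)
in  = [0.05; 0.10];  b1 = 0.35;  b2 = 0.60;
W1  = [0.15 0.20; 0.25 0.30];        % [w1 w2; w3 w4]
W2  = [0.40 0.45; 0.50 0.55];        % [w5 w6; w7 w8]
t   = [0.01; 0.99];                  % target outputs
sig = @(z) 1./(1 + exp(-z));
outh   = sig(W1*in + b1)             % 0.5933, 0.5969
outo   = sig(W2*outh + b2)           % 0.7514, 0.7729
Etotal = sum(0.5*(t - outo).^2)      % 0.2984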

To find the error at the hidden layer:
E out h1 = (E out o1*w5) + (E out o2*w6)
         = (0.274811773*0.40) + (0.0235600256*0.45)
E out h1 = 0.1205267207
E out h2 = (E out o1*w7) + (E out o2*w8)
         = (0.274811773*0.50) + (0.0235600256*0.55)
E out h2 = 0.1503639006

According to the chain rule, we calculate the updated weights w5, w6, w7 and w8.

To calculate W5+:
(∂E total/∂w5) = (∂E total/∂out o1)*(∂out o1/∂net o1)*(∂net o1/∂w5)
E total = 1/2(target o1 − out o1)² + 1/2(target o2 − out o2)²
(∂E total/∂out o1) = 2*(1/2)*(target o1 − out o1)^(2−1)*(−1) + 0
                   = −(target o1 − out o1)
                   = −0.01 + 0.7513650695
(∂E total/∂out o1) = 0.74136507

(∂out o1/∂net o1) = out o1*(1 − out o1)
                  = 0.7513650695*(1 − 0.7513650695)
(∂out o1/∂net o1) = 0.186815602

net o1 = {(w5*out h1) + (w6*out h2)} + (b2*1)
(∂net o1/∂w5) = 1*out h1 + 0 + 0 = out h1
(∂net o1/∂w5) = 0.593269992

(∂E total/∂w5) = 0.74136507*0.186815602*0.593269992
(∂E total/∂w5) = 0.0821670407

W5+ = w5 − {η*(∂E total/∂w5)}
    = 0.40 − (0.5*0.0821670407)
W5+ = 0.35891648   [updated weight for w5]

To calculate W6+:
W6+ = w6 − {η*(∂E total/∂w6)}
(∂E total/∂w6) = (∂E total/∂out o1)*(∂out o1/∂net o1)*(∂net o1/∂w6)
(∂E total/∂out o1) = 0.74136507 and (∂out o1/∂net o1) = 0.186815602, as calculated above.

net o1 = {(w5*out h1) + (w6*out h2)} + (b2*1)
(∂net o1/∂w6) = 1*out h2 + 0 + 0
(∂net o1/∂w6) = 0.5968843782

Substituting the above values:
(∂E total/∂w6) = 0.74136507*0.186815602*0.5968843782 = 0.082667628
W6+ = w6 − {η*(∂E total/∂w6)}
    = 0.45 − (0.5*0.082667628)
W6+ = 0.408666186   [updated weight for w6]

To calculate the updated weight for w7:
W7+ = w7 − {η*(∂E total/∂w7)}
(∂E total/∂w7) = (∂E total/∂out o2)*(∂out o2/∂net o2)*(∂net o2/∂w7)
(∂E total/∂out o2) = 0 + 2*(1/2)*(target o2 − out o2)^(2−1)*(−1)
                   = −(target o2 − out o2)
                   = −0.99 + 0.7729284653
(∂E total/∂out o2) = −0.2170715347
(∂out o2/∂net o2) = out o2*(1 − out o2)
                  = 0.7729284653*(1 − 0.7729284653)
(∂out o2/∂net o2) = 0.1755100528

net o2 = {(w7*out h1) + (w8*out h2)} + (b2*1)
(∂net o2/∂w7) = out h1
(∂net o2/∂w7) = 0.593269992

(∂E total/∂w7) = −0.2170715347*0.1755100528*0.593269992
               = −0.0226025377
W7+ = 0.50 − (0.5*(−0.0226025377))
W7+ = 0.511301270   [updated weight for w7]

To calculate the updated weight for w8:
W8+ = w8 − {η*(∂E total/∂w8)}
(∂E total/∂w8) = (∂E total/∂out o2)*(∂out o2/∂net o2)*(∂net o2/∂w8)
(∂E total/∂out o2) = −0.2170715347 and (∂out o2/∂net o2) = 0.1755100528, as above.

net o2 = {(w7*out h1) + (w8*out h2)} + (b2*1)
(∂net o2/∂w8) = out h2 = 0.5968843782

(∂E total/∂w8) = −0.2170715347*0.1755100528*0.5968843782
               = −0.0227402422
W8+ = w8 − {η*(∂E total/∂w8)}
    = 0.55 − (0.5*(−0.0227402422))
W8+ = 0.5613701211   [updated weight for w8]

To calculate the updated weight for w1:
W1+ = w1 − {η*(∂E total/∂w1)}
(∂E total/∂w1) = (∂E total/∂out h1)*(∂out h1/∂net h1)*(∂net h1/∂w1)

Because out h1 feeds both output neurons, its error term has two parts:
(∂E total/∂out h1) = (∂E o1/∂out h1) + (∂E o2/∂out h1)

(∂E o1/∂out h1) = (∂E o1/∂net o1)*(∂net o1/∂out h1)
(∂E o1/∂net o1) = (∂E o1/∂out o1)*(∂out o1/∂net o1)
                = 0.74136507*0.186815602
(∂E o1/∂net o1) = 0.138498562
net o1 = {(w5*out h1) + (w6*out h2)} + (b2*1)
(∂net o1/∂out h1) = w5 + 0 + 0 = w5   [w5 = 0.40]
(∂E o1/∂out h1) = 0.138498562*0.40
                = 0.055399425

(∂E o2/∂out h1) = (∂E o2/∂net o2)*(∂net o2/∂out h1)
(∂E o2/∂net o2) = (∂E o2/∂out o2)*(∂out o2/∂net o2)
                = −0.2170715347*0.1755100528
(∂E o2/∂net o2) = −0.038098236
net o2 = {(w7*out h1) + (w8*out h2)} + (b2*1)
(∂net o2/∂out h1) = w7 + 0 + 0   [w7 = 0.50]
(∂E o2/∂out h1) = −0.038098236*0.50
                = −0.019049118

(∂E total/∂out h1) = 0.055399425 − 0.019049118
                   = 0.036350307
(∂out h1/∂net h1) = out h1*(1 − out h1)
                  = 0.593269992*(1 − 0.593269992)
(∂out h1/∂net h1) = 0.241300709
net h1 = {(w1*i1) + (w2*i2)} + (b1*1)
(∂net h1/∂w1) = i1 + 0 + 0   [i1 = 0.05]
(∂net h1/∂w1) = 0.05
(∂E total/∂w1) = (∂E total/∂out h1)*(∂out h1/∂net h1)*(∂net h1/∂w1)
               = 0.036350307*0.241300709*0.05
               = 0.0004385677
W1+ = w1 − {η*(∂E total/∂w1)}
    = 0.15 − (0.5*0.0004385677)
W1+ = 0.1497807162   [updated weight for w1]
Similarly,
 
W2+=0.19956143
W3+=0.24975114
W4+=0.29950229
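Written with the delta terms of the chain rule, the whole update step is compact in MATLAB (a sketch continuing from the forward-pass snippet above, with learning rate eta = 0.5; note the hidden-layer deltas are computed with the old output weights, exactly as in the derivation):

%% one backpropagation step (sketch; uses in, t, W1, W2, outh, outo from above)
eta     = 0.5;
delta_o = (outo - t) .* outo .* (1 - outo);  % dEtotal/dnet at the output layer
delta_h = (W2' * delta_o) .* outh .* (1 - outh);
W2      = W2 - eta * (delta_o * outh')       % updated [w5 w6; w7 w8]
W1      = W1 - eta * (delta_h * in')         % updated [w1 w2; w3 w4]

This reproduces W5+..W8+ and W1+..W4+ above.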

Again calculate the output at the hidden layer using the updated weights:
h1+ = {(w1+*i1) + (w2+*i2)} + (b1*1)
    = {(0.1497807162*0.05) + (0.19956143*0.10)} + (0.35*1)
    = 0.0274451758 + 0.35
h1+ = 0.3774451758
h2+ = {(w3+*i1) + (w4+*i2)} + (b1*1)
    = {(0.24975114*0.05) + (0.29950229*0.10)} + (0.35*1)
h2+ = 0.392437786

Again the sigmoid function is used in the hidden layer:
out h1+ = 1/(1+e^(-h1+))
        = 1/(1+e^(-0.3774451758))
out h1+ = 0.59325676
out h2+ = 1/(1+e^(-h2+))
        = 1/(1+e^(-0.392437786))
out h2+ = 0.5968694086

Again calculate the output at the output layer:
o1+ = {(w5+*out h1+) + (w6+*out h2+)} + (b2*1)
    = {(0.35891648*0.59325676) + (0.408666186*0.5968694086)} + (0.60*1)
o1+ = 1.0568499728
o2+ = {(w7+*out h1+) + (w8+*out h2+)} + (b2*1)
    = {(0.51130127*0.59325676) + (0.5613701211*0.5968694086)} + (0.60*1)
o2+ = 1.238297587

Again apply the sigmoid function at the output layer:

out o1+ = 1/(1+e^(-o1+))
        = 1/(1+e^(-1.0568499728))
out o1+ = 0.742088111
out o2+ = 1/(1+e^(-o2+))
        = 1/(1+e^(-1.238297587))
out o2+ = 0.7752675456
                                                                                 
To find the total error at the output layer:

E+ total = ∑ 1/2(target − out)²
E+ o1 = 1/2(target o1 − out o1+)²
      = 1/2(0.01 − 0.742088111)²
E+ o1 = 0.2679765011
E+ o2 = 1/2(target o2 − out o2+)²
      = 1/2(0.99 − 0.7752675456)²
E+ o2 = 0.0230550133
E+ total = E+ o1 + E+ o2
E+ total = 0.2910315144

Before backpropagation, the total error was 0.2983717986. After the first backpropagation pass, the error is reduced to 0.2910315144. The above process is repeated 10,000 times to approach the target output. After 10,000 iterations, for example, the error falls to 0.0000351085; at that point, feeding forward the inputs 0.05 and 0.10, the two output neurons generate 0.015912196 (vs the 0.01 target) and 0.984065734 (vs the 0.99 target).
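Repeating the same update in a loop is then straightforward (a sketch; the biases are held fixed, as they are in the worked example):

%% repeat the update 10,000 times (sketch; continues from the snippets above)
for k = 1:10000
    outh    = sig(W1*in + b1);
    outo    = sig(W2*outh + b2);
    delta_o = (outo - t) .* outo .* (1 - outo);
    delta_h = (W2' * delta_o) .* outh .* (1 - outh);
    W2      = W2 - eta * (delta_o * outh');
    W1      = W1 - eta * (delta_h * in');
end
Etotal = sum(0.5*(t - outo).^2)   % small error after training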


Any doubts? Reply back.