Thursday, 21 December 2017

Data Normalization

The word “normalization” is used informally in statistics, so the term “normalized data” can have several meanings. In most cases, normalizing data removes the units of measurement, which makes it easier to compare data from different sources. Some of the more common ways to normalize data include:

Transforming data using a z-score or t-score. This is usually called standardization. In the vast majority of cases, if a statistics textbook talks about normalizing data, this is the definition of “normalization” it is using.

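Rescaling data to have values between 0 and 1. This is usually called feature scaling. One possible formula to achieve this is:

Xnew = (X - Xmin) / (Xmax - Xmin)

where Xmin and Xmax are the minimum and maximum of the feature being rescaled.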

Standardizing residuals. Ratios used in regression analysis can force residuals into the shape of a normal distribution.

Normalizing moments using the formula μ/σ.

Normalizing vectors (in linear algebra) to a norm of one. Normalization in this sense means to transform a vector so that it has a length of one.
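
As a rough illustration of the first and last items in the list above, the MATLAB sketch below standardizes the columns of a small matrix to z-scores and scales a vector to unit length. The matrix X and the vector v are made-up values chosen for illustration, not data from this post, and the sample standard deviation (dividing by n-1) is an assumption.

```matlab
% Hypothetical 3-by-2 data matrix: rows are samples, columns are features.
X = [1 10; 2 20; 3 30];

% z-score standardization: subtract the column mean, then divide by the
% column standard deviation (flag 0 selects the n-1 sample normalization).
mu    = mean(X, 1);
sigma = std(X, 0, 1);
Z = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
disp(Z)   % each column becomes [-1; 0; 1]

% Vector normalization: divide by the Euclidean norm to get length one.
v = [3 4];            % hypothetical vector
u = v / norm(v);
disp(u)               % 0.6000    0.8000
```

After standardization every column has mean 0 and standard deviation 1, whereas the unit vector u keeps the direction of v and only changes its length.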


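%% MATLAB code for normalization using the iris dataset

% Download the UCI iris dataset and save it in .xlsx format as 'Dataset.xlsx'.

clc; clear; close all

irisdata = xlsread('Dataset.xlsx');   % returns the numeric cells of the sheet
X = irisdata(:,1:4);                  % 150 rows (samples) by 4 columns (features)

Xmin = min(X,[],1);                   % column-wise minimum, 1-by-4
Xmax = max(X,[],1);                   % column-wise maximum, 1-by-4

Y = zeros(size(X));                   % preallocate the output
for ii = 1:size(X,1)                  % loop over the rows
    Y(ii,:) = (X(ii,:) - Xmin) ./ (Xmax - Xmin);   % rescale each feature to [0, 1]
end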
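
The row-by-row loop can also be written without a loop. The following is a minimal sketch using bsxfun, which expands the row of column minima across every row of the input. The function handle name minmaxScale and the small matrix A are hypothetical names introduced here, and the sketch assumes no column is constant, since that would make the denominator zero.

```matlab
% Loop-free min-max scaling of each column to [0, 1].
% minmaxScale is a hypothetical helper name, not a built-in function.
minmaxScale = @(X) bsxfun(@rdivide, ...
    bsxfun(@minus, X, min(X, [], 1)), ...
    max(X, [], 1) - min(X, [], 1));

A = [1 10; 2 30; 3 20];   % hypothetical 3-by-2 example
B = minmaxScale(A);
disp(B)                   % B is [0 0; 0.5 1; 1 0.5]
```

Passing the dimension explicitly to min and max keeps the result column-wise even when the input has only one row.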

Let's consider the following example:

X =     5.1000    3.5000    1.4000    0.2000
    4.9000    3.0000    1.4000    0.2000
    4.7000    3.2000    1.3000    0.2000
    4.6000    3.1000    1.5000    0.2000
    5.0000    3.6000    1.4000    0.2000
    5.4000    3.9000    1.7000    0.4000
    4.6000    3.4000    1.4000    0.3000
    5.0000    3.4000    1.5000    0.2000
    4.4000    2.9000    1.4000    0.2000
    4.9000    3.1000    1.5000    0.1000

Find the minimum and maximum of each column:

Xmin = 4.4000    2.9000    1.3000    0.1000
Xmax = 5.4000    3.9000    1.7000    0.4000

Apply the equation Xnew = (X - Xmin) / (Xmax - Xmin): subtract Xmin from every value in a column, then divide by Xmax - Xmin for that column.

Xnew = 
    0.7000    0.6000    0.2500    0.3333
    0.5000    0.1000    0.2500    0.3333
    0.3000    0.3000         0    0.3333
    0.2000    0.2000    0.5000    0.3333
    0.6000    0.7000    0.2500    0.3333
    1.0000    1.0000    1.0000    1.0000
    0.2000    0.5000    0.2500    0.6667
    0.6000    0.5000    0.5000    0.3333
         0         0    0.2500    0.3333
    0.5000    0.2000    0.5000         0
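
The worked example above can be checked with a few lines of MATLAB. This is only a sketch that re-enters the ten rows by hand; the variable names Xmin, Xmax and Xnew follow the text, and the rounding to four decimals is added here just to match the displayed precision.

```matlab
% The ten rows from the example above.
X = [5.1 3.5 1.4 0.2
     4.9 3.0 1.4 0.2
     4.7 3.2 1.3 0.2
     4.6 3.1 1.5 0.2
     5.0 3.6 1.4 0.2
     5.4 3.9 1.7 0.4
     4.6 3.4 1.4 0.3
     5.0 3.4 1.5 0.2
     4.4 2.9 1.4 0.2
     4.9 3.1 1.5 0.1];

Xmin = min(X, [], 1);     % 4.4  2.9  1.3  0.1
Xmax = max(X, [], 1);     % 5.4  3.9  1.7  0.4

% Subtract Xmin from every row, then divide by the column ranges.
Xnew = bsxfun(@rdivide, bsxfun(@minus, X, Xmin), Xmax - Xmin);

% Round to four decimals for display only.
disp(round(Xnew * 1e4) / 1e4)
```

Row 6 comes out as all ones because it holds the maximum of every column in this ten-row sample.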


