Data Normalization
The word “normalization” is used informally in statistics, so the term “normalized data” can have multiple meanings. In most cases, normalizing data means eliminating the units of measurement, which makes it easier to compare data from different sources. Some of the more common ways to normalize data include:
Transforming data using a z-score or t-score. This is usually called standardization. In the vast majority of cases, if a statistics textbook is talking about normalizing data, then this is the definition of “normalization” it is probably using.
Rescaling data to have values between 0 and 1. This is usually called feature scaling. One possible formula to achieve this is:
Xnew = (X - Xmin) / (Xmax - Xmin)
Standardizing residuals: ratios used in regression analysis can force residuals into the shape of a normal distribution.
Normalizing moments using the formula μ/σ.
Normalizing vectors (in linear algebra) to a norm of one. Normalization in this sense means transforming a vector so that it has a length of one.
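Two of the definitions in the list above, z-score standardization and scaling a vector to unit length, can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the matrix A and the vector v are made-up values, not taken from the iris data, and the column-wise arithmetic relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Illustrative data only: 3 samples (rows) by 2 features (columns).
A = [1 10; 2 20; 3 30];

% Standardization (z-score): subtract each column's mean and divide by
% that column's sample standard deviation.
Z = (A - mean(A)) ./ std(A);

% Unit-length normalization: divide a vector by its Euclidean norm.
v = [3 4];
u = v / norm(v);

disp(Z)
disp(u)
```

After standardization every column of Z has mean 0 and standard deviation 1, and u points in the same direction as v but has length 1.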
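%% MATLAB code for min-max normalization using the iris dataset
% Download the UCI iris dataset and save it in .xlsx format as Dataset.xlsx
clc; clear; close all
irisdata = xlsread('Dataset.xlsx');
X = irisdata(:,1:4); % 150 rows (samples) by 4 columns (features)
Xmin = min(X); % column-wise minimum, 1-by-4
Xmax = max(X); % column-wise maximum, 1-by-4
Y = zeros(size(X)); % preallocate the normalized output
for ii = 1:size(X,1) % loop over rows
    Y(ii,:) = (X(ii,:) - Xmin) ./ (Xmax - Xmin);
end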
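The row-by-row loop above can also be written without a loop. The following is a minimal sketch, assuming MATLAB R2016b or later (for implicit expansion) and using a small stand-in matrix so that it runs without Dataset.xlsx. The guard for constant columns is an added assumption, not part of the original code: without it, a column whose minimum equals its maximum would produce NaN from 0/0.

```matlab
% Stand-in data so the sketch runs without Dataset.xlsx.
X = [5.1 3.5; 4.9 3.0; 4.7 3.2];

Xmin = min(X);             % 1-by-2 row of column minima
Xrange = max(X) - Xmin;    % 1-by-2 row of column ranges
Xrange(Xrange == 0) = 1;   % assumption: constant columns map to 0, not NaN

% Implicit expansion applies the 1-by-2 rows to every row of X.
Y = (X - Xmin) ./ Xrange;
disp(Y)
```

Newer MATLAB releases also provide normalize(X, 'range'), which rescales each column to the interval [0, 1] and may be preferable when it is available.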
Let's consider the following example, using the first 10 rows of X:
X =
5.1000 3.5000 1.4000 0.2000
4.9000 3.0000 1.4000 0.2000
4.7000 3.2000 1.3000 0.2000
4.6000 3.1000 1.5000 0.2000
5.0000 3.6000 1.4000 0.2000
5.4000 3.9000 1.7000 0.4000
4.6000 3.4000 1.4000 0.3000
5.0000 3.4000 1.5000 0.2000
4.4000 2.9000 1.4000 0.2000
4.9000 3.1000 1.5000 0.1000
Find the minimum and maximum of each column:
Xmin = 4.4000 2.9000 1.3000 0.1000
Xmax = 5.4000 3.9000 1.7000 0.4000
Substitute into the equation: subtract Xmin from every value, then divide by Xmax - Xmin, column by column:
Xnew =
0.7000 0.6000 0.2500 0.3333
0.5000 0.1000 0.2500 0.3333
0.3000 0.3000 0 0.3333
0.2000 0.2000 0.5000 0.3333
0.6000 0.7000 0.2500 0.3333
1.0000 1.0000 1.0000 1.0000
0.2000 0.5000 0.2500 0.6667
0.6000 0.5000 0.5000 0.3333
0 0 0.2500 0.3333
0.5000 0.2000 0.5000 0
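The worked example can be reproduced with a short script. For instance, the first entry is (5.1 - 4.4) / (5.4 - 4.4) = 0.7. The sketch below types in the ten rows by hand instead of reading Dataset.xlsx, and the variable names simply follow the labels used in the example.

```matlab
% The ten example rows, entered directly.
X = [5.1 3.5 1.4 0.2
     4.9 3.0 1.4 0.2
     4.7 3.2 1.3 0.2
     4.6 3.1 1.5 0.2
     5.0 3.6 1.4 0.2
     5.4 3.9 1.7 0.4
     4.6 3.4 1.4 0.3
     5.0 3.4 1.5 0.2
     4.4 2.9 1.4 0.2
     4.9 3.1 1.5 0.1];

Xmin = min(X);   % column-wise minimum
Xmax = max(X);   % column-wise maximum

% Apply (X - Xmin) ./ (Xmax - Xmin) one row at a time.
Xnew = zeros(size(X));
for ii = 1:size(X, 1)
    Xnew(ii, :) = (X(ii, :) - Xmin) ./ (Xmax - Xmin);
end
disp(Xnew)
```

The displayed matrix should agree with the Xnew values listed above, up to display rounding.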