K-Nearest Neighbors Algorithm, Vector Style (Supervised Learning)
The K-nearest neighbor algorithm is a supervised learning algorithm in which the computer learns to classify new data points from the training examples that you give it.
This algorithm can handle any number of categories, unlike the basic (binary) logistic regression classifier, which separates only two.
Let us see what it does using an example.
Suppose you have to classify data into multiple categories based on any number of features. For illustration, we shall look at just two features.
The figure shows clusters of four categories (red, green, blue and brown) and their centroids plotted against two arbitrary features.
Now we need to find out which category a new point belongs to. This is done by projecting the new point's vector onto the centroid vector of each of the four clusters and comparing the resulting scalar components. The cluster whose centroid receives the largest component is taken to be the parent category. Note that, unlike the classical k-nearest neighbors method, which votes among the k closest training points, this vector-style variant compares the new point against only one centroid per category.
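To make the comparison concrete, here is a minimal sketch of the projection step on made-up numbers. The two centroids and the test point are hypothetical, chosen only so the arithmetic is easy to follow; the scalar component is computed as dot(test, c)/norm(c), which equals |test| times the cosine of the angle between the two vectors.

```matlab
% Hypothetical centroids of two categories (one column per category).
centroids = [4 0;
             0 3];
test = [3; 1];              % hypothetical new point

K = size(centroids, 2);
component = zeros(1, K);
for i = 1:K
    c = centroids(:, i);
    % Scalar projection of test onto c: |test|*cos(angle) = dot(test, c)/|c|
    component(i) = dot(test, c) / norm(c);
end
[~, category] = max(component);   % index of the largest component
fprintf('components: %s -> category %d\n', mat2str(component), category);
```

With these numbers the components are 3 and 1, so the point is assigned to category 1.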
The MATLAB code for this algorithm is given below. You can also download it from my GitHub page.
Data format
- Your training data can be a .csv file or a similarly formatted text file.
- Every feature in your dataset must be numeric, and the last column (and only the last column) must contain the category label, given as a whole number from 1 to K, where K is the number of categories.
- You should know the total number of categories in your dataset.
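Before running the script, it can help to check that a dataset follows these rules. The sketch below does this for a small in-memory matrix; the six data points and the variable names labels_ok, counts and all_present are made up for illustration and are not part of the original code.

```matlab
% Hypothetical 6-point dataset: two feature columns, label in the last column.
d = [1.0 2.0 1;
     1.2 1.8 1;
     5.0 5.5 2;
     5.2 5.1 2;
     9.0 1.0 3;
     8.8 1.3 3];
K = 3;                      % number of categories you expect

labels = d(:, end);
% Labels must be whole numbers in 1..K, because the script compares them with j = 1:K.
labels_ok = all(labels == round(labels)) && all(labels >= 1 & labels <= K);

% Every category should have at least one training point.
counts = zeros(1, K);
for j = 1:K
    counts(j) = sum(labels == j);
end
all_present = all(counts > 0);

fprintf('labels valid: %d, every category present: %d\n', labels_ok, all_present);
```

If either check fails, relabel the categories as 1, 2, ..., K before loading the file.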
###############################################
% k nearest neighbours vector style
fprintf(' welcome to the vector style k nearest neighbour module\n');
da=input('please mention the dataset file in single quotes :\n');
d=load(da);
s=size(d);
l=s(1,1);
b=s(1,2);
fprintf('there are %i datapoints with %i dimensions\n',l,b-1);
K=input('how many categories are depicted by your dataset?');
x=d;
centroids=zeros(b-1,K);
%determining the centroid vector for each class
for j=1:K
    temp_sum=zeros(b-1,1);
    count=0; %reset the counter for every class
    for i=1:l
        if x(i,b)==j
            temp_sum=temp_sum+x(i,1:b-1)';
            count=count+1;
        end
    end
    if count>0
        centroids(:,j)=temp_sum/count; %mean of the points labelled j
    end
end
fprintf('the following are the centroids of the %i categories\n',K);
disp(centroids);
test=input('you can now classify new vectors.\n please enter the new vector :');
component=zeros(1,K);
%scalar projection of the test vector on each centroid
for i=1:K
    component(1,i)=magnitude(test)*cosine(test,centroids(:,i));
end
%the category with the largest component wins
[max_value, index] = max(component);
y=index;
fprintf('the vector that you want to test belongs to category : %i\n',y);
##################################################
Save the following function as 'magnitude.m' in the working directory.
############################################
function x = magnitude(a)
x=sqrt(sum(a.^2));
end
##########################################
Save the following function as 'cosine.m' in the working directory.
####################
function x=cosine(a,b)
c=sqrt(sum(a.^2));
d=sqrt(sum(b.^2));
x=dot(a,b)/(c*d);
end
###################
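As a quick sanity check on the two helper functions, the sketch below reproduces them as anonymous functions (so it runs without the .m files) and compares their results with MATLAB's built-in norm and dot. The vectors a and b are hypothetical values picked for easy arithmetic.

```matlab
% Anonymous-function copies of the two helpers, so this runs without the .m files.
magnitude = @(a) sqrt(sum(a.^2));
cosine = @(a, b) dot(a, b) / (sqrt(sum(a.^2)) * sqrt(sum(b.^2)));

a = [3; 4];                 % hypothetical test vector
b = [1; 0];                 % hypothetical centroid

m = magnitude(a);           % Euclidean length of a
cs = cosine(a, b);          % cosine of the angle between a and b
proj = m * cs;              % quantity the main script compares across centroids

% The same values from built-ins, for comparison.
m_builtin = norm(a);
proj_builtin = dot(a, b) / norm(b);

fprintf('magnitude %g, cosine %g, projection %g\n', m, cs, proj);
```

Since magnitude(test)*cosine(test, c) simplifies to dot(test, c)/norm(c), the two helpers could be replaced by that single expression if you prefer fewer files.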