Find-S algorithm using Python

In this article, we will learn how to implement the Find-S algorithm in Machine Learning. It is a concept learning algorithm in ML. This algorithm only considers positive training examples. We try to find the most specific hypothesis that fits all the positive training examples. Initialize the hypothesis initially with the most specific hypothesis and whenever we get a positive example we update the hypothesis and move towards a general hypothesis.

Most Specific hypothesis: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

Most General hypothesis: {?,?,?,?,?,?}

In this ϕ indicates no value is accepted whereas? indicates any value is accepted. If a single value is required for a certain attribute then that single value is used to represent that attribute.

CSV link is given at the end of this tutorial.

Steps Followed in Find-S algorithm:

  1. Initialize the hypothesis with most specific hypothesis (i.e. {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}).
  2. Then iterate through training examples.
  3. If the current training example is a negative one then leave the hypothesis unchanged.
  4. If it is a positive training example then if the hypothesis is too specific for that example update the hypothesis and move towards a more general one (i.e. if the attribute value is the same as the hypothesis value leave it unchanged else change it to ‘?’).
  5.  Repeat this process until all training examples are completed.
  6. After all training examples are completed we get final hypothesis and it can be used to classify unseen data.

Example:

Consider the following training data:

OutlookTemperatureHumidityWindPlay
SunnyWarmNormalStrongYes
SunnyWarmHighStrongYes
RainyColdHighStrongNo

Initial hypothesis (h) = [ϕ, ϕ, ϕ, ϕ]

Training example 1:

It is a positive example and the hypothesis is more specific, update it:

[ϕ, ϕ, ϕ, ϕ] → [‘Sunny’, ‘Warm’, ‘Normal’, ‘Strong’]

Hypothesis after step 1: h = [‘Sunny’, ‘Warm’, ‘Normal’, ‘Strong’]

Training example 2:

It is a positive example and the hypothesis is more specific, update it again:

[‘Sunny’, ‘Warm’, ‘Normal’, ‘Strong’] → [‘Sunny’, ‘Warm’, ‘?’, ‘Strong’]

Here modify ‘Normal’ to ‘?’ since the previous and current values are not the same generalize it.

Hypothesis after step 2: h = [‘Sunny’, ‘Warm’, ‘?’, ‘Strong’]

Training example 3:

Since it is a negative training example don’t consider it. The hypothesis remains unchanged.

Hypothesis after step 3: h = [‘Sunny’, ‘Warm’, ‘?’, ‘Strong’]

All training examples are explored and the final hypothesis is obtained, it is h = [‘Sunny’, ‘Warm’, ‘?’, ‘Strong’]. This hypothesis can be used to classify unseen data.

Code Implementation in Python:

Importing the necessary libraries and reading the dataset:

import pandas as pd
df = pd.read_csv('./data.csv')
df

Output:

Find-S algorithm using Python

Exploring the dataset:

df.info()

Output:

 

for i in df.columns:
    print(i, ':', df[i].unique())

Output:

 

Creating concepts and target variables:

target = df['enjoy sport'].values
target

Output:

 

concepts = df[df.columns[df.columns != 'enjoy sport']].values
concepts

Output:

 

Setting initial hypothesis to most specific hypothesis:

initial_hypothesis = ['0']*concepts.shape[1]
initial_hypothesis

Output:

 

Implementing Find-S algorithm:

Here we are iterating through every row of our dataset and if it is a positive example then check if it has the most specific value and then update it to the values of the current row of the dataset. Else if it already has a value in it then check if it matches the value of the current row of the dataset else change it to the general hypothesis. Finally, return the final hypothesis.

rows, columns = concepts.shape
def Find_S(initial_hypothesis):
    for i in range(rows):    
        if target[i] == 'yes':
            for j in range(columns):
                if initial_hypothesis[j] == '0':
                    initial_hypothesis[j] = concepts[i][j]
                elif initial_hypothesis[j] != concepts[i][j]:
                    initial_hypothesis[j] = '?'
        print('Step ', i+1, ': Hypothesis', initial_hypothesis)
        
    return initial_hypothesis

Getting the final hypothesis:

final_hypothesis = Find_S(initial_hypothesis)
print('Final Hypothesis is : ', final_hypothesis)

Output:

Link to GitHub repository: https://github.com/OmTejaswini/CodeSpeedy/tree/main/Find-S%20Algorithm

Leave a Reply

Your email address will not be published. Required fields are marked *