machine_learning.linear_discriminant_analysis

Linear Discriminant Analysis

Assumptions About Data :
  1. The input variables have a Gaussian distribution.

  2. The variance calculated for each input variable, grouped by class, is the same.

  3. The mix of classes in the training set is representative of the problem.

Learning The Model :
The LDA model requires the following statistics to be estimated from the training data :
  1. Mean of each input variable for each class.

  2. Probability that an instance belongs to each class.

  3. Covariance of the input data for each class.

Calculate the class means :

mean(x) = (1 / n) * sum(xi for i = 1 to n)
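The class mean formula above can be sketched in a few lines. This is only an illustration: `class_mean` and the sample values are hypothetical and are not part of this module.

```python
def class_mean(values: list[float]) -> float:
    """Arithmetic mean of the values belonging to one class (hypothetical helper)."""
    if not values:
        raise ValueError("a class needs at least one value")
    return sum(values) / len(values)


# Made-up one-dimensional data for two classes.
class_0 = [4.0, 5.0, 6.0]
class_1 = [9.0, 10.0, 11.0]

means = [class_mean(class_0), class_mean(class_1)]
print(means)  # [5.0, 10.0]
```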

Calculate the class probabilities :

P(y = 0) = count(y = 0) / (count(y = 0) + count(y = 1))

P(y = 1) = count(y = 1) / (count(y = 0) + count(y = 1))
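As a rough sketch, the class probabilities are just relative frequencies of the labels. The helper name `class_priors` and the label list are assumptions made for illustration; the same idea extends to more than two classes.

```python
def class_priors(labels: list[int]) -> dict[int, float]:
    """Relative frequency of each class label (hypothetical helper)."""
    counts: dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Three instances of class 0 and one instance of class 1.
labels = [0, 0, 0, 1]
priors = class_priors(labels)
print(priors)  # {0: 0.75, 1: 0.25}
```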

Calculate the variance :
We can calculate the variance for the dataset in two steps :
  1. Calculate the squared difference of each input value from the mean of its group.

  2. Sum the squared differences and divide by the number of instances minus the number of classes.

Squared_Difference = (x - mean(k)) ** 2

Variance = (1 / (count(x) - count(classes))) * sum(Squared_Difference(xi) for i = 1 to n)
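The variance formula pools the squared differences of all classes and divides by count(x) - count(classes). A minimal sketch under that reading follows; `pooled_variance` and its input layout (one list per class) are hypothetical choices, not this module's API.

```python
def pooled_variance(groups: list[list[float]]) -> float:
    """Shared variance across classes; `groups` holds one list of values per class."""
    total_count = sum(len(group) for group in groups)
    class_count = len(groups)
    if total_count <= class_count:
        raise ValueError("need more instances than classes")

    squared_sum = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Step 1: squared difference of each value from its own group mean.
        squared_sum += sum((x - group_mean) ** 2 for x in group)

    # Step 2: divide by (number of instances - number of classes).
    return squared_sum / (total_count - class_count)


groups = [[4.0, 5.0, 6.0], [9.0, 10.0, 11.0]]
variance = pooled_variance(groups)
print(variance)  # 1.0
```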

Making Predictions :
discriminant(x) = x * (mean / variance) - ((mean ** 2) / (2 * variance)) + Ln(probability)

After calculating the discriminant value for each class, the class with the largest discriminant value is taken as the prediction.
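To make the prediction rule concrete, the sketch below evaluates the discriminant for every class and returns the index of the largest score. The function names and the example statistics (means 5 and 10, shared variance 1, equal priors) are assumptions chosen for illustration.

```python
from math import log


def discriminant(x: float, mean: float, variance: float, prior: float) -> float:
    """Discriminant score of value x for a single class."""
    return x * (mean / variance) - (mean**2) / (2 * variance) + log(prior)


def predict(x: float, means: list[float], variance: float, priors: list[float]) -> int:
    """Index of the class with the largest discriminant score."""
    scores = [discriminant(x, m, variance, p) for m, p in zip(means, priors)]
    return scores.index(max(scores))


means = [5.0, 10.0]
variance = 1.0
priors = [0.5, 0.5]

predictions = [predict(x, means, variance, priors) for x in [4.2, 6.9, 8.1, 11.0]]
print(predictions)  # [0, 0, 1, 1]
```

With equal priors and a shared variance, the decision boundary sits halfway between the two class means (7.5 here), which is why 6.9 is assigned to class 0 and 8.1 to class 1.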

Author: @EverLookNeverSee

Attributes

num

Functions

accuracy(→ float)

Calculate the value of accuracy based on predictions

calculate_mean(→ float)

Calculate given class mean

calculate_probabilities(→ float)

Calculate the probability that a given instance belongs to a given class

calculate_variance(→ float)

Calculate the variance

gaussian_distribution(→ list)

Generate Gaussian-distributed instances based on the given mean and standard deviation

main()

This function starts the execution phase

predict_y_values(→ list)

This function predicts new indexes (groups for our data)

valid_input(→ num)

Ask for user value and validate that it fulfills a condition.

y_generator(→ list)

Generate y values for corresponding classes

Module Contents

machine_learning.linear_discriminant_analysis.accuracy(actual_y: list, predicted_y: list) float

Calculate the value of accuracy based on predictions

Parameters:

actual_y – a list containing initial Y values generated by ‘y_generator’ function

predicted_y – a list containing predicted Y values generated by ‘predict_y_values’ function

Returns:

percentage of accuracy

>>> actual_y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
... 1, 1 ,1 ,1 ,1 ,1 ,1]
>>> predicted_y = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
... 0, 0, 1, 1, 1, 0, 1, 1, 1]
>>> accuracy(actual_y, predicted_y)
50.0
>>> actual_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
... 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
>>> predicted_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
... 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
>>> accuracy(actual_y, predicted_y)
100.0
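For orientation, a percentage accuracy of this kind can be computed by counting matching positions. The sketch below is an assumption about the general technique, not this module's implementation; `accuracy_percent` is a hypothetical name, and unlike the doctests above it rejects lists of different lengths.

```python
def accuracy_percent(actual: list[int], predicted: list[int]) -> float:
    """Percentage of positions where the predicted label equals the actual label."""
    if not actual:
        raise ValueError("actual labels must not be empty")
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual) * 100


result = accuracy_percent([0, 0, 1, 1], [0, 1, 1, 1])
print(result)  # 75.0
```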
machine_learning.linear_discriminant_analysis.calculate_mean(instance_count: int, items: list) float

Calculate given class mean

Parameters:

instance_count – number of instances in class

items – items related to a specific class (data grouping)

Returns:

calculated actual mean of considered class

>>> items = gaussian_distribution(5.0, 1.0, 20)
>>> calculate_mean(len(items), items)
5.011267842911003
machine_learning.linear_discriminant_analysis.calculate_probabilities(instance_count: int, total_count: int) float

Calculate the probability that a given instance belongs to a given class

Parameters:

instance_count – number of instances in class

total_count – the number of all instances

Returns:

value of probability for considered class

>>> calculate_probabilities(20, 60)
0.3333333333333333
>>> calculate_probabilities(30, 100)
0.3
machine_learning.linear_discriminant_analysis.calculate_variance(items: list, means: list, total_count: int) float

Calculate the variance

Parameters:

items – a list containing all items (gaussian distribution of all classes)

means – a list containing real mean values of each class

total_count – the number of all instances

Returns:

calculated variance for considered dataset

>>> items = gaussian_distribution(5.0, 1.0, 20)
>>> means = [5.011267842911003]
>>> total_count = 20
>>> calculate_variance([items], means, total_count)
0.9618530973487491
machine_learning.linear_discriminant_analysis.gaussian_distribution(mean: float, std_dev: float, instance_count: int) list

Generate Gaussian-distributed instances based on the given mean and standard deviation

Parameters:

mean – mean value of class

std_dev – value of standard deviation entered by the user, or its default value

instance_count – number of instances in class

Returns:

a list containing generated values based on given mean, std_dev and instance_count

>>> gaussian_distribution(5.0, 1.0, 20)
[6.288184753155463, 6.4494456086997705, 5.066335808938262, 4.235456349028368,
 3.9078267848958586, 5.031334516831717, 3.977896829989127, 3.56317055489747,
 5.199311976483754, 5.133374604658605, 5.546468300338232, 4.086029056264687,
 5.005005283626573, 4.935258239627312, 3.494170998739258, 5.537997178661033,
 5.320711100998849, 7.3891120432406865, 5.202969177309964, 4.855297691835079]
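Reproducible Gaussian samples like the ones above can be drawn with the standard library. The sketch below is one possible approach and makes no claim about how this module seeds its generator; `sample_gaussian` and its `seed` parameter are hypothetical.

```python
import random


def sample_gaussian(mean: float, std_dev: float, count: int, seed: int = 0) -> list[float]:
    """Draw `count` values from a normal distribution using a private, seeded generator."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return [rng.gauss(mean, std_dev) for _ in range(count)]


samples = sample_gaussian(5.0, 1.0, 20)
print(len(samples))  # 20
```

Because the generator is seeded, calling `sample_gaussian` twice with the same arguments returns the same list, which is what makes fixed expected values in doctests possible.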
machine_learning.linear_discriminant_analysis.main()

This function starts the execution phase

machine_learning.linear_discriminant_analysis.predict_y_values(x_items: list, means: list, variance: float, probabilities: list) list

This function predicts new indexes (groups for our data)

Parameters:

x_items – a list containing all items (gaussian distribution of all classes)

means – a list containing real mean values of each class

variance – calculated value of variance by calculate_variance function

probabilities – a list containing all probabilities of classes

Returns:

a list containing predicted Y values

>>> x_items = [[6.288184753155463, 6.4494456086997705, 5.066335808938262,
...                4.235456349028368, 3.9078267848958586, 5.031334516831717,
...                3.977896829989127, 3.56317055489747, 5.199311976483754,
...                5.133374604658605, 5.546468300338232, 4.086029056264687,
...                5.005005283626573, 4.935258239627312, 3.494170998739258,
...                5.537997178661033, 5.320711100998849, 7.3891120432406865,
...                5.202969177309964, 4.855297691835079], [11.288184753155463,
...                11.44944560869977, 10.066335808938263, 9.235456349028368,
...                8.907826784895859, 10.031334516831716, 8.977896829989128,
...                8.56317055489747, 10.199311976483754, 10.133374604658606,
...                10.546468300338232, 9.086029056264687, 10.005005283626572,
...                9.935258239627313, 8.494170998739259, 10.537997178661033,
...                10.320711100998848, 12.389112043240686, 10.202969177309964,
...                9.85529769183508], [16.288184753155463, 16.449445608699772,
...                15.066335808938263, 14.235456349028368, 13.907826784895859,
...                15.031334516831716, 13.977896829989128, 13.56317055489747,
...                15.199311976483754, 15.133374604658606, 15.546468300338232,
...                14.086029056264687, 15.005005283626572, 14.935258239627313,
...                13.494170998739259, 15.537997178661033, 15.320711100998848,
...                17.389112043240686, 15.202969177309964, 14.85529769183508]]
>>> means = [5.011267842911003, 10.011267842911003, 15.011267842911002]
>>> variance = 0.9618530973487494
>>> probabilities = [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
>>> predict_y_values(x_items, means, variance,
...                  probabilities)  
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2]
machine_learning.linear_discriminant_analysis.valid_input(input_type: collections.abc.Callable[[object], num], input_msg: str, err_msg: str, condition: collections.abc.Callable[[num], bool] = lambda _: ..., default: str | None = None) num

Ask for user value and validate that it fulfills a condition.

Parameters:

input_type – expected type of the user input value

input_msg – message to show the user on the screen

err_msg – message to show on the screen in case of error

condition – function that represents the condition the user input must satisfy to be valid

default – default value in case the user does not type anything

Returns:

user’s input

machine_learning.linear_discriminant_analysis.y_generator(class_count: int, instance_count: list) list

Generate y values for corresponding classes

Parameters:

class_count – number of classes (data groupings) in dataset

instance_count – number of instances in each class

Returns:

corresponding values for data groupings in dataset

>>> y_generator(1, [10])
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> y_generator(2, [5, 10])
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> y_generator(4, [10, 5, 15, 20]) 
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
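The label layout in these doctests (class index repeated once per instance) can be reproduced with a short comprehension. This is a sketch of the idea only; `make_labels` is a hypothetical helper that takes just the per-class counts.

```python
def make_labels(instance_counts: list[int]) -> list[int]:
    """Repeat each class index as many times as that class has instances."""
    return [label for label, count in enumerate(instance_counts) for _ in range(count)]


labels = make_labels([2, 3])
print(labels)  # [0, 0, 1, 1, 1]
```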
machine_learning.linear_discriminant_analysis.num