machine_learning.linear_discriminant_analysis
Linear Discriminant Analysis
- Assumptions About Data :
Each input variable has a Gaussian distribution.
The variance of each input variable, calculated per class, is the same for every class.
The mix of classes in the training set is representative of the problem.
- Learning The Model :
- The LDA model requires the estimation of statistics from the training data :
Mean of each input variable for each class.
Probability that an instance belongs to each class.
Covariance of the input data for each class.
- Calculate the class means :
mean(x) = (1 / n) * sum(x_i for i = 1 to n)
- Calculate the class probabilities :
P(y = 0) = count(y = 0) / (count(y = 0) + count(y = 1))
P(y = 1) = count(y = 1) / (count(y = 0) + count(y = 1))
- Calculate the variance :
- We can calculate the variance for the dataset in two steps :
Calculate the squared difference of each input value from its group mean.
Sum the squared differences and divide by the number of instances minus the number of classes.
Squared_Difference(x) = (x - mean(k)) ** 2, where mean(k) is the mean of the class k that x belongs to
Variance = (1 / (count(x) - count(classes))) * sum(Squared_Difference(x_i) for i = 1 to n)
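The three statistics described above can be sketched in a few lines of Python. This is a minimal illustration for a single input variable, not the module's own code: the names `class_mean`, `class_prior` and `pooled_variance` and the toy data are hypothetical.

```python
from math import fsum


def class_mean(items: list[float]) -> float:
    """Mean of one class: (1 / n) * sum(x_i)."""
    return fsum(items) / len(items)


def class_prior(instance_count: int, total_count: int) -> float:
    """P(y = k) = count(y = k) / count(all instances)."""
    return instance_count / total_count


def pooled_variance(groups: list[list[float]], means: list[float]) -> float:
    """Sum the squared differences of every value from its own class mean,
    then divide by count(x) - count(classes)."""
    total_count = sum(len(group) for group in groups)
    squared_differences = [
        (x - means[k]) ** 2 for k, group in enumerate(groups) for x in group
    ]
    return fsum(squared_differences) / (total_count - len(groups))


# Hypothetical toy data: two classes with three instances each.
groups = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]
total = sum(len(group) for group in groups)
means = [class_mean(group) for group in groups]               # [2.0, 8.0]
priors = [class_prior(len(group), total) for group in groups]  # [0.5, 0.5]
variance = pooled_variance(groups, means)                     # 4 / (6 - 2) = 1.0
```

Dividing by count(x) - count(classes) rather than count(x) accounts for the one mean estimated per class.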
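- Making Predictions :
- discriminant(x) = x * (mean / variance) - ((mean ** 2) / (2 * variance)) + ln(probability)
Here mean and probability are those of the class being scored, variance is the single value shared by all classes, and ln is the natural logarithm.
After calculating the discriminant value for each class, the class with the largest discriminant value is taken as the prediction.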
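As a rough illustration of the prediction rule, the sketch below scores one value against every class and returns the index of the largest discriminant. The function names and the toy statistics are assumptions made for this example only.

```python
from math import log


def discriminant(x: float, mean: float, variance: float, probability: float) -> float:
    """x * (mean / variance) - mean ** 2 / (2 * variance) + ln(probability)."""
    return x * (mean / variance) - (mean**2 / (2 * variance)) + log(probability)


def predict_class(
    x: float, means: list[float], variance: float, probabilities: list[float]
) -> int:
    """Return the index of the class with the largest discriminant value."""
    scores = [
        discriminant(x, mean, variance, probability)
        for mean, probability in zip(means, probabilities)
    ]
    return scores.index(max(scores))


# Hypothetical statistics for two classes with equal priors.
means = [2.0, 8.0]
variance = 1.0
probabilities = [0.5, 0.5]

predictions = [
    predict_class(x, means, variance, probabilities) for x in (1.5, 4.9, 5.1, 9.0)
]
# predictions == [0, 0, 1, 1]
```

With equal priors and a shared variance, the decision boundary falls at the midpoint of the two class means (5.0 here), which is why 4.9 and 5.1 land in different classes.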
Author: @EverLookNeverSee
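Attributes
| num |
Functions
| accuracy(actual_y, predicted_y) | Calculate the accuracy, as a percentage, based on the predictions. |
| calculate_mean(instance_count, items) | Calculate the mean of a given class. |
| calculate_probabilities(instance_count, total_count) | Calculate the probability that an instance belongs to a given class. |
| calculate_variance(items, means, total_count) | Calculate the variance. |
| gaussian_distribution(mean, std_dev, instance_count) | Generate Gaussian-distributed instances based on a given mean and standard deviation. |
| main() | Start the execution phase. |
| predict_y_values(x_items, means, variance, probabilities) | Predict the class index (group) of each data item. |
| valid_input(input_type, input_msg, err_msg, condition, default) | Ask for a user value and validate that it fulfills a condition. |
| y_generator(class_count, instance_count) | Generate the y values for the corresponding classes. |
Module Contents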
- machine_learning.linear_discriminant_analysis.accuracy(actual_y: list, predicted_y: list) -> float
Calculate the accuracy, as a percentage, based on the predictions.
- Parameters:
actual_y – a list containing the initial Y values generated by the ‘y_generator’ function
predicted_y – a list containing the predicted Y values generated by the ‘predict_y_values’ function
- Returns:
percentage of accuracy
>>> actual_y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
... 1, 1, 1, 1, 1, 1, 1]
>>> predicted_y = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
... 0, 0, 1, 1, 1, 0, 1, 1, 1]
>>> accuracy(actual_y, predicted_y)
50.0

>>> actual_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
... 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
>>> predicted_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
... 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
>>> accuracy(actual_y, predicted_y)
100.0
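One plausible way to compute such a percentage is to compare the two lists position by position and divide the number of matches by the number of actual values. The sketch below does that; `accuracy_percentage` is a hypothetical name. It assumes surplus items in the longer list are ignored (the behavior of `zip`), an assumption about the module and not something the page documents.

```python
def accuracy_percentage(actual: list[int], predicted: list[int]) -> float:
    """Share of positions where the prediction equals the actual label, in percent."""
    # zip stops at the shorter list, so extra items in either list are ignored.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100


result = accuracy_percentage([0, 0, 1, 1], [0, 1, 1, 1])  # 3 of 4 match -> 75.0
```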
- machine_learning.linear_discriminant_analysis.calculate_mean(instance_count: int, items: list) -> float
Calculate the mean of a given class.
- Parameters:
instance_count – number of instances in the class
items – items that belong to a specific class (data grouping)
- Returns:
calculated actual mean of the considered class
>>> items = gaussian_distribution(5.0, 1.0, 20)
>>> calculate_mean(len(items), items)
5.011267842911003
- machine_learning.linear_discriminant_analysis.calculate_probabilities(instance_count: int, total_count: int) -> float
Calculate the probability that an instance belongs to a given class.
- Parameters:
instance_count – number of instances in the class
total_count – the number of all instances
- Returns:
value of the probability for the considered class
>>> calculate_probabilities(20, 60)
0.3333333333333333
>>> calculate_probabilities(30, 100)
0.3
- machine_learning.linear_discriminant_analysis.calculate_variance(items: list, means: list, total_count: int) -> float
Calculate the variance.
- Parameters:
items – a list containing all items (the Gaussian-distributed values of all classes)
means – a list containing the real mean value of each class
total_count – the number of all instances
- Returns:
calculated variance for the considered dataset
>>> items = gaussian_distribution(5.0, 1.0, 20)
>>> means = [5.011267842911003]
>>> total_count = 20
>>> calculate_variance([items], means, total_count)
0.9618530973487491
- machine_learning.linear_discriminant_analysis.gaussian_distribution(mean: float, std_dev: float, instance_count: int) -> list
Generate Gaussian-distributed instances based on a given mean and standard deviation.
- Parameters:
mean – mean value of the class
std_dev – standard deviation entered by the user, or its default value
instance_count – number of instances in the class
- Returns:
a list containing the values generated from the given mean, std_dev and instance_count
>>> gaussian_distribution(5.0, 1.0, 20)
[6.288184753155463, 6.4494456086997705, 5.066335808938262,
4.235456349028368, 3.9078267848958586, 5.031334516831717,
3.977896829989127, 3.56317055489747, 5.199311976483754,
5.133374604658605, 5.546468300338232, 4.086029056264687,
5.005005283626573, 4.935258239627312, 3.494170998739258,
5.537997178661033, 5.320711100998849, 7.3891120432406865,
5.202969177309964, 4.855297691835079]
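A generator of this kind can be sketched with the standard library's `random` module. The function name `sample_gaussian` and its `seed` parameter are hypothetical additions for reproducibility. The page does not state how the documented values above are fixed, so this sketch is not expected to reproduce them.

```python
import random
from typing import Optional


def sample_gaussian(
    mean: float, std_dev: float, instance_count: int, seed: Optional[int] = None
) -> list[float]:
    """Draw instance_count values from a normal distribution N(mean, std_dev ** 2)."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.gauss(mean, std_dev) for _ in range(instance_count)]


samples = sample_gaussian(5.0, 1.0, 20, seed=42)
```

Passing the same seed again yields the same list, which keeps examples and tests deterministic.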
- machine_learning.linear_discriminant_analysis.main()
Start the execution phase.
- machine_learning.linear_discriminant_analysis.predict_y_values(x_items: list, means: list, variance: float, probabilities: list) -> list
Predict the class index (group) of each data item.
- Parameters:
x_items – a list containing all items (the Gaussian-distributed values of all classes)
means – a list containing the real mean value of each class
variance – the variance calculated by the calculate_variance function
probabilities – a list containing the probabilities of all classes
- Returns:
a list containing the predicted Y values
>>> x_items = [[6.288184753155463, 6.4494456086997705, 5.066335808938262,
... 4.235456349028368, 3.9078267848958586, 5.031334516831717,
... 3.977896829989127, 3.56317055489747, 5.199311976483754,
... 5.133374604658605, 5.546468300338232, 4.086029056264687,
... 5.005005283626573, 4.935258239627312, 3.494170998739258,
... 5.537997178661033, 5.320711100998849, 7.3891120432406865,
... 5.202969177309964, 4.855297691835079], [11.288184753155463,
... 11.44944560869977, 10.066335808938263, 9.235456349028368,
... 8.907826784895859, 10.031334516831716, 8.977896829989128,
... 8.56317055489747, 10.199311976483754, 10.133374604658606,
... 10.546468300338232, 9.086029056264687, 10.005005283626572,
... 9.935258239627313, 8.494170998739259, 10.537997178661033,
... 10.320711100998848, 12.389112043240686, 10.202969177309964,
... 9.85529769183508], [16.288184753155463, 16.449445608699772,
... 15.066335808938263, 14.235456349028368, 13.907826784895859,
... 15.031334516831716, 13.977896829989128, 13.56317055489747,
... 15.199311976483754, 15.133374604658606, 15.546468300338232,
... 14.086029056264687, 15.005005283626572, 14.935258239627313,
... 13.494170998739259, 15.537997178661033, 15.320711100998848,
... 17.389112043240686, 15.202969177309964, 14.85529769183508]]
>>> means = [5.011267842911003, 10.011267842911003, 15.011267842911002]
>>> variance = 0.9618530973487494
>>> probabilities = [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
>>> predict_y_values(x_items, means, variance,
... probabilities)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
- machine_learning.linear_discriminant_analysis.valid_input(input_type: collections.abc.Callable[[object], num], input_msg: str, err_msg: str, condition: collections.abc.Callable[[num], bool] = lambda _: ..., default: str | None = None) -> num
Ask for a user value and validate that it fulfills a condition.
- Parameters:
input_type – expected type of the user's input value
input_msg – message to show the user on the screen
err_msg – message to show on the screen in case of an error
condition – function that represents the condition the user input must satisfy to be valid
default – default value in case the user does not type anything
- Returns:
user’s input
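A prompt-and-validate loop with these parameters might look like the sketch below. It is an illustration under assumptions, not the module's implementation: `ask_until_valid` is a hypothetical name, and the extra `reader` parameter (defaulting to the built-in `input`) exists only so the loop can be exercised without a terminal.

```python
from collections.abc import Callable
from typing import Optional, TypeVar

T = TypeVar("T", int, float)


def ask_until_valid(
    input_type: Callable[[str], T],
    input_msg: str,
    err_msg: str,
    condition: Callable[[T], bool] = lambda _: True,
    default: Optional[str] = None,
    reader: Callable[[str], str] = input,  # hypothetical hook for scripted input
) -> T:
    """Keep asking until the answer converts to input_type and satisfies condition."""
    while True:
        raw = reader(input_msg).strip() or default  # empty answer falls back to default
        try:
            value = input_type(raw)
        except (TypeError, ValueError):  # TypeError covers an empty answer with no default
            print(err_msg)
            continue
        if condition(value):
            return value
        print(err_msg)


# Scripted answers: a non-number, a value failing the condition, then a valid one.
answers = iter(["abc", "-3", "4"])
count = ask_until_valid(
    int,
    "Number of classes: ",
    "Please enter a positive integer.",
    condition=lambda n: n > 0,
    reader=lambda _prompt: next(answers),
)
# count == 4
```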
- machine_learning.linear_discriminant_analysis.y_generator(class_count: int, instance_count: list) -> list
Generate the y values for the corresponding classes.
- Parameters:
class_count – number of classes (data groupings) in the dataset
instance_count – a list containing the number of instances in each class
- Returns:
corresponding class values for the data groupings in the dataset
>>> y_generator(1, [10])
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> y_generator(2, [5, 10])
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> y_generator(4, [10, 5, 15, 20])
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
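The label layout shown in these examples (class index k repeated once per instance of class k) can be produced with a single comprehension. The sketch below uses the hypothetical name `label_vector` and takes only the per-class counts, since the number of classes is implied by the length of that list.

```python
def label_vector(instance_counts: list[int]) -> list[int]:
    """Repeat each class index k as many times as class k has instances."""
    return [k for k, count in enumerate(instance_counts) for _ in range(count)]


labels = label_vector([2, 3])  # [0, 0, 1, 1, 1]
```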
- machine_learning.linear_discriminant_analysis.num