machine_learning.linear_discriminant_analysis ============================================= .. py:module:: machine_learning.linear_discriminant_analysis .. autoapi-nested-parse:: Linear Discriminant Analysis Assumptions About Data : 1. The input variables has a gaussian distribution. 2. The variance calculated for each input variables by class grouping is the same. 3. The mix of classes in your training set is representative of the problem. Learning The Model : The LDA model requires the estimation of statistics from the training data : 1. Mean of each input value for each class. 2. Probability of an instance belong to each class. 3. Covariance for the input data for each class Calculate the class means : mean(x) = 1/n ( for i = 1 to i = n --> sum(xi)) Calculate the class probabilities : P(y = 0) = count(y = 0) / (count(y = 0) + count(y = 1)) P(y = 1) = count(y = 1) / (count(y = 0) + count(y = 1)) Calculate the variance : We can calculate the variance for dataset in two steps : 1. Calculate the squared difference for each input variable from the group mean. 2. Calculate the mean of the squared difference. ------------------------------------------------ Squared_Difference = (x - mean(k)) ** 2 Variance = (1 / (count(x) - count(classes))) * (for i = 1 to i = n --> sum(Squared_Difference(xi))) Making Predictions : discriminant(x) = x * (mean / variance) - ((mean ** 2) / (2 * variance)) + Ln(probability) --------------------------------------------------------------------------- After calculating the discriminant value for each class, the class with the largest discriminant value is taken as the prediction. Author: @EverLookNeverSee Attributes ---------- .. autoapisummary:: machine_learning.linear_discriminant_analysis.num Functions --------- .. autoapisummary:: machine_learning.linear_discriminant_analysis.accuracy machine_learning.linear_discriminant_analysis.calculate_mean machine_learning.linear_discriminant_analysis.calculate_probabilities machine_learning.linear_discriminant_analysis.calculate_variance machine_learning.linear_discriminant_analysis.gaussian_distribution machine_learning.linear_discriminant_analysis.main machine_learning.linear_discriminant_analysis.predict_y_values machine_learning.linear_discriminant_analysis.valid_input machine_learning.linear_discriminant_analysis.y_generator Module Contents --------------- .. py:function:: accuracy(actual_y: list, predicted_y: list) -> float Calculate the value of accuracy based-on predictions :param actual_y:a list containing initial Y values generated by 'y_generator' function :param predicted_y: a list containing predicted Y values generated by 'predict_y_values' function :return: percentage of accuracy >>> actual_y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, ... 1, 1 ,1 ,1 ,1 ,1 ,1] >>> predicted_y = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, ... 0, 0, 1, 1, 1, 0, 1, 1, 1] >>> accuracy(actual_y, predicted_y) 50.0 >>> actual_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, ... 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] >>> predicted_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ... 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] >>> accuracy(actual_y, predicted_y) 100.0 .. py:function:: calculate_mean(instance_count: int, items: list) -> float Calculate given class mean :param instance_count: Number of instances in class :param items: items that related to specific class(data grouping) :return: calculated actual mean of considered class >>> items = gaussian_distribution(5.0, 1.0, 20) >>> calculate_mean(len(items), items) 5.011267842911003 .. py:function:: calculate_probabilities(instance_count: int, total_count: int) -> float Calculate the probability that a given instance will belong to which class :param instance_count: number of instances in class :param total_count: the number of all instances :return: value of probability for considered class >>> calculate_probabilities(20, 60) 0.3333333333333333 >>> calculate_probabilities(30, 100) 0.3 .. py:function:: calculate_variance(items: list, means: list, total_count: int) -> float Calculate the variance :param items: a list containing all items(gaussian distribution of all classes) :param means: a list containing real mean values of each class :param total_count: the number of all instances :return: calculated variance for considered dataset >>> items = gaussian_distribution(5.0, 1.0, 20) >>> means = [5.011267842911003] >>> total_count = 20 >>> calculate_variance([items], means, total_count) 0.9618530973487491 .. py:function:: gaussian_distribution(mean: float, std_dev: float, instance_count: int) -> list Generate gaussian distribution instances based-on given mean and standard deviation :param mean: mean value of class :param std_dev: value of standard deviation entered by usr or default value of it :param instance_count: instance number of class :return: a list containing generated values based-on given mean, std_dev and instance_count >>> gaussian_distribution(5.0, 1.0, 20) # doctest: +NORMALIZE_WHITESPACE [6.288184753155463, 6.4494456086997705, 5.066335808938262, 4.235456349028368, 3.9078267848958586, 5.031334516831717, 3.977896829989127, 3.56317055489747, 5.199311976483754, 5.133374604658605, 5.546468300338232, 4.086029056264687, 5.005005283626573, 4.935258239627312, 3.494170998739258, 5.537997178661033, 5.320711100998849, 7.3891120432406865, 5.202969177309964, 4.855297691835079] .. py:function:: main() This function starts execution phase .. py:function:: predict_y_values(x_items: list, means: list, variance: float, probabilities: list) -> list This function predicts new indexes(groups for our data) :param x_items: a list containing all items(gaussian distribution of all classes) :param means: a list containing real mean values of each class :param variance: calculated value of variance by calculate_variance function :param probabilities: a list containing all probabilities of classes :return: a list containing predicted Y values >>> x_items = [[6.288184753155463, 6.4494456086997705, 5.066335808938262, ... 4.235456349028368, 3.9078267848958586, 5.031334516831717, ... 3.977896829989127, 3.56317055489747, 5.199311976483754, ... 5.133374604658605, 5.546468300338232, 4.086029056264687, ... 5.005005283626573, 4.935258239627312, 3.494170998739258, ... 5.537997178661033, 5.320711100998849, 7.3891120432406865, ... 5.202969177309964, 4.855297691835079], [11.288184753155463, ... 11.44944560869977, 10.066335808938263, 9.235456349028368, ... 8.907826784895859, 10.031334516831716, 8.977896829989128, ... 8.56317055489747, 10.199311976483754, 10.133374604658606, ... 10.546468300338232, 9.086029056264687, 10.005005283626572, ... 9.935258239627313, 8.494170998739259, 10.537997178661033, ... 10.320711100998848, 12.389112043240686, 10.202969177309964, ... 9.85529769183508], [16.288184753155463, 16.449445608699772, ... 15.066335808938263, 14.235456349028368, 13.907826784895859, ... 15.031334516831716, 13.977896829989128, 13.56317055489747, ... 15.199311976483754, 15.133374604658606, 15.546468300338232, ... 14.086029056264687, 15.005005283626572, 14.935258239627313, ... 13.494170998739259, 15.537997178661033, 15.320711100998848, ... 17.389112043240686, 15.202969177309964, 14.85529769183508]] >>> means = [5.011267842911003, 10.011267842911003, 15.011267842911002] >>> variance = 0.9618530973487494 >>> probabilities = [0.3333333333333333, 0.3333333333333333, 0.3333333333333333] >>> predict_y_values(x_items, means, variance, ... probabilities) # doctest: +NORMALIZE_WHITESPACE [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] .. py:function:: valid_input(input_type: collections.abc.Callable[[object], num], input_msg: str, err_msg: str, condition: collections.abc.Callable[[num], bool] = lambda _: True, default: str | None = None) -> num Ask for user value and validate that it fulfill a condition. :input_type: user input expected type of value :input_msg: message to show user in the screen :err_msg: message to show in the screen in case of error :condition: function that represents the condition that user input is valid. :default: Default value in case the user does not type anything :return: user's input .. py:function:: y_generator(class_count: int, instance_count: list) -> list Generate y values for corresponding classes :param class_count: Number of classes(data groupings) in dataset :param instance_count: number of instances in class :return: corresponding values for data groupings in dataset >>> y_generator(1, [10]) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] >>> y_generator(2, [5, 10]) [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] >>> y_generator(4, [10, 5, 15, 20]) # doctest: +NORMALIZE_WHITESPACE [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3] .. py:data:: num