machine_learning.decision_tree
==============================

.. py:module:: machine_learning.decision_tree

.. autoapi-nested-parse::

   Implementation of a basic regression decision tree.
   Input data set: the input data set must be one-dimensional with continuous labels.
   Output: the decision tree maps a real number input to a real number output.


Classes
-------

.. autoapisummary::

   machine_learning.decision_tree.DecisionTree
   machine_learning.decision_tree.TestDecisionTree


Functions
---------

.. autoapisummary::

   machine_learning.decision_tree.main


Module Contents
---------------

.. py:class:: DecisionTree(depth=5, min_leaf_size=5)


   .. py:method:: mean_squared_error(labels, prediction)

      mean_squared_error:
      @param labels: a one-dimensional numpy array
      @param prediction: a floating-point value
      return value: mean_squared_error calculates the error incurred if
      prediction is used to estimate the labels

      >>> tester = DecisionTree()
      >>> test_labels = np.array([1,2,3,4,5,6,7,8,9,10])
      >>> test_prediction = float(6)
      >>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
      ...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
      ...         test_prediction)))
      True
      >>> test_labels = np.array([1,2,3])
      >>> test_prediction = float(2)
      >>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
      ...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
      ...         test_prediction)))
      True


   .. py:method:: predict(x)

      predict:
      @param x: a floating-point value whose label is to be predicted

      The prediction function works by recursively calling the predict
      function of the appropriate subtree based on the tree's decision
      boundary.


   .. py:method:: train(x, y)

      train:
      @param x: a one-dimensional numpy array
      @param y: a one-dimensional numpy array; the contents of y are the
      labels for the corresponding x values

      train() does not have a return value.

      Examples:

      1. Train when x and y are of the same length and one-dimensional
         (no errors)

      >>> dt = DecisionTree()
      >>> dt.train(np.array([10,20,30,40,50]),np.array([0,0,0,1,1]))

      2. Train when x is two-dimensional

      >>> dt = DecisionTree()
      >>> dt.train(np.array([[1,2,3,4,5],[1,2,3,4,5]]),np.array([0,0,0,1,1]))
      Traceback (most recent call last):
          ...
      ValueError: Input data set must be one-dimensional

      3. Train when x and y are not of the same length

      >>> dt = DecisionTree()
      >>> dt.train(np.array([1,2,3,4,5]),np.array([[0,0,0,1,1],[0,0,0,1,1]]))
      Traceback (most recent call last):
          ...
      ValueError: x and y have different lengths

      4. Train when x and y are of the same length but different dimensions

      >>> dt = DecisionTree()
      >>> dt.train(np.array([1,2,3,4,5]),np.array([[1],[2],[3],[4],[5]]))
      Traceback (most recent call last):
          ...
      ValueError: Data set labels must be one-dimensional

      These examples check that the inputs conform to the dimensionality
      constraints.


   .. py:attribute:: decision_boundary
      :value: 0


   .. py:attribute:: depth
      :value: 5


   .. py:attribute:: left
      :value: None


   .. py:attribute:: min_leaf_size
      :value: 5


   .. py:attribute:: prediction
      :value: None


   .. py:attribute:: right
      :value: None


.. py:class:: TestDecisionTree

   Decision Tree test class


   .. py:method:: helper_mean_squared_error_test(labels, prediction)
      :staticmethod:

      helper_mean_squared_error_test:
      @param labels: a one-dimensional numpy array
      @param prediction: a floating-point value
      return value: helper_mean_squared_error_test calculates the mean
      squared error


.. py:function:: main()

   In this demonstration we generate a sample data set from numpy's sin
   function. We then train a decision tree on the data set and use the
   decision tree to predict the labels of 10 different test values. The
   mean squared error over this test set is then displayed.
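The workflow documented above (recursive training down to a depth or leaf-size limit, prediction by descending through decision boundaries, and a ``main()``-style fit on samples of ``sin``) can be sketched as follows. This is a minimal, pure-Python illustration of the technique, not the module's actual code: the ``SimpleRegressionTree`` name is hypothetical, numpy is deliberately avoided, and the sketch assumes ``x`` is sorted in ascending order.

```python
import math


class SimpleRegressionTree:
    """Illustrative regression tree mirroring the documented API.

    Assumptions: x is one-dimensional and sorted ascending; this is a
    sketch of the described scheme, not the module's implementation.
    """

    def __init__(self, depth=5, min_leaf_size=5):
        self.depth = depth
        self.min_leaf_size = min_leaf_size
        self.decision_boundary = 0.0
        self.left = None
        self.right = None
        self.prediction = None

    @staticmethod
    def mean_squared_error(labels, prediction):
        # Average squared distance between each label and the prediction.
        return sum((y - prediction) ** 2 for y in labels) / len(labels)

    def train(self, x, y):
        if len(x) != len(y):
            raise ValueError("x and y have different lengths")
        # Stop splitting at the depth limit or when the node is too small
        # to yield two leaves of at least min_leaf_size points each.
        if self.depth == 1 or len(x) < 2 * self.min_leaf_size:
            self.prediction = sum(y) / len(y)
            return
        # Pick the split index that minimizes the summed squared error
        # when each side is estimated by its mean label.
        best_error, best_i = math.inf, None
        for i in range(self.min_leaf_size, len(x) - self.min_leaf_size + 1):
            left_mean = sum(y[:i]) / i
            right_mean = sum(y[i:]) / (len(y) - i)
            error = (sum((v - left_mean) ** 2 for v in y[:i])
                     + sum((v - right_mean) ** 2 for v in y[i:]))
            if error < best_error:
                best_error, best_i = error, i
        self.decision_boundary = x[best_i]
        self.left = SimpleRegressionTree(self.depth - 1, self.min_leaf_size)
        self.right = SimpleRegressionTree(self.depth - 1, self.min_leaf_size)
        self.left.train(x[:best_i], y[:best_i])
        self.right.train(x[best_i:], y[best_i:])

    def predict(self, x):
        # Leaves store a mean label; internal nodes recurse into the
        # subtree on the matching side of the decision boundary.
        if self.prediction is not None:
            return self.prediction
        subtree = self.left if x < self.decision_boundary else self.right
        return subtree.predict(x)


# Mirror the main() demonstration: fit on samples of sin(x), then predict.
xs = [i / 100 for i in range(-628, 628)]   # roughly [-2*pi, 2*pi)
ys = [math.sin(v) for v in xs]             # continuous labels
tree = SimpleRegressionTree(depth=10, min_leaf_size=10)
tree.train(xs, ys)
predictions = [tree.predict(v) for v in [-1.5, -0.5, 0.0, 0.5, 1.5]]
```

Because each leaf averages at least ``min_leaf_size`` neighboring labels, predictions form a step function that approximates the sine curve; deeper trees or smaller leaves trade smoothness for fidelity.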