machine_learning.decision_tree¶
Implementation of a basic regression decision tree. Input: the data set must be one-dimensional with continuous labels. Output: the trained decision tree maps a real-number input to a real-number output.
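A minimal usage sketch, assuming the module is importable as machine_learning.decision_tree and numpy is installed (the import path and the sample data are illustrative, not part of the module):

>>> import numpy as np
>>> from machine_learning.decision_tree import DecisionTree
>>> x = np.linspace(-1.0, 1.0, 200)   # one-dimensional inputs, sorted
>>> y = x ** 2                        # continuous labels
>>> tree = DecisionTree(depth=5, min_leaf_size=5)
>>> tree.train(x, y)
>>> prediction = tree.predict(0.25)   # a single real-number output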
Classes¶
- DecisionTree
- TestDecisionTree: Decision Tree test class
Functions¶
- main(): In this demonstration we're generating a sample data set from the sin function in numpy.
Module Contents¶
- class machine_learning.decision_tree.DecisionTree(depth=5, min_leaf_size=5)¶
- mean_squared_error(labels, prediction)¶
mean_squared_error:
@param labels: a one-dimensional numpy array
@param prediction: a floating point value
return value: mean_squared_error calculates the error if prediction is used to estimate the labels

>>> tester = DecisionTree()
>>> test_labels = np.array([1,2,3,4,5,6,7,8,9,10])
>>> test_prediction = float(6)
>>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
...     test_prediction)))
True
>>> test_labels = np.array([1,2,3])
>>> test_prediction = float(2)
>>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
...     test_prediction)))
True
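For reference, the quantity described above is the mean of the squared differences between each label and the single predicted value. A one-line numpy sketch, not necessarily the module's exact implementation:

>>> import numpy as np
>>> def mean_squared_error(labels, prediction):
...     return float(np.mean((labels - prediction) ** 2))
>>> mean_squared_error(np.array([1,2,3,4,5,6,7,8,9,10]), 6.0)
8.5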
- predict(x)¶
predict:
@param x: a floating point value to predict the label of
The prediction works by recursively calling the predict function of the appropriate subtree, based on the tree's decision boundary.
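In terms of the attributes documented below (prediction, decision_boundary, left, right), the recursion might look roughly like the following sketch; which side of the boundary ">=" routes to is an assumption here, not taken from the module:

def predict(self, x):
    # Leaf node: return the stored prediction (the mean of the training
    # labels that reached this leaf).
    if self.prediction is not None:
        return self.prediction
    # Internal node: recurse into the subtree on the side of the decision
    # boundary that x falls on.
    if x >= self.decision_boundary:
        return self.right.predict(x)
    return self.left.predict(x)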
- train(x, y)¶
train:
@param x: a one-dimensional numpy array
@param y: a one-dimensional numpy array; the contents of y are the labels for the corresponding x values

train() does not have a return value
Examples:
1. Try to train when x & y are of the same length and one-dimensional (no errors):
>>> dt = DecisionTree()
>>> dt.train(np.array([10,20,30,40,50]),np.array([0,0,0,1,1]))

2. Try to train when x is two-dimensional:
>>> dt = DecisionTree()
>>> dt.train(np.array([[1,2,3,4,5],[1,2,3,4,5]]),np.array([0,0,0,1,1]))
Traceback (most recent call last):
...
ValueError: Input data set must be one-dimensional

3. Try to train when x and y are not of the same length:
>>> dt = DecisionTree()
>>> dt.train(np.array([1,2,3,4,5]),np.array([[0,0,0,1,1],[0,0,0,1,1]]))
Traceback (most recent call last):
...
ValueError: x and y have different lengths

4. Try to train when x & y are of the same length but different dimensions:
>>> dt = DecisionTree()
>>> dt.train(np.array([1,2,3,4,5]),np.array([[1],[2],[3],[4],[5]]))
Traceback (most recent call last):
...
ValueError: Data set labels must be one-dimensional

Before fitting, train() checks that the inputs conform to these dimensionality constraints.
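For orientation, the fitting step might look roughly like the sketch below: a greedy search for the split position whose two halves are best summarised by their own label means. This is a hypothetical reimplementation, assuming x is sorted and numpy is imported as np; the module's actual stopping and tie-breaking rules may differ.

def train(self, x, y):
    # Dimensionality checks demonstrated in the examples above.
    if x.ndim != 1:
        raise ValueError("Input data set must be one-dimensional")
    if len(x) != len(y):
        raise ValueError("x and y have different lengths")
    if y.ndim != 1:
        raise ValueError("Data set labels must be one-dimensional")

    # Become a leaf when depth is exhausted or the node is too small to
    # split into two admissible halves.
    if self.depth == 1 or len(x) < 2 * self.min_leaf_size:
        self.prediction = np.mean(y)
        return

    # Greedy search: for every split that leaves at least min_leaf_size
    # points on each side, measure how well each half is approximated by
    # its own label mean, and keep the split with the lowest total error.
    best_split, best_error = 0, float("inf")
    for i in range(self.min_leaf_size, len(x) - self.min_leaf_size + 1):
        error = (
            self.mean_squared_error(y[:i], np.mean(y[:i]))
            + self.mean_squared_error(y[i:], np.mean(y[i:]))
        )
        if error < best_error:
            best_split, best_error = i, error

    # Record the boundary value and recurse into the two halves.
    self.decision_boundary = x[best_split]
    self.left = DecisionTree(depth=self.depth - 1, min_leaf_size=self.min_leaf_size)
    self.right = DecisionTree(depth=self.depth - 1, min_leaf_size=self.min_leaf_size)
    self.left.train(x[:best_split], y[:best_split])
    self.right.train(x[best_split:], y[best_split:])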
- decision_boundary = 0¶
- depth¶
- left = None¶
- min_leaf_size¶
- prediction = None¶
- right = None¶
- class machine_learning.decision_tree.TestDecisionTree¶
Decision Tree test class
- static helper_mean_squared_error_test(labels, prediction)¶
helper_mean_squared_error_test:
@param labels: a one-dimensional numpy array
@param prediction: a floating point value
return value: helper_mean_squared_error_test calculates the mean squared error
- machine_learning.decision_tree.main()¶
In this demonstration we’re generating a sample data set from the sin function in numpy. We then train a decision tree on the data set and use it to predict the labels of 10 different test values. The mean squared error over these predictions is then displayed.
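The demonstration could be reproduced along the lines of the sketch below; the sample spacing, tree parameters, and use of np.sin as the ground truth for the error are illustrative assumptions, not necessarily the module's exact choices.

import numpy as np

from machine_learning.decision_tree import DecisionTree  # assumed import path

def demo() -> None:
    # Sample the sin function on [-1, 1) as the training set.
    x = np.arange(-1.0, 1.0, 0.005)
    y = np.sin(x)

    tree = DecisionTree(depth=10, min_leaf_size=10)
    tree.train(x, y)

    # Predict the labels of 10 random test values and compare against
    # the true sin values.
    test_cases = (np.random.rand(10) * 2) - 1
    predictions = np.array([tree.predict(t) for t in test_cases])
    avg_error = np.mean((predictions - np.sin(test_cases)) ** 2)

    print("Predictions:", predictions)
    print("Average mean squared error:", avg_error)

if __name__ == "__main__":
    demo()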