machine_learning.decision_tree

Implementation of a basic regression decision tree. Input data set: The input data set must be 1-dimensional with continuous labels. Output: The decision tree maps a real number input to a real number output.

Classes

DecisionTree

TestDecisionTree

Decision Tree test class

Functions

main()

In this demonstration we're generating a sample data set from the sin function in numpy.

Module Contents

class machine_learning.decision_tree.DecisionTree(depth=5, min_leaf_size=5)
mean_squared_error(labels, prediction)

mean_squared_error:
@param labels: a one-dimensional numpy array
@param prediction: a floating point value
return value: mean_squared_error calculates the error if prediction is used to estimate the labels

>>> tester = DecisionTree()
>>> test_labels = np.array([1,2,3,4,5,6,7,8,9,10])
>>> test_prediction = float(6)
>>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
...         test_prediction)))
True
>>> test_labels = np.array([1,2,3])
>>> test_prediction = float(2)
>>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
...         test_prediction)))
True
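
The doctests above only compare mean_squared_error against a helper, so the quantity itself may be easier to see in isolation. The following is a minimal sketch, assuming the error is the mean of the squared differences between each label and the single predicted value. The function name `mse_for_constant` is hypothetical and not part of this module.

```python
import numpy as np


def mse_for_constant(labels: np.ndarray, prediction: float) -> float:
    """Mean squared error when one value is used to estimate every label.

    Hypothetical helper for illustration; not part of the documented module.
    """
    # Broadcasting subtracts the scalar prediction from every label.
    return float(np.mean((labels - prediction) ** 2))


labels_a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
labels_b = np.array([1, 2, 3])
error_a = mse_for_constant(labels_a, 6.0)
error_b = mse_for_constant(labels_b, 2.0)
print(error_a, error_b)
```

For the labels 1 through 10 and a prediction of 6, the squared differences sum to 85, so the error is 85 / 10 = 8.5.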
predict(x)

predict:
@param x: a floating point value whose label is to be predicted
The prediction function works by recursively calling the predict function of the appropriate subtree, chosen according to the tree’s decision boundary.
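
The recursive lookup described for predict can be sketched with a small hand-built tree. This is an illustration under stated assumptions, not the module's code: `SketchNode` is a hypothetical class that borrows the attribute names listed for DecisionTree (decision_boundary, left, right, prediction), and sending values at or above the boundary to the right subtree is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a trained tree node (not the module's class)."""

    decision_boundary: float = 0.0
    left: Optional["SketchNode"] = None
    right: Optional["SketchNode"] = None
    prediction: Optional[float] = None

    def predict(self, x: float) -> float:
        # A leaf stores a constant prediction and ends the recursion.
        if self.prediction is not None:
            return self.prediction
        if self.left is None or self.right is None:
            raise ValueError("node is neither a leaf nor a complete split")
        # Assumption: values at or above the boundary go to the right subtree.
        child = self.right if x >= self.decision_boundary else self.left
        return child.predict(x)


# A hand-built tree: split at 0.0, then split the right side again at 0.5.
tree = SketchNode(
    decision_boundary=0.0,
    left=SketchNode(prediction=-0.5),
    right=SketchNode(
        decision_boundary=0.5,
        left=SketchNode(prediction=0.25),
        right=SketchNode(prediction=0.75),
    ),
)
print(tree.predict(-0.3), tree.predict(0.2), tree.predict(0.9))
```

Each call descends one level per comparison until it reaches a node whose prediction is set.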

train(x, y)

train:
@param x: a one-dimensional numpy array
@param y: a one-dimensional numpy array. The contents of y are the labels for the corresponding x values

train() does not have a return value.

Examples:

1. Try to train when x and y have the same length and are both one-dimensional (no errors)

>>> dt = DecisionTree()
>>> dt.train(np.array([10,20,30,40,50]),np.array([0,0,0,1,1]))

2. Try to train when x is two-dimensional

>>> dt = DecisionTree()
>>> dt.train(np.array([[1,2,3,4,5],[1,2,3,4,5]]),np.array([0,0,0,1,1]))
Traceback (most recent call last):
    ...
ValueError: Input data set must be one-dimensional

3. Try to train when x and y are not of the same length

>>> dt = DecisionTree()
>>> dt.train(np.array([1,2,3,4,5]),np.array([[0,0,0,1,1],[0,0,0,1,1]]))
Traceback (most recent call last):
    ...
ValueError: x and y have different lengths

4. Try to train when x and y are of the same length but have different dimensions

>>> dt = DecisionTree()
>>> dt.train(np.array([1,2,3,4,5]),np.array([[1],[2],[3],[4],[5]]))
Traceback (most recent call last):
    ...
ValueError: Data set labels must be one-dimensional

train() checks that its inputs conform to these dimensionality constraints and raises ValueError otherwise.
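
The input checks exercised by the four examples can be sketched as a standalone validator. This is a sketch, not the module's implementation: the names `check_training_inputs` and `error_message` are hypothetical, and the order of the checks is inferred from example 3, where two-dimensional labels produce the length error rather than the label-dimension error.

```python
import numpy as np


def check_training_inputs(x: np.ndarray, y: np.ndarray) -> None:
    """Hypothetical validator mirroring the errors shown in the examples."""
    if x.ndim != 1:
        raise ValueError("Input data set must be one-dimensional")
    # Inferred order: the length check precedes the label-dimension check,
    # because example 3 passes 2-D labels yet reports a length mismatch.
    if len(x) != len(y):
        raise ValueError("x and y have different lengths")
    if y.ndim != 1:
        raise ValueError("Data set labels must be one-dimensional")


def error_message(x, y) -> str:
    """Return the ValueError text, or an empty string when inputs are valid."""
    try:
        check_training_inputs(np.asarray(x), np.asarray(y))
    except ValueError as exc:
        return str(exc)
    return ""


msg_valid = error_message([10, 20, 30, 40, 50], [0, 0, 0, 1, 1])
msg_x_2d = error_message([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]], [0, 0, 0, 1, 1])
msg_length = error_message([1, 2, 3, 4, 5], [[0, 0, 0, 1, 1], [0, 0, 0, 1, 1]])
msg_y_2d = error_message([1, 2, 3, 4, 5], [[1], [2], [3], [4], [5]])
print([msg_valid, msg_x_2d, msg_length, msg_y_2d])
```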

decision_boundary = 0
depth
left = None
min_leaf_size
prediction = None
right = None
class machine_learning.decision_tree.TestDecisionTree

Decision Tree test class

static helper_mean_squared_error_test(labels, prediction)

helper_mean_squared_error_test:
@param labels: a one-dimensional numpy array
@param prediction: a floating point value
return value: helper_mean_squared_error_test calculates the mean squared error

machine_learning.decision_tree.main()

In this demonstration we’re generating a sample data set from the sin function in numpy. We then train a decision tree on the data set and use the decision tree to predict the label of 10 different test values. Then the mean squared error over this test is displayed.
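
The flow of this demonstration (sample sin, fit, predict 10 test values, report the mean squared error) can be sketched without importing the module. Several things here are assumptions made so the snippet runs on its own: `fit_bin_means` is a hypothetical piecewise-constant stand-in for training a DecisionTree, the sampling range and step are illustrative, and the error is measured against the sin of each test value.

```python
import numpy as np


def fit_bin_means(x: np.ndarray, y: np.ndarray, n_bins: int = 16):
    """Hypothetical stand-in for training: a piecewise-constant fit.

    Splits the input range into equal-width bins and predicts the mean
    label of the bin a value falls into. Not the module's algorithm.
    """
    inner_edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]
    bin_index = np.digitize(x, inner_edges)  # values in 0 .. n_bins - 1
    means = np.array([y[bin_index == b].mean() for b in range(n_bins)])

    def predict(value: float) -> float:
        return float(means[np.digitize(value, inner_edges)])

    return predict


# Assumed sampling range and step for the training data.
x = np.arange(-1.0, 1.0, 0.005)
y = np.sin(x)
predict = fit_bin_means(x, y)

# Ten test values drawn from the same range; the seed is fixed for repeatability.
rng = np.random.default_rng(seed=0)
test_values = rng.uniform(-1.0, 1.0, size=10)
predictions = np.array([predict(v) for v in test_values])

# Assumption: the error compares each prediction with sin of the test value.
avg_error = float(np.mean((predictions - np.sin(test_values)) ** 2))
print(f"Test values: {test_values}")
print(f"Predictions: {predictions}")
print(f"Average error: {avg_error}")
```

Swapping the stand-in for a trained DecisionTree would leave the evaluation steps unchanged: only the object that supplies predict differs.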