machine_learning.decision_tree
==============================

.. py:module:: machine_learning.decision_tree

.. autoapi-nested-parse::

   Implementation of a basic regression decision tree.
   Input data set: the input data set must be one-dimensional with continuous labels.
   Output: the decision tree maps a real number input to a real number output.


Classes
-------

.. autoapisummary::

   machine_learning.decision_tree.DecisionTree
   machine_learning.decision_tree.TestDecisionTree


Functions
---------

.. autoapisummary::

   machine_learning.decision_tree.main


Module Contents
---------------

.. py:class:: DecisionTree(depth=5, min_leaf_size=5)


   .. py:method:: mean_squared_error(labels, prediction)

      mean_squared_error:
      @param labels: a one-dimensional numpy array
      @param prediction: a floating-point value
      return value: mean_squared_error calculates the error incurred if
      prediction is used to estimate the labels

      >>> tester = DecisionTree()
      >>> test_labels = np.array([1,2,3,4,5,6,7,8,9,10])
      >>> test_prediction = float(6)
      >>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
      ...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
      ...         test_prediction)))
      True
      >>> test_labels = np.array([1,2,3])
      >>> test_prediction = float(2)
      >>> bool(tester.mean_squared_error(test_labels, test_prediction) == (
      ...     TestDecisionTree.helper_mean_squared_error_test(test_labels,
      ...         test_prediction)))
      True


   .. py:method:: predict(x)

      predict:
      @param x: a floating-point value whose label is to be predicted

      The prediction function works by recursively calling the predict
      function of the appropriate subtree based on the tree's decision
      boundary.


   .. py:method:: train(x, y)

      train:
      @param x: a one-dimensional numpy array
      @param y: a one-dimensional numpy array; the contents of y are the
      labels for the corresponding x values

      train() does not have a return value.

      Examples:

      1. Train when x and y are of the same length and one-dimensional
         (no errors)

      >>> dt = DecisionTree()
      >>> dt.train(np.array([10,20,30,40,50]),np.array([0,0,0,1,1]))

      2. Train when x is two-dimensional

      >>> dt = DecisionTree()
      >>> dt.train(np.array([[1,2,3,4,5],[1,2,3,4,5]]),np.array([0,0,0,1,1]))
      Traceback (most recent call last):
          ...
      ValueError: Input data set must be one-dimensional

      3. Train when x and y are not of the same length

      >>> dt = DecisionTree()
      >>> dt.train(np.array([1,2,3,4,5]),np.array([[0,0,0,1,1],[0,0,0,1,1]]))
      Traceback (most recent call last):
          ...
      ValueError: x and y have different lengths

      4. Train when x and y are of the same length but different dimensions

      >>> dt = DecisionTree()
      >>> dt.train(np.array([1,2,3,4,5]),np.array([[1],[2],[3],[4],[5]]))
      Traceback (most recent call last):
          ...
      ValueError: Data set labels must be one-dimensional

      These examples check that the inputs conform to the dimensionality
      constraints.


   .. py:attribute:: decision_boundary
      :value: 0


   .. py:attribute:: depth
      :value: 5


   .. py:attribute:: left
      :value: None


   .. py:attribute:: min_leaf_size
      :value: 5


   .. py:attribute:: prediction
      :value: None


   .. py:attribute:: right
      :value: None


.. py:class:: TestDecisionTree

   Decision Tree test class


   .. py:method:: helper_mean_squared_error_test(labels, prediction)
      :staticmethod:

      helper_mean_squared_error_test:
      @param labels: a one-dimensional numpy array
      @param prediction: a floating-point value
      return value: helper_mean_squared_error_test calculates the mean
      squared error


.. py:function:: main()

   In this demonstration we generate a sample data set from numpy's sin
   function. We then train a decision tree on the data set and use the
   decision tree to predict the labels of 10 different test values. The
   mean squared error over this test set is then displayed.
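The workflow documented above (recursive training down to a depth or leaf-size limit, prediction by descending through decision boundaries, and a ``main()``-style fit on samples of ``sin``) can be sketched as follows. This is a minimal, pure-Python illustration of the technique, not the module's actual code: the ``SimpleRegressionTree`` name is hypothetical, numpy is deliberately avoided, and the sketch assumes ``x`` is sorted in ascending order.

```python
import math


class SimpleRegressionTree:
    """Illustrative regression tree mirroring the documented API.

    Assumptions: x is one-dimensional and sorted ascending; this is a
    sketch of the described scheme, not the module's implementation.
    """

    def __init__(self, depth=5, min_leaf_size=5):
        self.depth = depth
        self.min_leaf_size = min_leaf_size
        self.decision_boundary = 0.0
        self.left = None
        self.right = None
        self.prediction = None

    @staticmethod
    def mean_squared_error(labels, prediction):
        # Average squared distance between each label and the prediction.
        return sum((y - prediction) ** 2 for y in labels) / len(labels)

    def train(self, x, y):
        if len(x) != len(y):
            raise ValueError("x and y have different lengths")
        # Stop splitting at the depth limit or when the node is too small
        # to yield two leaves of at least min_leaf_size points each.
        if self.depth == 1 or len(x) < 2 * self.min_leaf_size:
            self.prediction = sum(y) / len(y)
            return
        # Pick the split index that minimizes the summed squared error
        # when each side is estimated by its mean label.
        best_error, best_i = math.inf, None
        for i in range(self.min_leaf_size, len(x) - self.min_leaf_size + 1):
            left_mean = sum(y[:i]) / i
            right_mean = sum(y[i:]) / (len(y) - i)
            error = (sum((v - left_mean) ** 2 for v in y[:i])
                     + sum((v - right_mean) ** 2 for v in y[i:]))
            if error < best_error:
                best_error, best_i = error, i
        self.decision_boundary = x[best_i]
        self.left = SimpleRegressionTree(self.depth - 1, self.min_leaf_size)
        self.right = SimpleRegressionTree(self.depth - 1, self.min_leaf_size)
        self.left.train(x[:best_i], y[:best_i])
        self.right.train(x[best_i:], y[best_i:])

    def predict(self, x):
        # Leaves store a mean label; internal nodes recurse into the
        # subtree on the matching side of the decision boundary.
        if self.prediction is not None:
            return self.prediction
        subtree = self.left if x < self.decision_boundary else self.right
        return subtree.predict(x)


# Mirror the main() demonstration: fit on samples of sin(x), then predict.
xs = [i / 100 for i in range(-628, 628)]   # roughly [-2*pi, 2*pi)
ys = [math.sin(v) for v in xs]             # continuous labels
tree = SimpleRegressionTree(depth=10, min_leaf_size=10)
tree.train(xs, ys)
predictions = [tree.predict(v) for v in [-1.5, -0.5, 0.0, 0.5, 1.5]]
```

Because each leaf averages at least ``min_leaf_size`` neighboring labels, predictions form a step function that approximates the sine curve; deeper trees or smaller leaves trade smoothness for fidelity.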