Machine Learning Laboratory

Decision Tree Induction

A decision tree represents a decision procedure for determining the class label to associate with a given example. Each internal node of the tree holds a test (a question) and has one branch for each possible outcome of that test. Each leaf node holds a class label (an answer). Traversing a path from the root to a leaf is much like playing a game of twenty questions.
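The structure described above can be sketched in Python. This is a minimal illustration that assumes each example is a dictionary mapping attribute names to outcome strings. The classes `Leaf` and `Node`, the function `classify`, and the attributes `attr1` and `attr2` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf node holds a class label (the answer)."""
    label: str


@dataclass
class Node:
    """An internal node tests one attribute and has one branch per outcome."""
    attribute: str
    branches: Dict[str, "Tree"] = field(default_factory=dict)


# A tree is either a leaf or an internal node (hypothetical type alias).
Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, str]) -> str:
    """Follow one path from the root to a leaf, answering one test per node."""
    while isinstance(tree, Node):
        outcome = example[tree.attribute]  # answer this node's question
        tree = tree.branches[outcome]      # take the branch for that outcome
    return tree.label


# A toy tree with invented attributes; it carries no real-world meaning.
tree = Node("attr1", {
    "yes": Leaf("yes"),
    "no": Node("attr2", {"yes": Leaf("yes"), "no": Leaf("no")}),
})

print(classify(tree, {"attr1": "no", "attr2": "no"}))  # no
```

Each call to `classify` visits only the nodes on a single root-to-leaf path, so the number of tests asked is at most the depth of the tree.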

Decision trees have many uses, particularly for problems that can be cast as producing a single answer in the form of a class name. For example, one can build a decision tree to answer a question such as "Does this patient have hepatitis?", where the answer may be as simple as "yes" or "no". Answering the questions at the decision nodes leads to the appropriate leaf, which contains the answer.

Decision trees are constructed from examples that are already labeled. For example, if one has established, for a variety of patients with varying attributes, which of them do and do not have hepatitis, then these examples can guide the tree construction process. Decision tree induction is an active research topic and is studied within several disciplines. A good place to start is the journal Machine Learning or the proceedings of the Machine Learning conference.
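The text does not name a particular construction algorithm. One widely used approach is greedy top-down induction in the style of ID3, which at each node chooses the attribute whose test yields the largest information gain (reduction in label entropy). The Python sketch below illustrates that technique under stated assumptions: examples are dictionaries of attribute values, labels are strings, and a tree is either a label string (a leaf) or a tuple `(attribute, branches)`. The function names and the toy data with attributes `attr1` and `attr2` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Example = Dict[str, str]
# A tree is a class label (leaf) or (attribute, {outcome: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]


def entropy(labels: List[str]) -> float:
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples: List[Example], labels: List[str], attribute: str) -> float:
    """Reduction in entropy obtained by partitioning the examples on `attribute`."""
    partitions: Dict[str, List[str]] = {}
    for ex, lab in zip(examples, labels):
        partitions.setdefault(ex[attribute], []).append(lab)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def induce(examples: List[Example], labels: List[str], attributes: List[str]) -> Tree:
    """Grow a tree top-down, splitting on the highest-gain attribute (ID3-style)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # all labels agree, or no tests remain: make a leaf
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    remaining = [a for a in attributes if a != best]
    branches: Dict[str, Tree] = {}
    for value in sorted({ex[best] for ex in examples}):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        branches[value] = induce([examples[i] for i in idx],
                                 [labels[i] for i in idx], remaining)
    return (best, branches)


def classify(tree: Tree, example: Example) -> str:
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


# Invented, already-labeled training examples.
examples = [
    {"attr1": "yes", "attr2": "yes"},
    {"attr1": "yes", "attr2": "no"},
    {"attr1": "no", "attr2": "yes"},
    {"attr1": "no", "attr2": "no"},
]
labels = ["yes", "yes", "no", "no"]

tree = induce(examples, labels, ["attr1", "attr2"])
print(tree)  # ('attr1', {'no': 'no', 'yes': 'yes'})
```

Here `attr1` separates the labels perfectly (gain of 1 bit) while `attr2` gives no gain, so the sketch tests `attr1` at the root and never needs `attr2`. This sketch does no pruning and cannot handle outcomes unseen during training; practical systems address both.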