Machine Learning Laboratory
How to Run ITI/DMTI
The ITI program has a command line interface, as is customary for Unix
environments.
ITI builds a decision tree in incremental or batch mode. In
incremental mode, ITI incorporates each instance into the tree, and
restructures the tree as needed so that it becomes the same tree that
one would have gotten with the batch algorithm. Incremental induction
is usually much less expensive than rebuilding the tree from scratch.
One would want to use incremental induction in a serial learning task,
such as ongoing knowledge maintenance.
In addition to incremental induction, the ability to restructure a
tree makes it possible in many cases to travel through tree-space
inexpensively. The program includes direct metric tree induction
(non-incremental) in which it tries various tests at a node, and
evaluates the quality of each resulting tree.
You will need to prepare at least two files for any task you
wish to run. These files follow Quinlan's C4.5 format (see chapter 9
of Quinlan's book for more detail). First
create a subdirectory (of your data directory) in which you will
put all files related to the task. Then, in the subdirectory
create a `names' file. The first line of this file is a comma-separated
list of the allowable class names that will appear in the data file. This
first line is terminated by a period. Each successive line of the `names'
file gives the variable name followed by a colon, followed by a period.
This is enough for ITI, but if you want to be able to use c4.5 on your
data, you will need to provide more information between the colon and
the period. For a numeric variable (all its values are numeric),
put the word `continuous'. For a discrete variable, list the possible
values, separated by commas.
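As a concrete sketch, here is how one might create a `names' file for a hypothetical three-variable, two-class task (the task name `mytask' and all class, variable, and value names below are made up for illustration):

```shell
# Create a hypothetical task subdirectory and its `names' file.
# First line: the allowable class names, terminated by a period.
# Each later line: variable name, colon, C4.5 type info, period.
mkdir -p mytask
cat > mytask/names <<'EOF'
yes, no.
outlook: sunny, overcast, rain.
temperature: continuous.
windy: true, false.
EOF
cat mytask/names
```

The type information after each colon (`continuous' or the list of discrete values) is optional for ITI itself, but including it keeps the file usable with c4.5 as described above.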
Second, create a data file of any name you choose, but with
extension `.data'. This file contains one data item (record)
per line. Each line is a comma-separated list of values, one
per variable, followed by a comma, followed by the class label,
followed by a terminating period. The special symbol `?' indicates
that the value of the corresponding variable is unknown. The
code distribution includes a `weather' task, with a `names' file
and an `all.data' file.
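A matching data file for a hypothetical three-variable, two-class task (all values below are made up for illustration) can be sketched the same way:

```shell
# Sketch of a `.data' file: one record per line, comma-separated
# values (one per variable), then the class label, then a
# terminating period; `?' marks an unknown value.
mkdir -p mytask
cat > mytask/all.data <<'EOF'
sunny, 27, false, no.
overcast, ?, true, yes.
rain, 18, true, no.
EOF
wc -l < mytask/all.data   # counts the three records
```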
To run ITI, type:
iti stem [[-d..][-e][-E][-f][-h..][-i][-j][-l..][-m][-M][-L][-p][-P..][-q..][-r..][-R][-s..][-t][-u][-v][-w]]*
where `stem' is the name of the task (subdirectory of your data
directory), and the various options are interpreted as described
below. The actions specified by the options are taken in order, so
one must specify them in the order intended. Repetition of options is
meaningful. Some sequences of options are nonsensical but the idea
has been to provide maximum flexibility.
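Because the options act strictly in order, a plausible invocation (the task name `mytask' and instance file name `all' are hypothetical) loads the instances before building and printing:

```shell
# Hypothetical session: -l loads instances from the file `all',
# -f builds the tree in batch mode, -w prints an ascii version.
# Putting -f before -l would be one of the nonsensical orderings,
# since it would attempt to build before any instances are loaded.
iti mytask -lall -f -w
```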
Options:
- -d..
- Draw the current tree into the specified pst file (`.pst'
will be appended to the specified name). The pst file can be
converted to a postscript file by running the `pst' program, which is
part of this distribution and is described below.
- -E
- Invoke the dmti algorithm to build a tree using the expected
number of tests as the direct metric. The tree is built by
restructuring the current tree.
- -e
- Update the current tree using error correction mode on the
current set of training instances. ITI will repeatedly incorporate a
training instance until no unincorporated training instance would be
misclassified by the current tree.
- -f
- Build a tree in batch (non-incremental) mode from the
current set of training instances. For a given set of instances,
batch mode is faster than building the tree incrementally, and is
the fastest way to build a tree from scratch. (Note that it is
usually faster to incorporate one new instance into an existing
tree than it is to rebuild the tree from scratch in batch mode.)
- -h..
- Include the specified heading on the performance graph. This
option only makes sense when -m has also been specified.
- -i
- Update the current tree incrementally from the current
unincorporated training instances.
- -j
- Shuffle the training set before starting. This makes sense
only prior to training in error correction mode (see -e option).
Otherwise it is largely useless because ITI always builds the same
tree for the same set of incorporated instances.
- -l..
- Load the training instances from the specified file.
These instances define the current training set of unincorporated
instances.
- -L
- Invoke the dmti algorithm to build a tree using the number
of leaves as the direct metric. Because the tree is binary, the total
number of nodes is 2*leaves-1 (a binary tree with k leaves has k-1
internal test nodes). The tree is built by restructuring the
current tree.
- -M
- Invoke the dmti algorithm to build a tree using the minimum
description length as the direct metric. The tree is built by
restructuring the current tree.
- -m
- Do performance measuring. This causes a file (with
`.trace' appended to the name) to be created during training that can
be used for producing postscript performance graphs. You can use the
PLOT program, as in
plot < stem.trace > stem.ps
to produce performance figures for papers or slides. The PLOT
program is not supported, but is included in case it is useful.
You may need to hack PLOT to get what you want.
- -P
- Set the minimum number of instances of the second most
frequently occurring class at a node for the node to be
considered impure. The default value is 1.
- -p
- Virtually prune the current tree now, and virtually prune
it whenever the tree changes hereafter. See -u. Virtual pruning is
done according to the minimum description length principle. This is
generally useful when the instances are known to be noisy or
overfitting is likely.
- -q..
- Load the testing instances from the specified file.
These instances define the current testing set.
- -R
- Print the root node of the tree in a verbose form. This is
not generally useful. It is for debugging, and is mentioned here only
for the sake of completeness.
- -r..
- Restore the tree from the specified file (`.iti' is
appended to the specified name) as the current tree.
- -s..
- Save the current tree to the specified file (`.iti' is
appended to the specified name).
- -t
- Test the testing set on the current tree. Various
measures, including classification accuracy, are reported.
- -u
- Unprune (virtually) the current tree now, and do not
virtually prune it whenever the tree changes. See -p.
- -v
- Toggle (initially off) printing of the instances as they
are loaded. This is often useful for identifying a syntax error in
the names or data files.
- -R
- Print the entire tree in a verbose form. This is not
generally useful. It is for debugging, and is mentioned here only for
the sake of completeness.
- -w
- Print an ASCII version of the current tree.
Here are two examples:
- To load the led7 training instances, then build a tree quickly,
then prune it, then print it, then draw it, one could type:
iti led7 -lcart -f -p -w -dcart
- To load the 6-bit multiplexor training instances, then build a tree
incrementally, then draw it, then restructure the tree via dmti, and
then draw it, one could type:
iti mplex-6 -lall -i -dm6iti -E -dm6E
For running cross-validation tests, xval-prep and a modified xval.sh
are included from Quinlan's C4.5 distribution (with permission of and
thanks to J.R. Quinlan). xval-prep.c requires no modification,
but you may want to edit xval.sh to suit your own purposes.
Last Updated: March 30, 1999
© Copyright 1997, All Rights Reserved, Paul Utgoff, University of Massachusetts