MIT Department of Electrical Engineering & Computer Science

E E C S

Accuracy Amplification by Top-Down Decision Tree Learning Algorithms

Michael Kearns
AT&T Labs Research

Monday, March 10, 1997
3:00 PM (2:45 refreshments)
Room NE43-518
EECS Special Seminar

Abstract

The theory of machine learning initiated by L.G. Valiant is now approaching its fifteenth year. The field has enjoyed great success in elucidating the boundaries of efficient learning in several natural frameworks, and the information-theoretic aspects of learning theory (such as work on the VC dimension) have had considerable impact on the thinking of machine learning practitioners. But another goal of the field is to shed theoretical light on the empirical success of the heuristics used in practice, and less has been accomplished in this direction.

This talk will be about some recent work that makes progress on this more elusive goal. I will describe results that give performance guarantees for two of the most widely used and empirically successful heuristics in machine learning: the decision tree learning algorithms C4.5 (Quinlan 1993) and CART (Breiman et al. 1984). The main result is a proof that these algorithms are in fact *boosting* algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree are weakly correlated with the unknown target function, then the top-down algorithms we study will amplify this weak correlation to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the *splitting criterion* used by the algorithm.

One nice aspect of these results is that they directly lead to some interesting improvements and experiments. In particular, the theoretical analysis suggests that a new splitting criterion may lead to smaller trees, and we confirm this experimentally using data sets from the UC Irvine repository. Furthermore, by giving performance bounds for the top-down algorithms in the well-studied boosting model, we permit direct comparison with algorithms developed especially for that model. We investigate this line of thought by experimentally comparing C4.5 and the AdaBoost algorithm (Freund and Schapire 1995), and find that their differences of behavior are nicely explained by the so-called "advantage sequence" of the formal analysis.

This talk will assume no prior knowledge of the theory or practice of machine learning.

HOSTS: Professor E. Grimson and Professor R. Rivest


URL of this page: http://www-eecs.mit.edu/AY96-97/events/20.html
Created: Mar 5, 1997  | Modified: Jun 24, 1997
This announcement is from the MIT EECS 1996-97 archive.  | Current events
To MIT EECS home page  | Your comments and inquiries are welcome.