Directed Extended Dependency Analysis for Data Mining
Thaddeus T. Shannon and Martin Zwick
Systems Science Program, Portland State University
Abstract
Extended Dependency Analysis (EDA) is a heuristic search technique for
finding significant relationships between nominal variables in large datasets.
The directed version of EDA searches for maximally predictive sets of
independent variables with respect to a target dependent variable. The original
implementation of EDA was an extension of reconstructability analysis. Our new
implementation adds a variety of statistical significance tests at each
decision point that allow the user to tailor the algorithm to a particular
objective. It also utilizes data structures appropriate for the sparse datasets
customary in contemporary data mining problems. Two examples are given that
illustrate different approaches to testing model quality.
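
To make the idea of a directed search with a significance test at each decision point concrete, the following is a minimal illustrative sketch, not the implementation described in the paper. It assumes a greedy forward selection over nominal variables, a likelihood-ratio (G) test of conditional independence as the decision rule, and dictionary-based counts that store only observed state combinations, as a stand-in for sparse data structures. The names `chi2_sf`, `g_statistic`, `forward_select`, the `alpha` threshold, and the degrees-of-freedom convention are all assumptions made for illustration.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed-form series).
    Intended for the small df values of this example only."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)  # series term for i = 1
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= h / (i - 0.5)
        total += math.exp(-h) * term
    return total

def g_statistic(rows, selected, cand, target):
    """G statistic for independence of cand and target, conditional on the
    already selected variables. Only observed states are counted (sparse)."""
    n_s, n_sx, n_sy, n_sxy = Counter(), Counter(), Counter(), Counter()
    for r in rows:
        s = tuple(r[v] for v in selected)
        x, y = r[cand], r[target]
        n_s[s] += 1
        n_sx[s, x] += 1
        n_sy[s, y] += 1
        n_sxy[s, x, y] += 1
    g = 2.0 * sum(
        n * math.log(n * n_s[s] / (n_sx[s, x] * n_sy[s, y]))
        for (s, x, y), n in n_sxy.items()
    )
    x_levels = len({x for (_, x) in n_sx})
    y_levels = len({y for (_, y) in n_sy})
    # Assumed convention: one (|X|-1)(|Y|-1) block per observed conditioning state.
    df = max(1, (x_levels - 1) * (y_levels - 1) * len(n_s))
    return g, df

def forward_select(rows, candidates, target, alpha=0.05):
    """Greedily add the most significant candidate until none passes the test."""
    selected, remaining = [], list(candidates)
    while remaining:
        scored = []
        for c in remaining:
            g, df = g_statistic(rows, selected, c, target)
            scored.append((chi2_sf(g, df), -g, c))
        p, _, best = min(scored)
        if p > alpha:  # decision point: stop when nothing is significant
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): Y copies A; B and C are unrelated to Y.
rows = [
    {"A": a, "B": b, "C": c, "Y": a}
    for _ in range(5) for a in (0, 1) for b in (0, 1) for c in (0, 1)
]
result = forward_select(rows, ["A", "B", "C"], "Y")
print(result)
```

Swapping the test inside `forward_select` (for example, a different threshold or a different statistic) changes when the search stops, which is one way a user could tailor such a search to a particular objective. A purely greedy search of this kind can miss variables that are only predictive in combination.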