Machine and statistical learning
I have also done methodological research in unsupervised and supervised
statistical learning. The original context of my work in model selection was choosing
the number of components for finite mixtures, which can be considered an
unsupervised learning problem. I have also worked with three graduate students from my
computational statistics course (Matt Taddy, Ioana Cosma, and Vittorio Addona) to
develop reasonable automatic priors for finite mixture models (a special case of
unsupervised learning) that help to clarify problems due to non-identifiability resulting
from uncertainty in the number of mixture components. I also wrote a paper with
Matt Taddy, whose Master’s project examined the problem of fully Bayesian model
selection for neural networks (in the area of supervised learning). We developed an
approach for selecting input features for Bayesian neural network models via Bayes
factors estimated using a chained bridge sampling approach.
I was co-organizer and lead instructor for the 5-day Spring School on Statistical
and Machine Learning held at the Centre de Recherches Mathématiques (CRM)
in Spring 2006. I was responsible for designing the syllabus and organizing the
instructors, as well as giving lectures on 2 of the 5 days of the workshop. The
workshop was funded by grants from the National Program on Complex Data Structures
(NPCDS) and MITACS, the center for Mathematics of Information Technology and
Complex Systems.
I am also co-investigator on a CIHR grant that will allow us to develop machine
learning approaches for psychometric models. There currently exists a suite of
linear methods (factor analysis, principal component models, partial least squares,
and structural equation models) that have been used to develop psychometric indices
for latent traits in the population (such as intelligence). A Master’s student in
statistics (Marilyse Julien) will develop non-linear versions of these methods (using
ideas from functional data analysis and non-parametric statistics) that are better
suited to the development of indices for highly interactive systems, in particular for
the development of new measures of disease activity and severity in scleroderma.