Machine and statistical learning

 

I have also done some methodological research in unsupervised and supervised statistical learning. The original context of my work in model selection was choosing the number of components for finite mixtures, which can be considered an unsupervised learning problem. I have also worked with three graduate students from my computational statistics course (Matt Taddy, Ioana Cosma, and Vittorio Addona) to develop reasonable automatic priors for finite mixture models (a special case of unsupervised learning). These priors help to clarify problems of non-identifiability that arise from uncertainty in the number of mixture components. I also wrote a paper with Matt Taddy, whose Master's project examined the problem of fully Bayesian model selection for neural networks (in the area of supervised learning). We developed an approach for selecting the input features of Bayesian neural network models via Bayes factors estimated by chained bridge sampling.


I was co-organizer and lead instructor for the five-day Spring School on Statistical and Machine Learning held at the Centre de recherches mathématiques (CRM) in spring 2006. I was responsible for designing the syllabus and organizing the instructors, and I gave the lectures on two of the five days. The school was funded by grants from the National Program on Complex Data Structures (NPCDS) and MITACS (Mathematics of Information Technology and Complex Systems).


I am also a co-investigator on a CIHR grant that will allow us to develop machine learning approaches for psychometric models. There currently exists a suite of linear methods (factor analysis, principal component models, partial least squares, and structural equation models) that have been used to develop psychometric indices for latent traits in the population, such as intelligence. A Master's student in statistics (Marilyse Julien) will develop non-linear versions of these methods, using ideas from functional data analysis and non-parametric statistics. These versions are better suited to developing indices for highly interactive systems, in particular new measures of disease activity and severity in scleroderma.