Statistics seminars 2011-2012

McGill Statistics Seminar Series 2011-2012

Fall Term 2011

Date Event Speaker(s) Title Time Location

August 31, 2011

Wednesday!

McGill Statistics Seminar
Johanna Ziegel
Precision estimation for stereological volumes

15:30-16:30

BURN 1205
Abstract:

Volume estimators based on Cavalieri's principle are widely used in the biosciences. For example, in neuroscience, where volumetric measurements of brain structures are of interest, systematic samples of serial sections are obtained by magnetic resonance imaging or by a physical cutting procedure. The volume v is then estimated by v̂, which is the sum of the areas of the structure of interest in the section planes, multiplied by the section width t > 0.

Assessing the precision of such volume estimates is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and the typically small sample sizes. In this talk, an overview of classical and new approaches to this problem will be presented. A special focus will be given to some recent advances on distribution estimators and confidence intervals for v̂; see Hall and Ziegel (2011).
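In symbols (a minimal restatement of the estimator described above; the notation A_1, ..., A_n for the measured section areas is ours, not the speaker's):

\hat{v} = t \sum_{i=1}^{n} A_i,

where A_i is the area of the structure of interest in the i-th section plane and t > 0 is the section width.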


References
Hall, P. and Ziegel, J. (2011). Distribution estimators and confidence intervals for
stereological volumes. Biometrika 98, 417–431.

Speaker: Johanna Ziegel is a Postdoctoral Fellow in the Institute of Applied Mathematics at Heidelberg University. She holds a Ph.D. in Statistics from ETH Zürich and spent a year as a Postdoctoral Fellow with Peter Hall at the University of Melbourne.
September 9, 2011
CRM-ISM-GERAD Colloque de statistique
Ed Susko and Aurélie Labbe

Susko: Properties of Bayesian posteriors and bootstrap support in phylogenetic inference

Labbe: An integrated hierarchical Bayesian model for multivariate eQTL genetic mapping

14:00-16:30

UdeM, Pav. André-Aisenstadt,
SALLE 1360
Abstract:

Susko: Large-scale sequencing projects generate complex, high-dimensional, multivariate discrete data. In studies of evolutionary biology, the parameter space of evolutionary trees is an unusual additional complication from a statistical perspective. In this talk I will briefly introduce the general approaches to utilizing sequence data in phylogenetic inference. A particular issue of interest in phylogenetic inference is the assessment of uncertainty about the true tree or structures that might be present in it. The primary way in which uncertainty is assessed in practice is through bootstrap support (BP) for splits, with large values indicating strong support for the split. A difficulty with this measure, however, has been deciding how large is large enough. We discuss the interpretation of BP and ways of adjusting it so that it has an interpretation similar to a p-value. A related issue, having to do with the behaviour of methods when data are generated from a star tree, gives rise to an interesting example in which, owing to the unusual statistical nature of the problem, Bayesian and maximum likelihood methods give strikingly different results, even asymptotically.

Labbe: Recently, expression quantitative trait loci (eQTL) mapping studies, in which expression levels of thousands of genes are viewed as quantitative traits, have been used to provide greater insight into the biology of gene regulation. Current data analysis and interpretation of eQTL studies involve the use of multiple methods and applications, the output of which is often fragmented. In this talk, we present an integrated hierarchical Bayesian model that jointly models all genes and SNPs to detect eQTLs.
We propose a model (named iBMQ) that is specifically designed to handle a large number G of gene expressions, a large number S of regressors (genetic markers) and a small number n of individuals, in what we call a "large G, large S, small n" paradigm. This method incorporates genotypic and gene expression data into a single model while (1) specifically coping with the high dimensionality of eQTL data (large number of genes), (2) borrowing strength from all gene expression data for the mapping procedures, and (3) controlling the number of false positives to a desirable level.

Speakers:
Ed Susko, Dalhousie University

Aurélie Labbe, McGill University

Schedule:

Talk 1: Aurélie Labbe 14:00-15:00

Coffee Break 15:00-15:30

Talk 2: Ed Susko 15:30-16:30

September 16, 2011
McGill Statistics Seminar
Elif F. Acar
Inference and model selection for pair-copula constructions

15:30-16:30

BURN 1205
Abstract: Pair-copula constructions (PCCs) provide an elegant way to construct highly flexible multivariate distributions. However, for convenience of inference, pair-copulas are often assumed to depend on the conditioning variables only indirectly. In this talk, I will show how nonparametric smoothing techniques can be used to avoid this assumption. Model selection for PCCs will also be addressed within the proposed method.
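To make the simplifying assumption concrete (standard pair-copula notation, not taken from the talk), a three-dimensional density can be written as

f(x_1, x_2, x_3) = f_1(x_1) f_2(x_2) f_3(x_3) \, c_{12}\{F_1(x_1), F_2(x_2)\} \, c_{23}\{F_2(x_2), F_3(x_3)\} \, c_{13|2}\{F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2); x_2\},

and the usual simplifying assumption takes the conditional pair-copula c_{13|2} to be the same for every value of x_2, so that the conditioning variable enters only through the conditional margins F_{1|2} and F_{3|2}. The talk concerns relaxing exactly this assumption via nonparametric smoothing.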
Speaker: Elif F. Acar is a Postdoctoral Fellow in the Department of Mathematics and Statistics at McGill University. She holds a Ph.D. in Statistics from the University of Toronto.
September 23, 2011
McGill Statistics Seminar
Shaowei Lin
What is singular learning theory?

15:30-16:30

BURN 1205
Abstract: In this talk, we give a basic introduction to Sumio Watanabe's Singular Learning Theory, as outlined in his book "Algebraic Geometry and Statistical Learning Theory". Watanabe's key insight into studying singular models was to use a deep result in algebraic geometry known as Hironaka's Resolution of Singularities. This result allows him to reparametrize the model in a normal form so that central limit theorems can be applied. In the second half of the talk, we discuss new algebraic methods in which we define fiber ideals for discrete/Gaussian models. We show that the key to understanding the singular model lies in monomializing its fiber ideal.
Speaker: Shaowei Lin is a Postdoctoral Fellow at UC Berkeley. He received a B.Sc. from Stanford and a Ph.D. from UC Berkeley under the supervision of Bernd Sturmfels.
September 30, 2011
McGill Statistics Seminar
Ioana A. Cosma
Data sketching for cardinality and entropy estimation

15:30-16:30

BURN 1205
Abstract:

Streaming data are ubiquitous in a wide range of areas, from engineering, information technology, finance, and commerce to atmospheric physics and the earth sciences. The online approximation of properties of data streams is of great interest, but this approximation process is hindered by the sheer size of the data and the speed at which they are generated. Data stream algorithms typically allow only one pass over the data, and maintain sub-linear representations of the data from which target properties can be inferred with high efficiency.

In this talk we consider the online approximation of two important characterizations of data streams: cardinality and empirical Shannon entropy. We assume that the number of distinct elements observed in the stream is prohibitively large, so that the vector of cumulative quantities cannot be stored in main memory for fast and efficient access. We focus on two techniques that use pseudo-random variates to form low-dimensional data sketches (using hashing and random projections), and derive estimators of the cardinality and empirical entropy. We discuss various properties of our estimators, such as relative asymptotic efficiency, recursive computability, and error and complexity bounds. Finally, we present results on simulated data and seismic measurements from a volcano.
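As a flavour of the kind of one-pass sketch involved, here is a minimal k-minimum-values cardinality estimator in Python; this is a generic textbook sketch for illustration only, not one of the estimators analyzed in the talk.

import hashlib
import heapq

def kmv_cardinality(stream, k=256):
    """One-pass k-minimum-values sketch: keep only the k smallest hash
    values (mapped to (0, 1]) and estimate the number of distinct items."""
    kept = []        # min-heap of negated hash values, so -kept[0] is the k-th smallest
    members = set()  # hash values currently stored in the sketch
    for item in stream:
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        u = (h + 1) / 2.0**64          # pseudo-uniform value in (0, 1]
        if u in members:
            continue                   # duplicate: the sketch is unchanged
        if len(members) < k:
            members.add(u)
            heapq.heappush(kept, -u)
        elif u < -kept[0]:             # smaller than the current k-th smallest
            members.remove(-heapq.heappushpop(kept, -u))
            members.add(u)
    if len(members) < k:               # fewer than k distinct items: count exactly
        return len(members)
    return (k - 1) / (-kept[0])        # classical k-minimum-values estimate

# Example: a stream of 50,000 items containing 10,000 distinct values.
print(kmv_cardinality((i % 10_000 for i in range(50_000)), k=512))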


References:
Clifford, P. and Cosma, I. A. (2011). A statistical analysis of probabilistic counting algorithms. To appear in the Scandinavian Journal of Statistics; preprint at arXiv:0801.3552.

Clifford, P. and Cosma, I. A. (2009). A simple sketching algorithm for entropy estimation. In preparation; preprint at arXiv:0908.3961.

Speaker: Ioana A. Cosma is a Postdoctoral Fellow in the Statistical Laboratory at the University of Cambridge, England. She holds a Ph.D. in Statistics from the University of Oxford.
October 7, 2011
McGill Statistics Seminar
Nikolai Kolev
 
Nonexchangeability and radial asymmetry identification via bivariate quantiles, with financial applications

15:30-16:30

BURN 1205
Abstract: In this talk, the following topics will be discussed: A class of bivariate probability integral transforms and Kendall distribution; bivariate quantile curves, central and lateral regions; non-exchangeability and radial asymmetry identification; new measures of nonexchangeability and radial asymmetry; financial applications and a few open problems (joint work with Flavio Ferreira).
Speaker:
Nikolai Kolev is a Professor of Statistics at the University of São Paulo, Brazil.
October 14, 2011
CRM-ISM-GERAD Colloque de statistique
Debbie Dupuis and Richard A. Davis

Dupuis: Modeling non-stationary extremes: The case of heat waves

Davis: Estimating extremal dependence in time series via the extremogram

14:00-16:30

McGill

TROTTIER 1080

Abstract:

Dupuis: Environmental processes are often non-stationary, since climate patterns cause systematic seasonal effects and long-term climate changes cause trends. The usual limit models are not applicable for non-stationary processes, but models from standard extreme value theory can be used along with statistical modeling to provide useful inference. Traditional approaches include letting model parameters be a function of covariates or using time-varying thresholds. These approaches are, however, inadequate for the study of heat waves, and we show how a recent pre-processing approach by Eastoe and Tawn (2009) can be used in conjunction with an innovative change-point analysis to model daily maximum temperature. The model is then fitted to data from four U.S. cities and used to estimate the recurrence probabilities of runs over seasonally high temperatures. We show that the probability of long and intense heat waves has increased considerably over 50 years.

Davis: The extremogram is a flexible quantitative tool that measures various types of extremal dependence in a stationary time series. In many respects, the extremogram can be viewed as an extreme-value analogue of the autocorrelation function (ACF) for a time series. Under mixing conditions, the asymptotic normality of the empirical extremogram was derived in Davis and Mikosch (2009). Unfortunately, the limiting variance is a difficult quantity to estimate. Instead, we apply the stationary bootstrap to the empirical extremogram and establish that this resampling procedure provides an asymptotically correct approximation to the central limit theorem. This in turn can be used for constructing credible confidence bounds for the sample extremogram. The use of the stationary bootstrap for the extremogram is illustrated in a variety of real and simulated data sets. The cross-extremogram measures cross-sectional extremal dependence in multivariate time series. A measure of this dependence, especially left tail dependence, is of great importance in the calculation of portfolio risk. We find that after devolatilizing the marginal series, extremal dependence still remains, which suggests that the extremal dependence is not due solely to the heteroskedasticity in the stock returns process. However, for the univariate series, the filtering removes all extremal dependence. Following Geman and Chang (2010), a return time extremogram, which measures the waiting time between rare or extreme events in univariate and bivariate stationary time series, is calculated. The return time extremogram suggests the existence of extremal clustering in the return times of extreme events for financial assets. The stationary bootstrap can again provide an asymptotically correct approximation to the central limit theorem and can be used for constructing credible confidence bounds for this return time extremogram. (This is joint work with Thomas Mikosch and Ivor Cribben.)

Speaker:

Debbie Dupuis is a Professor of Statistics at HEC Montréal. She works in extreme-value theory, robust estimation and computational statistics.

Richard A. Davis is a Professor of Statistics at Columbia University. He works in applied probability, time series, stochastic processes and extreme-value theory.  Together with P. J. Brockwell, he is the author of the well-known textbook Introduction to Time Series and Forecasting.

Schedule:

Talk 1: Debbie Dupuis 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Talk 2: Richard A. Davis 15:30 -- 16:30

October 21, 2011
McGill Statistics Seminar
William Astle
Bayesian modelling of GWAS data using linear mixed models

15:30-16:30

BURN 1205
Abstract: Genome-wide association studies (GWAS) are used to identify physical positions (loci) on the genome where genetic variation is causally associated with a phenotype of interest at the population level. Typical studies are based on the measurement of several hundred thousand single nucleotide polymorphism (SNP) variants spread across the genome, in a few thousand individuals. The resulting datasets are large and require computationally efficient methods of statistical analysis.

Linear mixed models with two variance components have recently been proposed for the analysis of GWAS data; they can control for the confounding effects of population stratification by modelling the correlation between study subjects induced by relatedness. Unfortunately, standard methods for fitting linear mixed models are computationally intensive, because computation of the likelihood requires the inversion of a large matrix which is a function of the model parameters. I will describe a fast method for calculating the likelihood of a linear model with two variance components, which makes it feasible to analyse a large GWAS dataset with mixed models by Bayesian inference. A Bayesian analysis of GWAS data provides a natural way of overcoming the so-called "multiple-testing" problem which arises from the large dimension of the predictor variable space. In the Bayesian framework we should have low prior belief that any particular genetic variant explains a large proportion of the phenotypic variation. The normal-exponential-gamma prior has been proposed as a good representation of such belief, and I will describe an efficient MCMC algorithm which allows this prior to be incorporated into the modelling.
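For background, a widely used device for making such two-variance-component likelihoods cheap to evaluate is to diagonalize the relatedness matrix once and rotate the data, after which each likelihood evaluation avoids a fresh matrix inversion. The Python sketch below illustrates that generic idea for the model y ~ N(Xb, sg2*K + se2*I); it is a sketch of the standard trick, not necessarily the speaker's exact algorithm.

import numpy as np

def profile_loglik_factory(y, X, K):
    """Return a function evaluating the Gaussian log-likelihood of
    y ~ N(X b, sg2*K + se2*I), with b profiled out by GLS."""
    # One-off O(n^3) eigendecomposition of the relatedness matrix K;
    # every subsequent likelihood evaluation is then only O(n p).
    S, U = np.linalg.eigh(K)            # K = U diag(S) U'
    yr, Xr = U.T @ y, U.T @ X           # rotate the data once
    n = len(y)

    def loglik(sg2, se2):
        d = sg2 * S + se2               # rotated covariance is diagonal
        XtVi = Xr.T / d                 # X' V^{-1} in rotated coordinates
        b = np.linalg.solve(XtVi @ Xr, XtVi @ yr)   # GLS fixed effects
        r = yr - Xr @ b
        return -0.5 * (np.log(d).sum() + (r**2 / d).sum()
                       + n * np.log(2 * np.pi))

    return loglik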
Speaker:

William Astle is a Postdoctoral Fellow at McGill University, working with Aurélie Labbe and David A. Stephens. He holds a Ph.D. from Imperial College London.

October 28, 2011
McGill Statistics Seminar
Andrew Patton
Simulated method of moments estimation for copula-based multivariate models

15:00-16:00

BURN 1205
Abstract: This paper considers the estimation of the parameters of a copula via a simulated method of moments type approach. This approach is attractive when the likelihood of the copula model is not known in closed form, or when the researcher has a set of dependence measures or other functionals of the copula, such as pricing errors, that are of particular interest. The proposed approach also naturally nests method of moments and generalized method of moments estimators. Combining existing results on simulation-based estimation with recent results from empirical copula process theory, we show the consistency and asymptotic normality of the proposed estimator, and obtain a simple test of over-identifying restrictions as a goodness-of-fit test. The results apply to both iid and time series data. We analyze the finite-sample behavior of these estimators in an extensive simulation study. We apply the model to a group of seven financial stock returns and find evidence of statistically significant tail dependence, and that the dependence between these assets is stronger in crashes than in booms.
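As a toy illustration of the generic simulated-method-of-moments idea (a one-parameter Clayton copula with Spearman's rho as the single matched moment; none of this is taken from the paper, which treats far more general settings):

import numpy as np
from scipy.stats import spearmanr
from scipy.optimize import minimize_scalar

def simulate_clayton(theta, n, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    u, w = rng.uniform(size=n), rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

def smm_fit(u_obs, v_obs, n_sim=100_000, seed=1):
    """Pick theta so that the simulated Spearman's rho matches the sample value."""
    rng = np.random.default_rng(seed)
    u_sim, w_sim = rng.uniform(size=n_sim), rng.uniform(size=n_sim)  # reused draws
    rho_obs = spearmanr(u_obs, v_obs)[0]

    def objective(theta):
        v_sim = ((w_sim ** (-theta / (1.0 + theta)) - 1.0)
                 * u_sim ** (-theta) + 1.0) ** (-1.0 / theta)
        return (spearmanr(u_sim, v_sim)[0] - rho_obs) ** 2

    return minimize_scalar(objective, bounds=(0.05, 20.0), method="bounded").x

# Example: recover theta from data simulated at theta = 2.
u0, v0 = simulate_clayton(2.0, 5_000, np.random.default_rng(0))
print(smm_fit(u0, v0))

Reusing the same uniform draws across objective evaluations (common random numbers) keeps the simulated moment a smooth function of theta, which is the usual device in simulation-based estimation.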
Speaker: Andrew Patton is an Associate Professor of Economics at Duke University, Durham, North Carolina.
November 3, 2011
McGill Statistics Seminar
Alessandro Rinaldo
Maximum likelihood estimation in network models

16:00-17:00

BURN 1205
Abstract: This talk is concerned with maximum likelihood estimation (MLE) in exponential statistical models for networks (random graphs) and, in particular, with the beta model, a simple model for undirected graphs in which the degree sequence is the minimal sufficient statistic. The speaker will present necessary and sufficient conditions for the existence of the MLE of the beta model parameters that are based on a geometric object known as the polytope of degree sequences. Using this result, it is possible to characterize in a combinatorial fashion sample points leading to a non-existent MLE and non-estimability of the probability parameters under a non-existent MLE. The speaker will further indicate some conditions guaranteeing that the MLE exists with probability tending to 1 as the number of nodes increases. Much of this analysis applies also to other well-known models for networks, such as the Rasch model, the Bradley-Terry model and the more general p1 model of Holland and Leinhardt. These results are in fact instantiations of rather general geometric properties of exponential families with polyhedral support that will be illustrated with a simple exponential random graph model.
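For reference, the beta model mentioned above places independent edges between nodes i and j with

P(i \sim j) = \frac{\exp(\beta_i + \beta_j)}{1 + \exp(\beta_i + \beta_j)},

so the log-likelihood depends on the observed graph only through its degree sequence, which is why that sequence is the minimal sufficient statistic; roughly, the MLE exists when the observed degree sequence lies in the interior of the polytope of degree sequences, which is the geometric condition referred to above.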
Speaker: Alessandro Rinaldo is an Assistant Professor of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania.
November 4, 2011
McGill Statistics Seminar
Martin Lysy
A Bayesian method of parametric inference for diffusion processes

15:30-16:30

BURN 1205
Abstract: Diffusion processes have been used to model a multitude of continuous-time phenomena in engineering and the natural sciences, and, as in this case, the volatility of financial assets. However, parametric inference has long been complicated by an intractable likelihood function. For many models the most effective solution involves a large amount of missing data, for which the typical Gibbs sampler can be arbitrarily slow. On the other hand, joint parameter and missing data proposals can lead to a radical improvement, but their acceptance rate tends to scale exponentially with the number of observations.

We consider here a novel method of dividing the inference process into separate data batches, each small enough to benefit from joint proposals, to be processed consecutively.  A filter combines batch contributions to produce likelihood inference based on the whole dataset.  Although the result is not always unbiased, it has very low variability, often achieving considerable accuracy in a short amount of time.  We present an example using Heston's popular model for option pricing, but much of the methodology can be extended beyond diffusions to Hidden Markov and other State-Space models.
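To fix ideas with a generic example (not tied to the specific models in the talk): for a parametric diffusion

dX_t = \mu(X_t; \theta)\, dt + \sigma(X_t; \theta)\, dB_t,

the transition densities are rarely available in closed form, so the likelihood of discretely observed data is intractable. The usual data-augmentation remedy inserts m latent points between consecutive observations and works with the Euler approximation

X_{t+\Delta} \mid X_t \approx N\big(X_t + \mu(X_t; \theta)\Delta,\; \sigma^2(X_t; \theta)\Delta\big),

which becomes accurate as \Delta \to 0 but makes the missing-data vector, and with it the mixing problems described above, grow with m.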
Speaker:

Martin Lysy is finishing his Ph.D. in the Department of Statistics, Harvard University.

November 11, 2011
CRM-ISM-GERAD Colloque de statistique
Hélène Guérin and Ana-Maria Staicu

Guérin: An ergodic variant of the telegraph process for a toy model of bacterial chemotaxis

Staicu: Skewed functional processes and their applications

14:00-16:30

UdeM
Abstract:

Guérin: I will study the long-time behavior of a variant of the classic telegraph process, with non-constant jump rates that induce a drift towards the origin. This process can be seen as a toy model for velocity-jump processes recently proposed as mathematical models of bacterial chemotaxis. I will give its invariant law and construct an explicit coupling for velocity and position, providing exponential ergodicity together with a quantitative control of the total variation distance to equilibrium at each time instant. This is joint work with Joaquin Fontbona (Universidad de Santiago, Chile) and Florent Malrieu (Université Rennes 1, France).

Staicu: We introduce a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial location. Such data are not accommodated by current approaches to modeling functional data, due to the lack of Gaussian-like features. Our methodology allows modeling of the pointwise quantiles, has interpretability advantages and is computationally feasible. The methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls.

Speaker:

Ana-Maria Staicu obtained her PhD from the University of Toronto and is currently an Assistant Professor at North Carolina State University. She works in functional data analysis and likelihood methods.

Hélène Guérin is an Associate Professor at Université Rennes 1. She obtained her PhD at the Université Paris X Nanterre. Her main research interests are in the probabilistic interpretation of nonlinear partial differential equations.

Schedule:

Hélène Guérin: 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Ana-Maria Staicu: 15:30 -- 16:30

November 18, 2011
McGill Statistics Seminar
 Amparo Casanova
Construction of bivariate distributions via principal components

15:30-16:30

BURN 1205
Abstract: The diagonal expansion of a bivariate distribution (Lancaster, 1958) has been used as a tool to construct bivariate distributions; this method has been generalized using principal dimensions of random variables (Cuadras 2002). Sufficient and necessary conditions are given for uniform, exponential, logistic and Pareto marginals in the one and two-dimensional case. The corresponding copulas are obtained.
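The diagonal expansion referred to above takes the canonical form (generic notation)

h(x, y) = f(x)\, g(y) \Big\{ 1 + \sum_{k \ge 1} \rho_k\, a_k(x)\, b_k(y) \Big\},

where {a_k} and {b_k} are orthonormal systems with respect to the marginal densities f and g and the \rho_k are canonical correlations; choosing the \rho_k and the function systems (subject to h remaining nonnegative) yields bivariate distributions with the prescribed marginals.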
Speaker: 

Amparo Casanova is an Assistant Professor at the Dalla Lana School of Public Health, Division of Biostatistics, University of Toronto.

November 25, 2011
McGill Statistics Seminar
François Bellavance

Estimation of the risk of a collision when using a cell phone while driving

15:30-16:30 BURN 1205
Abstract: The use of a cell phone while driving raises the question of whether it is associated with an increased collision risk and, if so, of what magnitude. For policy decision making, it is important to rely on an accurate estimate of the real crash risk of cell phone use while driving. Three important epidemiological studies have been published on the subject, two using the case-crossover approach and one using a more conventional longitudinal cohort design. The methodology and results of these studies will be presented and discussed.
Speaker: 

François Bellavance is a Professor of Statistics at HEC Montréal and the Director of the Transportation Safety Laboratory.

December 2, 2011
McGill Statistics Seminar
 Alberto Carabarin

Path-dependent estimation of a distribution under generalized censoring

15:30-16:30 BURN 1205
Abstract: This talk focuses on the problem of the estimation of a distribution on an arbitrary complete separable metric space when the data points are subject to censoring by a general class of random sets. A path-dependent estimator for the distribution is proposed; among other properties, the estimator is sequential in the sense that it only uses data preceding any fixed point at which it is evaluated. If the censoring mechanism is totally ordered, the paths may be chosen in such a way that the estimate of the distribution defines a measure. In this case, we can prove a functional central limit theorem for the estimator when the underlying space is Euclidean. This is joint work with Gail Ivanoff (University of Ottawa)
Speaker:

Alberto Carabarin is a Postdoctoral Fellow at McGill University. He works with Christian Genest and Johanna Nešlehová. He holds a PhD from the University of Ottawa.

December 9, 2011
CRM-ISM-GERAD Colloque de statistique
Giles Hooker

Detecting evolution in experimental ecology: Diagnostics for missing state variables

15:30-16:30

UQAM

Salle 5115

Abstract:

This talk considers goodness of fit diagnostics for time-series data from processes approximately modeled by systems of nonlinear ordinary differential equations. In particular, we seek to determine three nested causes of lack of fit: (i) unmodeled stochastic forcing, (ii) mis-specified functional forms and (iii) mis-specified state variables. Testing lack of fit in differential equations is challenging since the model is expressed in terms of rates of change of the measured variables. Here, lack of fit is represented on the model scale via time-varying parameters. We develop tests for each of the three cases above through bootstrap and permutation methods.

A motivating example is presented from laboratory-based ecology in which algae are grown on nitrogen-rich medium and rotifers are introduced as a predator. The resulting data exhibit dynamics that do not correspond to those generated by classical ecological models. A hypothesized explanation is that more than one algal species are present in the chemostat. We assess the statistical evidence for this claim and show that while models incorporating multiple algal species provide better agreement with the data, their existence cannot be demonstrated without strong model assumptions. We conclude with an examination of the use of control theory to design inputs into dynamic systems to improve parameter estimation and power to detect missing components.

Speaker:

Giles Hooker is an Assistant Professor in the Department of Statistical Science and the Department of Biological Statistics and Computational Biology at Cornell University. His main research interests include functional data analysis, machine learning and data analysis for dynamical systems.


Winter Term 2012

Date Event Speaker(s) Title Time Location
January 13, 2012
CRM-ISM-GERAD Colloque de statistique
Yulei He
Bayesian approaches to evidence synthesis in clinical practice guideline development

15:30-16:30

Concordia, Library Building

LB-921.04

Abstract: The American College of Cardiology Foundation (ACCF) and the American Heart Association (AHA) have jointly engaged in the production of guidelines in the area of cardiovascular disease since 1980. The developed guidelines are intended to assist health care providers in clinical decision making by describing a range of generally acceptable approaches for the diagnosis, management, or prevention of specific diseases or conditions. This talk describes some of our work under a contract with the ACCF/AHA on applying Bayesian methods to guideline recommendation development. In a demonstration example, we use Bayesian meta-analysis strategies to summarize evidence on the comparative effectiveness of percutaneous coronary intervention versus coronary artery bypass grafting for patients with unprotected left main coronary artery disease. We show the usefulness and flexibility of Bayesian methods in handling data arising from studies with different designs (e.g., RCTs and observational studies), performing indirect comparisons among treatments when studies with direct comparisons are unavailable, and accounting for historical data.
Speakers:
Yulei He is Assistant Professor in the Department of Health Care Policy at the Harvard Medical School. His research focuses on the development and application of statistical methods for health services and policy research.
Schedule

 Talk: 15:30-16:30

January 20, 2012
McGill Statistics Seminar
Martin Larsson
A concave regularization technique for sparse mixture models

15:30-16:30

BURN 1205
Abstract: Latent variable mixture models are a powerful tool for exploring the structure in large datasets. A common challenge in interpreting such models is the desire to impose sparsity, the natural assumption that each data point only contains few latent features. Since mixture distributions are constrained in their L1 norm, typical sparsity techniques based on L1 regularization become toothless, and concave regularization becomes necessary. Unfortunately, concave regularization typically results in EM algorithms that must perform problematic non-convex M-step optimization. In this work, we introduce a technique for circumventing this difficulty, using the so-called Mountain Pass Theorem to provide easily verifiable conditions under which the M-step is well-behaved despite the lack of convexity. We also develop a correspondence between logarithmic regularization and what we term the pseudo-Dirichlet distribution, a generalization of the ordinary Dirichlet distribution well suited for inducing sparsity. We demonstrate our approach on a text corpus, inferring a sparse topic mixture model for 2,406 weblogs.
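The "toothless" remark can be seen in one line: every mixture weight vector \pi on the probability simplex satisfies

\|\pi\|_1 = \sum_k \pi_k = 1,

so an L1 penalty is constant over all admissible \pi and cannot induce sparsity. By contrast, a concave log term of the form (\alpha - 1) \sum_k \log(\pi_k + \varepsilon) with \alpha < 1 (given here only as a generic example in the spirit of the pseudo-Dirichlet regularization mentioned above) does distinguish dense from sparse weight vectors and pushes small weights toward zero.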
Speaker: Martin Larsson is a Ph.D. candidate in the School of Operations Research and Information Engineering at Cornell University; his advisor is Robert Jarrow.
January 27, 2012
McGill Statistics Seminar
Sepideh Farsinezhad
Applying Kalman filtering to problems in causal inference

15:30-16:30

BURN 1205
Abstract: A common problem in observational studies is estimating the causal effect of a time-varying treatment in the presence of a time-varying confounder. When random assignment of subjects to comparison groups is not possible, time-varying confounders can cause bias in estimating causal effects, even after standard regression adjustment, if past treatment history is a predictor of future confounders. To eliminate the bias of standard methods for estimating the causal effect of a time-varying treatment, Robins developed a number of innovative methods for discrete treatment levels, including G-computation, G-estimation, and marginal structural models (MSMs). However, straightforward applications of G-estimation and MSMs to continuous treatment do not currently exist. In this talk, I will introduce an alternative approach to previous methods which utilizes the Kalman filter. The key advantage of the Kalman filter approach is that the model easily accommodates continuous levels of treatment.
Speaker: Sepideh Farsinezhad is a Ph.D. candidate in our department. She works with Russell Steele.
February 3, 2012
McGill Statistics Seminar
Yeting Du and Daphna Harel

Du: Simultaneous fixed and random effects selection in finite mixtures of linear mixed-effects models

Harel: Measuring fatigue in systemic sclerosis: a comparison of the SF-36 vitality subscale and FACIT fatigue scale using item response theory

15:30-16:30

BURN 1205
Abstract:

Du: Linear mixed-effects (LME) models are frequently used for modeling longitudinal data. One complicating factor in the analysis of such data is that samples are sometimes obtained from a population with significant underlying heterogeneity, which would be hard to capture by a single LME model. Such problems may be addressed by a finite mixture of linear mixed-effects (FMLME) models, which segments the population into subpopulations and models each subpopulation by a distinct LME model. Often in the initial stage of a study, a large number of predictors are introduced. However, their associations to the response variable vary from one component to another of the FMLME model. To enhance predictability and to obtain a parsimonious model, it is of great practical interest to identify the important effects, both fixed and random, in the model. Traditional variable selection techniques such as stepwise deletion and subset selection are computationally expensive as the number of covariates and components in the mixture model increases. In this talk, we introduce a penalized likelihood approach and propose a nested EM algorithm for efficient numerical computations. Our estimators are shown to possess desirable properties such as consistency, sparsity and asymptotic normality. We illustrate the performance of our method through simulations and a systemic sclerosis data example.

Harel: Multi-item, self-reported questionnaires are frequently used to measure aspects of health-related quality of life. Due to the latent nature of the constructs underlying these instruments, Item Response Theory models are often used to relate the observed item scores to the latent trait. However, there are no well-established guidelines for how to compare two such questionnaires. In this talk I will explore graphical methods for the comparison of multi-item self-reported questionnaires by using Partial Credit Models. This will be illustrated with the comparison of two fatigue questionnaires in patients with Systemic Sclerosis.

Speaker:

Ye Ting Du is an M.Sc. student in our department. He works with Abbas Khalili and Johanna Nešlehová.

Daphna Harel is a Ph.D. candidate in our department. She works with Russell Steele.

February 10, 2012
CRM-ISM-GERAD Colloque de statistique
Winfried Stute and Jochen Blath

Stute: Principal component analysis of the Poisson Process

Blath: Longterm properties of the symbiotic branching model

14:00-16:30

Concordia
Abstract:

Stute: The Poisson process constitutes a well-known model for describing random events over time. It has many applications in marketing research, insurance mathematics and finance. Though it has been studied for decades, not much is known about how to check (in a non-asymptotic way) the validity of the Poisson process. In this talk we present the principal component decomposition of the Poisson process, which enables us to derive finite sample properties of associated goodness-of-fit tests. In the first step we show that the Fourier transforms of the components contain Bessel and Struve functions. Inversion leads to densities which are modified arcsine distributions.

Blath: In this talk we consider properties of the so-called 'symbiotic branching model' describing the spatial evolution of two populations which can only reproduce if they are both present at the same location at the same time. We will put particular emphasis on the long-term dynamics of this population model. To this end, we consider a 'critical curve' separating the asymptotic behaviour of the moments of the symbiotic branching process into two qualitatively different regimes. From this result, various properties can be derived. For example, we improve a result of Etheridge and Fleischmann on the speed of the propagation of the area in which both species are simultaneously present.

Speaker:

Winfried Stute is Professor of Statistics and Probability at the Universität Giessen.

Jochen Blath is Professor of Mathematics at the Technische Universität Berlin.

Schedule:

Talk 1: Jochen Blath 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Talk 2: Winfried Stute 15:30 -- 16:30

February 17, 2012
McGill Statistics Seminar
Annaliza McGillivray and Ana Best

McGillivray: A penalized quasi-likelihood approach for estimating the number of states in a hidden Markov model

Best: Risk-set sampling and left truncation in survival analysis

15:30-16:30

BURN 1205
Abstract:

McGillivray: In statistical applications of hidden Markov models (HMMs), one may have no knowledge of the number of hidden states (or order) of the model needed to accurately represent the underlying process generating the data. The problem of estimating the number of hidden states of the HMM is thus brought to the forefront. In this talk, we present a penalized quasi-likelihood approach to order estimation in HMMs which makes use of the fact that the marginal distribution of the observations from an HMM is a finite mixture model. The method starts with an HMM with a large number of states and obtains a model of lower order by clustering and combining similar states of the model through two penalty functions. We assess the performance of the new method via extensive simulation studies for normal and Poisson HMMs.

Best: Statisticians are often faced with budget concerns when conducting studies. The collection of some covariates, such as genetic data, is very expensive. Other covariates, such as detailed histories, might be difficult or time-consuming to measure. This helped bring about the invention of the nested case-control study, and its more generalized version, risk-set sampled survival analysis. The literature has a good discussion of the properties of risk-set sampling in standard right-censored survival data. My interest is in extending the methods of risk-set sampling to left-truncated survival data. Left-truncated survival data arise in prevalent longitudinal studies. Since prevalent studies are easier and cheaper to conduct than incident studies, this extension is extremely practical and relevant. I will introduce the partial likelihood in this scenario.
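For orientation, in the standard right-censored case the risk-set-sampled (nested case-control) partial likelihood has the familiar form

L(\beta) = \prod_{i \in \text{failures}} \frac{\exp(\beta^\top z_i)}{\sum_{j \in \tilde{R}_i} \exp(\beta^\top z_j)},

where \tilde{R}_i is the sampled risk set at the i-th failure time, namely the case together with its sampled controls; the talk concerns how this construction must be modified when the data are also left-truncated.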

Speaker:

Annaliza McGillivray is an M.Sc. student in our department. She works with Abbas Khalili.

Ana Best is a Ph.D. candidate in our department. She works with David Wolfson.

March 2, 2012
McGill Statistics Seminar
 James O. Ramsay
Estimating a variance-covariance surface for functional and longitudinal data

15:30-16:30

BURN 1205
Abstract:

In functional data analysis, as in its multivariate counterpart, estimates of the bivariate covariance kernel σ(s,t) and its inverse are useful for many things, and we need the inverse of a covariance matrix or kernel especially often. However, the dimensionality of functional observations often exceeds the sample size available to estimate σ(s,t), and then the analogue S of the multivariate sample estimate is singular and non-invertible. Even when this is not the case, the high dimensionality of S often implies unacceptable sample variability and loss of degrees of freedom for model fitting. The common practice of employing low-dimensional principal component approximations to σ(s,t) to achieve invertibility also raises serious issues.


This talk describes a functional estimate of σ(s,t) and its inverse defined by an expansion in terms of finite element basis functions. This strategy permits the user to control the resolution of the estimate, its smoothness, and the time lag over which the covariance may be nonzero. It turns out that the matrix resulting from evaluating σ(s,t) at a discrete set of time points is almost never singular, and therefore enables the estimation of S and its inverse as a single seamless problem. These estimates have many applications to classical statistical problems, such as discrete but unequally spaced time and spatial series, as well as to functional and longitudinal data analysis.
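Schematically, and with notation that is ours rather than the speaker's, such an estimate can be pictured as a tensor-product expansion

\sigma(s, t) \approx \sum_j \sum_k c_{jk}\, \phi_j(s)\, \phi_k(t), \qquad c_{jk} = c_{kj},

in finite element basis functions \phi_j, with c_{jk} forced to zero whenever elements j and k are separated by more than the maximum allowed time lag; the mesh resolution and the smoothness control mentioned above are what keep the fitted surface well behaved.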

Speaker: Jim Ramsay is a leading researcher in the area of Functional Data Analysis. He is a Professor Emeritus at the Department of Psychology, McGill University.
March 9, 2012
CRM-ISM-GERAD Colloque de statistique
Hugh Chipman and Mori Jamshidian

Jamshidian: Using tests of homoscedasticity to test missing completely at random

Chipman: Sequential optimization of a computer model and other "Active Learning" problems

14:00-16:30

UQAM, 201 ave. du Président-Kennedy, salle 5115
Abstract:

Jamshidian: Testing homogeneity of covariances (or homoscedasticity) among several groups has many applications in statistical analysis. In the context of incomplete data analysis, tests of homoscedasticity among groups of cases with identical missing data patterns have been proposed to test whether data are missing completely at random (MCAR). The proposed tests of MCAR often require large sample sizes n and/or large group sample sizes ni, and they usually fail when applied to non-normal data. Hawkins (1981) proposed a test of multivariate normality and homoscedasticity that is an exact test for complete data when the ni are small. In this talk we present a modification of the Hawkins test for complete data to improve its performance, and extend its application to testing homoscedasticity and MCAR when data are multivariate normal and incomplete. Moreover, we will show that the statistic used in the Hawkins test, in conjunction with a nonparametric k-sample test, can be used to obtain a nonparametric test of MCAR that works well for both normal and non-normal data. It will be explained how a combination of the proposed normal-theory Hawkins test and the nonparametric test can be employed to test for homoscedasticity, MCAR, and multivariate normality. We will present simulation studies that indicate the newly proposed tests generally outperform their existing competitors in terms of Type I error rejection rates. Also, a power study of the proposed tests indicates good power. The newly proposed methods use appropriate methods of imputation to impute missing data. As such, multiple imputation is employed to assess the performance of our tests in light of imputation variability. Moreover, examples will be presented where multiple imputation enables one to identify a group or groups whose covariance matrices differ from the majority of other groups. Finally, an R package that implements these new tests, called MissMech, will be briefly presented.

Chipman: In computer experiments, statistical models are commonly used as surrogates for slow-running codes. In this talk, the usually ubiquitous Gaussian process models are nowhere to be seen, however. Instead, an adaptive nonparametric regression model (BART) is used to deal with nonstationarities in the response surface. By providing both point estimates and uncertainty bounds for prediction, BART provides a basis for sequential design criteria to find optima with few function evaluations. Similar ideas will also be illustrated in other active learning problems, such as identification of active compounds in drug discovery.

Speaker:

Hugh Chipman is Professor of Statistics at Acadia University. He holds the Canada Research Chair in Mathematical Modeling and is the 2009 recipient of the CRM-SSC Award. 

Mori Jamshidian is Professor in the Department of Mathematics at the California State University, Fullerton.

Schedule:

Mori Jamshidian: 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Hugh Chipman: 15:30 -- 16:30

March 16, 2012
McGill Statistics Seminar
 
Azadeh Shohoudi
 
Variable selection in longitudinal data with a change-point 15:30-16:30 BURN 1205
Abstract: Follow-up studies are frequently carried out to investigate the evolution of measurements through time, taken on a set of subjects. These measurements (responses) are bound to be influenced by subject specific covariates and if a regression model is used the data analyst is faced with the problem of selecting those covariates that “best explain” the data. For example, in a clinical trial, subjects may be monitored for a response following the administration of a treatment with a view of selecting the covariates that are best predictive of a treatment response. This variable selection setting is standard. However, more realistically, there will often be an unknown delay from the administration of a treatment before it has a measurable effect. This delay will not be directly observable since it is a property of the distribution of responses rather than of any particular trajectory of responses. Briefly, each subject will have an unobservable change-point. With a change-point component added, the variable selection problem necessitates the use of penalized likelihood methods. This is because the number of putative covariates for the responses, as well as the change-point distribution, could be large relative to the follow-up time and/or the number of subjects; variable selection in a change-point setting does not appear to have been studied in the literature. In this talk I will briefly introduce the multi-path change-point problem. I will show how variable selection for the covariates before the change, after the change, as well as for the change-point distribution, reduces to variable selection for a finite mixture of multivariate distributions. I will discuss the performance of my model selection methods using an example on cognitive decline in subjects with Alzheimer’s disease and through simulations.
Speaker:

Azadeh Shohoudi is a PhD student in our department, under the supervision of David Wolfson. The work she will present was done jointly with David Wolfson, Masoud Asgharian and Abbas Khalili.

March 23, 2012
McGill Statistics Seminar
 
Jinchi Lv
 
Model selection principles in misspecified models 15:30-16:30 BURN 1205
Abstract: Model selection is of fundamental importance to high-dimensional modeling, featured in many contemporary applications. Classical principles of model selection include the Bayesian principle and the Kullback-Leibler divergence principle, which lead to the Bayesian information criterion and the Akaike information criterion, respectively, when models are correctly specified. Yet model misspecification is unavoidable in practice. We derive novel asymptotic expansions of the two well-known principles in misspecified generalized linear models, which give the generalized BIC (GBIC) and the generalized AIC. A specific form of prior probabilities motivated by the Kullback-Leibler divergence principle leads to the generalized BIC with prior probability ($\mbox{GBIC}_p$), which can be naturally decomposed as the sum of the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty directly on model misspecification. Numerical studies demonstrate the advantage of the new methods for model selection in both correctly specified and misspecified models.
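For reference, under correct specification the two classical criteria referred to above are

\mathrm{AIC} = -2\,\ell(\hat\theta) + 2k, \qquad \mathrm{BIC} = -2\,\ell(\hat\theta) + k \log n,

where \ell(\hat\theta) is the maximized log-likelihood, k the number of parameters, and n the sample size; the GBIC and $\mbox{GBIC}_p$ of the talk modify the corresponding expansions so that they remain valid under misspecification (their exact forms are given in the talk, not reproduced here).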
Speaker:

Jinchi Lv is an Assistant Professor in the Marshall School of Business, University of Southern California. He is interested in high dimensional inference, variable selection, machine learning and financial econometrics.

March 30, 2012
McGill Statistics Seminar
 
Julian Wolfson
 
A matching-based approach to assessing the surrogate value of a biomarker 15:30-16:30 BURN 1205
Abstract: Statisticians have developed a number of frameworks which can be used to assess the surrogate value of a biomarker, i.e. establish whether treatment effects on a biological quantity measured shortly after administration of treatment predict treatment effects on the clinical endpoint of interest. The most commonly applied of these frameworks is due to Prentice (1989), who proposed a set of criteria which a surrogate marker should satisfy. However, verifying these criteria using observed data can be challenging due to the presence of unmeasured simultaneous predictors (i.e. confounders) which influence both the potential surrogate and the outcome. In this work, we adapt a technique proposed by Rosenbaum (2002) for observational studies, in which observations are matched and the odds of treatment within each matched pair is bounded. This yields a straightforward and interpretable sensitivity analysis which can be performed particularly efficiently for certain types of test statistics. In this talk, I will introduce the surrogate endpoint problem, discuss the details of my proposed technique for assessing surrogate value, and illustrate with some simulated examples inspired by the problem of identifying immune surrogates in HIV vaccine trials.
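The bound referred to above is Rosenbaum's sensitivity model: for two matched subjects j and k with treatment probabilities \pi_j and \pi_k, the odds ratio of receiving treatment is assumed to satisfy

\frac{1}{\Gamma} \le \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \le \Gamma,

and the sensitivity analysis reports how the conclusions about surrogate value change as \Gamma is allowed to grow.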
Speaker:

Julian Wolfson is an Assistant Professor in the Division of Biostatistics at the University of Minnesota School of Public Health.

April 5, 2012

Thursday!

McGill Statistics Seminar
 
Pengfei Li
 
Hypothesis testing in finite mixture models: from the likelihood ratio test to EM-test 15:30-16:30 ARTS W-215
Abstract: In the presence of heterogeneity, a mixture model is most natural for characterizing the random behavior of samples taken from such populations. This strategy has been widely employed in applications ranging from genetics, information technology, and marketing to finance. Studying the mixing structure behind a random sample from the population allows us to infer the degree of heterogeneity, with important implications in applications such as the presence of disease subgroups in genetics. The statistical problem is to test hypotheses on the order of the finite mixture model. There has been continued interest in the limiting behavior of likelihood ratio tests. The non-regularity of finite mixture models has provided statisticians with ample examples of unusual limiting distributions, yet many of these results are not convenient for conducting hypothesis tests. Motivated by the goal of overcoming such difficulties, we have developed a number of strategies to obtain tests with high efficiency yet easy-to-use limiting distributions. The latest development is a class of EM-tests which are advantageous in many respects: their limiting distributions are easier to derive mathematically, simple to implement in data analysis, and valid for a more general class of mixture models without restrictions on the space of the mixing distribution. Simulations indicate that the limiting distributions approximate the finite-sample distributions well in the examples investigated.
Speaker:

Pengfei Li is an Assistant Professor at the University of Waterloo. He obtained his Ph.D. from the University of Waterloo in 2007, under the supervision of Jiahua Chen.

April 13, 2012
CRM-ISM-GERAD Colloque de statistique
 
Longhai Li and Sunil Rao
 

Li: High-dimensional feature selection using hierarchical Bayesian logistic regression with heavy-tailed priors

Rao: Best predictive estimation for linear mixed models with applications to small area estimation

14:00-16:30

McGill

MAASS 217

Abstract:

Li: The problem of selecting the most useful features from a great many (e.g., thousands of) candidates arises in many areas of modern science. An interesting problem from genomic research is that, from thousands of genes that are active (expressed) in certain tissue cells, we want to find the genes that can be used to separate tissues of different classes (e.g., cancer and normal). In this paper, we report a Bayesian logistic regression method based on heavy-tailed priors with moderately small degrees of freedom (such as 1) and small scale (such as 0.01), using Gibbs sampling for the computation. We show that it can distinctively separate a couple of useful features from a large number of useless ones, and discriminate among many redundant correlated features. We also show that this method is very stable with respect to the choice of scale. We apply our method to a microarray data set related to prostate cancer, and identify only 3 genes out of 6,033 candidates that can separate cancer and normal tissues very well in leave-one-out cross-validation.
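One standard way to set up such a heavy-tailed prior so that Gibbs sampling stays convenient is as a scale mixture of normals (a generic sketch; the talk's exact specification may differ):

\beta_j \mid \lambda_j \sim N(0, \lambda_j), \qquad \lambda_j \sim \text{Inverse-Gamma}\Big(\frac{\nu}{2}, \frac{\nu s^2}{2}\Big) \quad \Longrightarrow \quad \beta_j \sim t_\nu(0, s),

with small degrees of freedom \nu (such as 1) and small scale s (such as 0.01). Conditional on the \lambda_j the prior on the coefficients is Gaussian, which is what keeps the Gibbs updates tractable.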

Rao: We derive the best predictive estimator (BPE) of the fixed parameters for a linear mixed model.  This leads to a new prediction procedure called observed best prediction (OBP), which is different from the empirical best linear unbiased prediction (EBLUP).  We show that BPE is more reasonable than the traditional estimators derived from estimation considerations, such as maximum likelihood (ML) and restricted maximum likelihood (REML), if the main interest is the prediction of the mixed effect.  We show how the OBP can significantly outperform the EBLUP in terms of mean squared prediction error (MSPE) if the underlying model is misspecified.  On the other hand, when the underlying model is correctly specified, the overall predictive performance of the OBP can be very similar to the EBLUP.  The well known Fay-Herriot small area model is used as an illustration of the methodology.  In addition, simulations and analysis of a data set on graft failure rates from kidney transplant operations will be used to show empirical performance. This is joint work with Jiming Jiang of UC-Davis and Thuan Nguyen of Oregon Health and Science University.

Speaker:

Longhai Li is an Assistant Professor of Statistics at the University of Saskatchewan.

Sunil Rao is a Professor and the Director of the Division of Biostatistics in the Department of Epidemiology and Public Health, University of Miami.

Schedule:

Longhai Li: 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Sunil Rao: 15:30 -- 16:30

 

Website design: Dr Johanna Nešlehová

 
