Past statistics seminars

Past McGill Statistics Seminar Series

 


Fall Term 2011


August 31, 2011

McGill Statistics Seminar
Johanna Ziegel
Precision estimation for stereological volumes

15:30-16:30

BURN 1205
Abstract:

Volume estimators based on Cavalieri's principle are widely used in the biosciences. For example in neuroscience, where volumetric measurements of brain structures are of interest, systematic samples of serial sections are obtained by magnetic resonance imaging or by a physical cutting procedure. The volume v is then estimated by v̂, which is the sum over the areas of the structure of interest in the section planes multiplied by the width of the sections, t > 0.
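In symbols (the notation below is supplied for illustration and is not fixed in the abstract), the Cavalieri estimator is

    \hat{v} = t \sum_{i=1}^{n} A_i,

where A_i denotes the measured area of the structure of interest in the i-th of the n section planes and t > 0 is the section width.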
Assessing the precision of such volume estimates is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and typically small sample sizes. In this talk, an overview of classical and new approaches to this problem will be presented. A special focus will be given to some recent advances on distribution estimators and confidence intervals for v̂; see Hall and Ziegel (2011).


References
Hall, P. and Ziegel, J. (2011). Distribution estimators and confidence intervals for stereological volumes. Biometrika 98, 417–431.

Speaker: Johanna Ziegel is a Postdoctoral Fellow in the Institute of Applied Mathematics at Heidelberg University. She holds a Ph.D. in Statistics from ETH Zürich and spent a year as a Postdoctoral Fellow with Peter Hall at the University of Melbourne.
September 9, 2011
CRM-ISM-GERAD Colloque de statistique
Ed Susko and Aurélie Labbe

Susko: Properties of Bayesian posteriors and bootstrap support in phylogenetic inference

Labbe: An integrated hierarchical Bayesian model for multivariate eQTL genetic mapping

14:00-16:30

UdeM, Pav. André-Aisenstadt,
SALLE 1360
Abstract:

Susko: Large-scale sequencing projects generate complex, high-dimensional, multivariate discrete data. In studies of evolutionary biology, the parameter space of evolutionary trees is an unusual additional complication from a statistical perspective. In this talk I will briefly introduce the general approaches to utilizing sequence data in phylogenetic inference. A particular issue of interest in phylogenetic inference is the assessment of uncertainty about the true tree or structures that might be present in it. The primary way in which uncertainty is assessed in practice is through bootstrap support (BP) for splits, large values indicating strong support for the split. A difficulty with this measure, however, has been deciding how large is large enough. We discuss the interpretation of BP and ways of adjusting it so that it has an interpretation similar to a p-value. A related issue, having to do with the behaviour of methods when data are generated from a star tree, gives rise to an interesting example in which, due to the unusual statistical nature, Bayesian and maximum likelihood methods give strikingly different results, even asymptotically.

Labbe: Recently, expression quantitative trait loci (eQTL) mapping studies, where expression levels of thousands of genes are viewed as quantitative traits, have been used to provide greater insight into the biology of gene regulation. Current data analysis and interpretation of eQTL studies involve the use of multiple methods and applications, the output of which is often fragmented. In this talk, we present an integrated hierarchical Bayesian model that jointly models all genes and SNPs to detect eQTLs.
We propose a model (named iBMQ) that is specifically designed to handle a large number G of gene expressions, a large number S of regressors (genetic markers) and a small number n of individuals in what we call a "large G, large S, small n" paradigm. This method incorporates genotypic and gene expression data into a single model while 1) specifically coping with the high dimensionality of eQTL data (large number of genes), 2) borrowing strength from all gene expression data for the mapping procedures, and 3) controlling the number of false positives to a desirable level.

Speakers:
Ed Susko, Dalhousie University

Aurélie Labbe, McGill University

Schedule:

Talk 1: Aurélie Labbe 14:00-15:00

Coffee Break 15:00-15:30

Talk 2: Ed Susko 15:30-16:30

September 16, 2011
McGill Statistics Seminar
Elif F. Acar
Inference and model selection for pair-copula constructions

15:30-16:30

BURN 1205
Abstract: Pair-copula constructions (PCCs) provide an elegant way to construct highly flexible multivariate distributions. However, for convenience of inference, pair-copulas are often assumed to depend on the conditioning variables only indirectly. In this talk, I will show how nonparametric smoothing techniques can be used to avoid this assumption. Model selection for PCCs will also be addressed within the proposed method.
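To make the construction concrete (the notation here is illustrative and not taken from the abstract), the standard three-dimensional pair-copula decomposition along a D-vine reads

    f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\,
                       c_{12}\{F_1(x_1), F_2(x_2)\}\, c_{23}\{F_2(x_2), F_3(x_3)\}\,
                       c_{13|2}\{F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2); x_2\}.

The assumption mentioned above amounts to supposing that the conditional pair-copula c_{13|2} depends on the conditioning variable x_2 only through its first two arguments, i.e., its last argument is dropped.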
Speaker: Elif F. Acar is a Postdoctoral Fellow in the Department of Mathematics and Statistics at McGill University. She holds a Ph.D. in Statistics from the University of Toronto.
September 23, 2011
McGill Statistics Seminar
Shaowei Lin
What is singular learning theory?

15:30-16:30

BURN 1205
Abstract: In this talk, we give a basic introduction to Sumio Watanabe's Singular Learning Theory, as outlined in his book "Algebraic Geometry and Statistical Learning Theory". Watanabe's key insight into studying singular models was to use a deep result in algebraic geometry known as Hironaka's Resolution of Singularities. This result allows him to reparametrize the model in a normal form so that central limit theorems can be applied. In the second half of the talk, we discuss new algebraic methods where we define fiber ideals for discrete/Gaussian models. We show that the key to understanding the singular model lies in monomializing its fiber ideal.
Speaker: Shaowei Lin is a Postdoctoral Fellow at UC Berkeley. He received a B.Sc. from Stanford and a Ph.D. from UC Berkeley under the supervision of Bernd Sturmfels.
September 30, 2011
McGill Statistics Seminar
Ioana A. Cosma
Data sketching for cardinality and entropy estimation

15:30-16:30

BURN 1205
Abstract:

Streaming data is ubiquitous in a wide range of areas, from engineering, information technology, finance, and commerce to atmospheric physics and the earth sciences. The online approximation of properties of data streams is of great interest, but this approximation process is hindered by the sheer size of the data and the speed at which it is generated. Data stream algorithms typically allow only one pass over the data, and maintain sub-linear representations of the data from which target properties can be inferred with high efficiency.

In this talk we consider the online approximation of two important characterizations of data streams: cardinality and empirical Shannon entropy. We assume that the number of distinct elements observed in the stream is prohibitively large, so that the vector of cumulative quantities cannot be stored in main computer memory for fast and efficient access. We focus on two techniques that use pseudo-random variates to form low-dimensional data sketches (using hashing and random projections), and derive estimators of the cardinality and empirical entropy. We discuss various properties of our estimators such as relative asymptotic efficiency, recursive computability, and error and complexity bounds. Finally, we present results on simulated data and seismic measurements from a volcano.
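As a toy illustration of a hashing-based cardinality sketch (this is the classical k-minimum-values estimator, shown only to fix ideas; it is not necessarily one of the estimators analysed in the talk), a minimal Python sketch might look as follows:

    import hashlib
    import heapq

    def kmv_cardinality(stream, k=256):
        # K-minimum-values sketch: map each item to a pseudo-uniform value in
        # (0, 1) via hashing and keep only the k smallest values seen. If u_(k)
        # is the k-th smallest retained value, the number of distinct items is
        # estimated by (k - 1) / u_(k). Memory use is O(k), one pass over the data.
        heap = []        # max-heap via negation; holds the k smallest hash values
        members = set()  # the same k values, for O(1) duplicate checks
        for item in stream:
            u = int(hashlib.sha1(repr(item).encode()).hexdigest(), 16) / 2.0**160
            if u in members:
                continue
            if len(heap) < k:
                heapq.heappush(heap, -u)
                members.add(u)
            elif u < -heap[0]:
                members.remove(-heapq.heappushpop(heap, -u))
                members.add(u)
        if len(heap) < k:            # fewer than k distinct items: count is exact
            return len(heap)
        return (k - 1) / (-heap[0])

The estimator trades memory for accuracy through k; larger k lowers the relative error at the cost of a bigger sketch.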


References:
Peter Clifford and Ioana A. Cosma (2011) “A statistical analysis of probabilistic counting algorithms” (to appear in the Scandinavian Journal of Statistics, preprint on arXiv:0801.3552).

Peter Clifford and Ioana A. Cosma (2009) “A simple sketching algorithm for entropy estimation” (in preparation, preprint on arXiv:0908.3961).

Speaker: Ioana A. Cosma is a Postdoctoral Fellow in the Statistical Laboratory at the University of Cambridge, England. She holds a Ph.D. in Statistics from the University of Oxford.
October 7, 2011
McGill Statistics Seminar
Nikolai Kolev
 
Nonexchangeability and radial asymmetry identification via bivariate quantiles, with financial applications

15:30-16:30

BURN 1205
Abstract: In this talk, the following topics will be discussed: A class of bivariate probability integral transforms and Kendall distribution; bivariate quantile curves, central and lateral regions; non-exchangeability and radial asymmetry identification; new measures of nonexchangeability and radial asymmetry; financial applications and a few open problems (joint work with Flavio Ferreira).
Speaker:
Nikolai Kolev is a Professor of Statistics at the University of Sao Paulo, Brazil.
October 14, 2011
CRM-ISM-GERAD Colloque de statistique
Debbie Dupuis and Richard A. Davis

Dupuis: Modeling non-stationary extremes: The case of heat waves

Davis: Estimating extremal dependence in time series via the extremogram

14:00-16:30

McGill

TROTTIER 1080

Abstract:

Dupuis: Environmental processes are often non-stationary since climate patterns cause systematic seasonal effects and long-term climate changes cause trends. The usual limit models are not applicable for non-stationary processes, but models from standard extreme value theory can be used along with statistical modeling to provide useful inference. Traditional approaches include letting model parameters be a function of covariates or using time-varying thresholds. These approaches are inadequate for the study of heat waves, however, and we show how a recent pre-processing approach by Eastoe and Tawn (2009) can be used in conjunction with an innovative change-point analysis to model daily maximum temperature. The model is then fitted to data from four U.S. cities and used to estimate the recurrence probabilities of runs over seasonally high temperatures. We show that the probability of long and intense heat waves has increased considerably over 50 years.

Davis: The extremogram is a flexible quantitative tool that measures various types of extremal dependence in a stationary time series.  In many respects, the extremogram can be viewed as an extreme-value analogue of the autocorrelation function (ACF) for a time series.  Under mixing conditions, the asymptotic normality of the empirical extremogram was derived in Davis and Mikosch (2009).  Unfortunately, the limiting variance is a difficult quantity to estimate.  Instead we employ the stationary bootstrap to the empirical extremogram and establish that this resampling  procedure provides an asymptotically correct approximation to the central limit theorem.  This in turn can be used for constructing credible confidence bounds for the sample extremogram. The use of the stationary bootstrap for the extremogram is illustrated in a variety of real and simulated data sets. The cross-extremogram measures cross-sectional extremal dependence in multivariate time series. A measure of this dependence, especially left tail dependence, is of great importance in the calculation of portfolio risk.  We find that after devolatilizing  the marginal series, extremal dependence still remains, which suggests that the extremal dependence is not due solely to the heteroskedasticity in the stock returns process. However, for the univariate series, the filtering removes all extremal dependence.  Following Geman and Chang (2010), a return time extremogram which measures the waiting time between rare or extreme events in univariate and bivariate stationary time series is calculated. The return time extremogram suggests the existence of extremal clustering in the return times of extreme events for financial assets. The stationary bootstrap can again provide an asymptotically correct approximation to the central limit theorem and can be used for constructing credible confidence bounds for this return time extremogram.  (This is joint work with Thomas Mikosch and Ivor Cribben.)
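To fix ideas about the analogy with the ACF, the sample extremogram for upper-tail exceedances can be computed in a few lines. The Python sketch below (with illustrative names and a fixed threshold quantile, and assuming at least one exceedance in the sample) is background only and does not reproduce the estimators or the bootstrap scheme of the talk:

    import numpy as np

    def sample_extremogram(x, max_lag=20, q=0.95):
        # Empirical extremogram for the exceedance sets A = B = (u, infinity),
        # i.e. rho(h) = P(X_{t+h} > u | X_t > u), with u the empirical q-quantile.
        x = np.asarray(x, dtype=float)
        u = np.quantile(x, q)
        exceed = x > u
        n = len(x)
        rho = np.empty(max_lag + 1)
        for h in range(max_lag + 1):
            base, lagged = exceed[:n - h], exceed[h:]
            rho[h] = (base & lagged).sum() / base.sum()
        return rho

    # Hypothetical usage: rho = sample_extremogram(returns, max_lag=40, q=0.95)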

Speaker:

Debbie Dupuis is a Professor of Statistics at HEC Montréal. She works in extreme-value theory, robust estimation and computational statistics.

Richard A. Davis is a Professor of Statistics at Columbia University. He works in applied probability, time series, stochastic processes and extreme-value theory.  Together with P. J. Brockwell, he is the author of the well-known textbook Introduction to Time Series and Forecasting.

Schedule:

Talk 1: Debbie Dupuis 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Talk 2: Richard A. Davis 15:30 -- 16:30

October 21, 2011
McGill Statistics Seminar
William Astle
Bayesian modelling of GWAS data using linear mixed models

15:30-16:30

BURN 1205
Abstract: Genome-wide association studies (GWAS) are used to identify physical positions (loci) on the genome where genetic variation is causally associated with a phenotype of interest at the population level. Typical studies are based on the measurement of several hundred thousand single nucleotide polymorphism (SNP) variants spread across the genome, in a few thousand individuals. The resulting datasets are large and require computationally efficient methods of statistical analysis.

Linear mixed models with two variance components have recently been proposed as a method of analysis for GWAS data that can control for the confounding effects of population stratification by modelling the correlation between study subjects induced by relatedness. Unfortunately, standard methods for fitting linear mixed models are computationally intensive because computation of the likelihood depends on the inversion of a large matrix which is a function of the model parameters. I will describe a fast method for calculating the likelihood of a two variance-component linear model which allows analysis of a large GWAS dataset using mixed models by Bayesian inference. A Bayesian analysis of GWAS provides a natural way of overcoming the so-called "multiple-testing" problem which arises from the large dimension of the predictor variable space. In the Bayesian framework we should have low prior belief that any particular genetic variant explains a large proportion of the phenotypic variation. The normal-exponential-gamma prior has been proposed as a good representation of such belief, and I will describe an efficient MCMC algorithm which allows this prior to be incorporated into the modelling.
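As background on why such likelihood evaluations can be made fast (the talk itself develops a Bayesian MCMC approach; the sketch below only illustrates a standard spectral trick, and all notation and names are assumptions), one eigendecomposition of the relatedness matrix K makes the model y ~ N(Xb, s2g*K + s2e*I) cheap to re-evaluate for any variance ratio delta = s2e/s2g:

    import numpy as np

    def profile_neg_loglik_factory(y, X, K):
        # Rotate the data once by the eigenvectors of K; afterwards the
        # covariance is diagonal for every delta, so each likelihood
        # evaluation costs O(n p) instead of a fresh O(n^3) inversion.
        vals, U = np.linalg.eigh(K)          # K = U diag(vals) U'
        yr, Xr = U.T @ y, U.T @ X
        n = len(y)

        def neg_loglik(log_delta):
            d = vals + np.exp(log_delta)     # diagonal covariance (up to s2g)
            W = Xr / d[:, None]
            beta = np.linalg.solve(Xr.T @ W, W.T @ yr)   # GLS fixed effects
            r = yr - Xr @ beta
            s2g = np.sum(r**2 / d) / n       # profiled-out genetic variance
            return 0.5 * (n * np.log(2 * np.pi * s2g) + np.sum(np.log(d)) + n)
        return neg_loglik

The returned function can then be minimized over log_delta with any one-dimensional optimizer.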
Speaker:

William Astle is a Postdoctoral Fellow at McGill University, working with Aurélie Labbe and David A. Stephens. He holds a Ph.D. from Imperial College London.

October 28, 2011
McGill Statistics Seminar
Andrew Patton
Simulated method of moments estimation for copula-based multivariate models

15:00-16:00

BURN 1205
Abstract: This paper considers the estimation of the parameters of a copula via a simulated method of moments type approach. This approach is attractive when the likelihood of the copula model is not known in closed form, or when the researcher has a set of dependence measures or other functionals of the copula, such as pricing errors, that are of particular interest. The proposed approach naturally also nests method of moments and generalized method of moments estimators. Combining existing results on simulation based estimation with recent results from empirical copula process theory, we show the consistency and asymptotic normality of the proposed estimator, and obtain a simple test of over-identifying restrictions as a goodness-of-fit test. The results apply to both iid and time series data. We analyze the finite-sample behavior of these estimators in an extensive simulation study. We apply the model to a group of seven financial stock returns and find evidence of statistically significant tail dependence, and that the dependence between these assets is stronger in crashes than booms.
Speaker: Andrew Patton is an Associate Professor of Economics at Duke University, Durham, North Carolina.
November 3, 2011
McGill Statistics Seminar
Alessandro Rinaldo
Maximum likelihood estimation in network models

16:00-17:00

BURN 1205
Abstract: This talk is concerned with maximum likelihood estimation (MLE) in exponential statistical models for networks (random graphs) and, in particular, with the beta model, a simple model for undirected graphs in which the degree sequence is the minimal sufficient statistic. The speaker will present necessary and sufficient conditions for the existence of the MLE of the beta model parameters that are based on a geometric object known as the polytope of degree sequences. Using this result, it is possible to characterize in a combinatorial fashion sample points leading to a non-existent MLE and non-estimability of the probability parameters under a non-existent MLE. The speaker will further indicate some conditions guaranteeing that the MLE exists with probability tending to 1 as the number of nodes increases. Much of this analysis applies also to other well-known models for networks, such as the Rasch model, the Bradley-Terry model and the more general p1 model of Holland and Leinhardt. These results are in fact instantiations of rather general geometric properties of exponential families with polyhedral support that will be illustrated with a simple exponential random graph model.
Speaker: Alessandro Rinaldo is an Assistant Professor of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania.
November 4, 2011
McGill Statistics Seminar
Martin Lysy
A Bayesian method of parametric inference for diffusion processes

15:30-16:30

BURN 1205
Abstract: Diffusion processes have been used to model a multitude of continuous-time phenomena in engineering and the natural sciences and, as in the present case, the volatility of financial assets. However, parametric inference has long been complicated by an intractable likelihood function. For many models the most effective solution involves a large amount of missing data for which the typical Gibbs sampler can be arbitrarily slow. On the other hand, joint parameter and missing data proposals can lead to a radical improvement, but their acceptance rate tends to scale exponentially with the number of observations.

We consider here a novel method of dividing the inference process into separate data batches, each small enough to benefit from joint proposals, to be processed consecutively.  A filter combines batch contributions to produce likelihood inference based on the whole dataset.  Although the result is not always unbiased, it has very low variability, often achieving considerable accuracy in a short amount of time.  We present an example using Heston's popular model for option pricing, but much of the methodology can be extended beyond diffusions to Hidden Markov and other State-Space models.
Speaker:

Martin Lysy is finishing his Ph.D. in the Department of Statistics, Harvard University.

November 11, 2011
CRM-ISM-GERAD Colloque de statistique
Hélène Guérin and Ana-Maria Staicu

Guérin: An ergodic variant of the telegraph process for a toy model of bacterial chemotaxis

Staicu: Skewed functional processes and their applications

14:00-16:30

UdeM
Abstract:

Guérin: I will study the long time behavior of a variant of the classic telegraph process, with non-constant jump rates that induce a drift towards the origin. This process can be seen as a toy model for velocity-jump processes recently proposed as mathematical models of bacterial chemotaxis. I will give its invariant law and construct an explicit coupling for velocity and position, providing exponential ergodicity with moreover a quantitative control of the total variation distance to equilibrium at each time instant. It is a joint work with Joaquin Fontbona (Universidad de Santiago, Chile) and Florent Malrieu (Université Rennes 1, France).

Staicu: We introduce a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial location. Such data are not envisaged by the current approaches to modeling functional data, due to the lack of Gaussian-like features. Our methodology allows modeling of the pointwise quantiles, has interpretability advantages, and is computationally feasible. The methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls.

Speaker:

Ana-Maria Staicu obtained her PhD from the University of Toronto and is currently an Assistant Professor at North Carolina State University. She works in functional data analysis and likelihood methods.

Hélène Guérin is an Associate Professor at Université Rennes 1. She obtained her PhD at the Université Paris X Nanterre. Her main research interests are in the probabilistic interpretation of nonlinear partial differential equations.

Schedule:

Hélène Guérin: 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Ana-Maria Staicu: 15:30 -- 16:30

November 18, 2011
McGill Statistics Seminar
 Amparo Casanova
Construction of bivariate distributions via principal components

15:30-16:30

BURN 1205
Abstract: The diagonal expansion of a bivariate distribution (Lancaster, 1958) has been used as a tool to construct bivariate distributions; this method has been generalized using principal dimensions of random variables (Cuadras, 2002). Necessary and sufficient conditions are given for uniform, exponential, logistic and Pareto marginals in the one- and two-dimensional case. The corresponding copulas are obtained.
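For reference (the notation below is supplied here rather than in the abstract), Lancaster's diagonal expansion writes a bivariate density h with marginal densities f and g as

    h(x, y) = f(x)\, g(y) \Big[ 1 + \sum_{n \ge 1} \rho_n\, a_n(x)\, b_n(y) \Big],

where \{a_n\} and \{b_n\} are systems of functions orthonormal with respect to the marginals and the \rho_n are canonical correlations; constructions of bivariate distributions proceed by choosing the marginals, the function systems, and an admissible sequence \{\rho_n\}.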
Speaker: 

Amparo Casanova is an Assistant Professor at the Dalla Lana School of Public Health, Division of Biostatistics, University of Toronto.

November 25, 2011
McGill Statistics Seminar
François Bellavance

Estimation of the risk of a collision when using a cell phone while driving

15:30-16:30

BURN 1205
Abstract: The use of a cell phone while driving raises the question of whether it is associated with an increased collision risk and, if so, what its magnitude is. For policy decision making, it is important to rely on an accurate estimate of the real crash risk of cell phone use while driving. Three important epidemiological studies were published on the subject, two using the case-crossover approach and one using a more conventional longitudinal cohort design. The methodology and results of these studies will be presented and discussed.
Speaker: 

François Bellavance is a Professor of Statistics at HEC Montréal and the Director of the Transportation Safety Laboratory.

December 2, 2011
McGill Statistics Seminar
 Alberto Carabarin

Path-dependent estimation of a distribution under generalized censoring

15:30-16:30

BURN 1205
Abstract: This talk focuses on the problem of the estimation of a distribution on an arbitrary complete separable metric space when the data points are subject to censoring by a general class of random sets. A path-dependent estimator for the distribution is proposed; among other properties, the estimator is sequential in the sense that it only uses data preceding any fixed point at which it is evaluated. If the censoring mechanism is totally ordered, the paths may be chosen in such a way that the estimate of the distribution defines a measure. In this case, we can prove a functional central limit theorem for the estimator when the underlying space is Euclidean. This is joint work with Gail Ivanoff (University of Ottawa)
Speaker:

Alberto Carabarin is a Postdoctoral Fellow at McGill University. He works with Christian Genest and Johanna Nešlehová. He holds a PhD from the University of Ottawa.

December 9, 2011
CRM-ISM-GERAD Colloque de statistique
Giles Hooker

Detecting evolution in experimental ecology: Diagnostics for missing state variables

15:30-16:30

UQAM

Salle 5115

Abstract:

This talk considers goodness of fit diagnostics for time-series data from processes approximately modeled by systems of nonlinear ordinary differential equations. In particular, we seek to determine three nested causes of lack of fit: (i) unmodeled stochastic forcing, (ii) mis-specified functional forms and (iii) mis-specified state variables. Testing lack of fit in differential equations is challenging since the model is expressed in terms of rates of change of the measured variables. Here, lack of fit is represented on the model scale via time-varying parameters. We develop tests for each of the three cases above through bootstrap and permutation methods.

A motivating example is presented from laboratory-based ecology in which algae are grown on nitrogen-rich medium and rotifers are introduced as a predator. The resulting data exhibit dynamics that do not correspond to those generated by classical ecological models. A hypothesized explanation is that more than one algal species is present in the chemostat. We assess the statistical evidence for this claim and show that while models incorporating multiple algal species provide better agreement with the data, their existence cannot be demonstrated without strong model assumptions. We conclude with an examination of the use of control theory to design inputs into dynamic systems to improve parameter estimation and power to detect missing components.

Speaker:

Giles Hooker is an Assistant Professor in the Department of Statistical Science and the Department of Biological Statistics and Computational Biology at Cornell University. His main research interests include functional data analysis, machine learning and data analysis for dynamical systems.

 
 


Winter Term 2012

January 13, 2012
CRM-ISM-GERAD Colloque de statistique
Yulei He
Bayesian approaches to evidence synthesis in clinical practice guideline development

15:30-16:30

Concordia, Library Building

LB-921.04

Abstract: The American College of Cardiology Foundation (ACCF) and the American Heart Association (AHA) have jointly engaged in the production of guidelines in the area of cardiovascular disease since 1980. The developed guidelines are intended to assist health care providers in clinical decision making by describing a range of generally acceptable approaches for the diagnosis, management, or prevention of specific diseases or conditions. This talk describes some of our work under a contract with ACCF/AHA for applying Bayesian methods to guideline recommendation development. In a demonstration example, we use Bayesian meta-analysis strategies to summarize evidence on the comparative effectiveness of percutaneous coronary intervention versus coronary artery bypass grafting for patients with unprotected left main coronary artery disease. We show the usefulness and flexibility of Bayesian methods in handling data arising from studies with different designs (e.g., RCTs and observational studies), performing indirect comparisons among treatments when studies with direct comparisons are unavailable, and accounting for historical data.
Speaker:
Yulei He is an Assistant Professor in the Department of Health Care Policy at the Harvard Medical School. His research focuses on the development and application of statistical methods for health services and policy research.
Schedule

 Talk: 15:30-16:30

January 20, 2012
McGill Statistics Seminar
Martin Larsson
A concave regularization technique for sparse mixture models

15:30-16:30

BURN 1205
Abstract: Latent variable mixture models are a powerful tool for exploring the structure in large datasets. A common challenge for interpreting such models is a desire to impose sparsity, the natural assumption that each data point only contains few latent features. Since mixture distributions are constrained in their L1 norm, typical sparsity techniques based on L1 regularization become toothless, and concave regularization becomes necessary. Unfortunately, concave regularization typically results in EM algorithms that must perform problematic non-convex M-step optimization. In this work, we introduce a technique for circumventing this difficulty, using the so-called Mountain Pass Theorem to provide easily verifiable conditions under which the M-step is well-behaved despite the lack of convexity. We also develop a correspondence between logarithmic regularization and what we term the pseudo-Dirichlet distribution, a generalization of the ordinary Dirichlet distribution well-suited for inducing sparsity. We demonstrate our approach on a text corpus, inferring a sparse topic mixture model for 2,406 weblogs.
Speaker: Martin Larsson is a Ph.D. candidate in the School of Operations Research and Information Engineering at Cornell University; his advisor is Robert Jarrow.
January 27, 2012
McGill Statistics Seminar
Sepideh Farsinezhad
Applying Kalman filtering to problems in causal inference

15:30-16:30

BURN 1205
Abstract: A common problem in observational studies is estimating the causal effect of a time-varying treatment in the presence of a time-varying confounder. When random assignment of subjects to comparison groups is not possible, time-varying confounders can bias estimates of causal effects even after standard regression adjustment if past treatment history is a predictor of future confounders. To eliminate the bias of standard methods for estimating the causal effect of a time-varying treatment, Robins developed a number of innovative methods for discrete treatment levels, including G-computation, G-estimation, and marginal structural models (MSMs). However, straightforward applications of G-estimation and MSMs to continuous treatment do not currently exist. In this talk, I will introduce an alternative to these methods that utilizes the Kalman filter. The key advantage of the Kalman filter approach is that the model easily accommodates continuous levels of treatment.
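For readers who have not seen it, the generic predict/update recursion of the Kalman filter for a linear-Gaussian state-space model is sketched below in Python. This is background on the filter itself, not the speaker's causal-inference method, and all names and dimensions are assumptions:

    import numpy as np

    def kalman_step(m, P, y, F, Q, H, R):
        # One step of the Kalman filter for
        #   x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)   (state equation)
        #   y_t = H x_t + v_t,      v_t ~ N(0, R)   (observation equation)
        # m, P are the filtered mean and covariance from the previous step.
        m_pred = F @ m                        # predict
        P_pred = F @ P @ F.T + Q
        S = H @ P_pred @ H.T + R              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        m_new = m_pred + K @ (y - H @ m_pred) # update with observation y
        P_new = P_pred - K @ S @ K.T
        return m_new, P_new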
Speaker: Sepideh Farsinezhad is a Ph.D. candidate in our department. She works with Russell Steele.
February 3, 2012
McGill Statistics Seminar
Yeting Du and Daphna Harel

Du: Simultaneous fixed and random effects selection in finite mixtures of linear mixed-effects models

Harel: Measuring fatigue in systemic sclerosis: a comparison of the SF-36 vitality subscale and FACIT fatigue scale using item response theory

15:30-16:30

BURN 1205
Abstract:

Du: Linear mixed-effects (LME) models are frequently used for modeling longitudinal data. One complicating factor in the analysis of such data is that samples are sometimes obtained from a population with significant underlying heterogeneity, which would be hard to capture by a single LME model. Such problems may be addressed by a finite mixture of linear mixed-effects (FMLME) models, which segments the population into subpopulations and models each subpopulation by a distinct LME model. Often in the initial stage of a study, a large number of predictors are introduced. However, their associations to the response variable vary from one component to another of the FMLME model. To enhance predictability and to obtain a parsimonious model, it is of great practical interest to identify the important effects, both fixed and random, in the model. Traditional variable selection techniques such as stepwise deletion and subset selection are computationally expensive as the number of covariates and components in the mixture model increases. In this talk, we introduce a penalized likelihood approach and propose a nested EM algorithm for efficient numerical computations. Our estimators are shown to possess desirable properties such as consistency, sparsity and asymptotic normality. We illustrate the performance of our method through simulations and a systemic sclerosis data example.

Harel: Multi-item, self-reported questionnaires are frequently used to measure aspects of health-related quality of life. Due to the latent nature of the constructs underlying these instruments, Item Response Theory models are often used to relate the observed item scores to the latent trait. However, there are no well-established guidelines for how to compare two such questionnaires. In this talk I will explore graphical methods for the comparison of multi-item self-reported questionnaires by using Partial Credit Models. This will be illustrated with the comparison of two fatigue questionnaires in patients with Systemic Sclerosis.

Speaker:

Ye Ting Du is an M.Sc. student in our department. He works with Abbas Khalili and Johanna Nešlehová.

Daphna Harel is a Ph.D. candidate in our department. She works with Russell Steele.

February 10, 2012
CRM-ISM-GERAD Colloque de statistique
Winfried Stute and Jochen Blath

Stute: Principal component analysis of the Poisson Process

Blath: Longterm properties of the symbiotic branching model

14:00-16:30

Concordia
Abstract:

Stute: The Poisson Process constitutes a well-known model for describing random events over time. It has many applications in marketing research, insurance mathematics and finance. Though it has been studied for decades, not much is known about how to check (in a non-asymptotic way) the validity of the Poisson Process. In this talk we present the principal component decomposition of the Poisson Process which enables us to derive finite sample properties of associated goodness-of-fit tests. In the first step we show that the Fourier transforms of the components contain Bessel and Struve functions. Inversion leads to densities which are modified arcsine distributions.

Blath: In this talk we consider properties of the so-called 'symbiotic branching model' describing the spatial evolution of two populations which can only reproduce if they are both present at the same location at the same time. We will put particular emphasis on the long-term dynamics of this population model. To this end, we consider a 'critical curve' separating the asymptotic behaviour of the moments of the symbiotic branching process into two qualitatively different regimes. From this result, various properties can be derived. For example, we improve a result of Etheridge and Fleischmann on the speed of the propagation of the area in which both species are simultaneously present.

Speaker:

Winfried Stute is Professor of Statistics and Probability at the Universität Giessen.

Jochen Blath is Professor of Mathematics at the Technische Universität Berlin.

Schedule:

Talk 1: Jochen Blath 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Talk 2: Winfried Stute 15:30 -- 16:30

February 17, 2012
McGill Statistics Seminar
Annaliza McGillivray and Ana Best

McGillivray: A penalized quasi-likelihood approach for estimating the number of states in a hidden Markov model

Best: Risk-set sampling and left truncation in survival analysis

15:30-16:30

BURN 1205
Abstract:

McGillivray: In statistical applications of hidden Markov models (HMMs), one may have no knowledge of the number of hidden states (or order) of the model needed to accurately represent the underlying process of the data. The problem of estimating the number of hidden states of the HMM is thus brought to the forefront. In this talk, we present a penalized quasi-likelihood approach for order estimation in HMMs which makes use of the fact that the marginal distribution of the observations from an HMM is a finite mixture model. The method starts with an HMM with a large number of states and obtains a model of lower order by clustering and combining similar states of the model through two penalty functions. We assess the performance of the new method via extensive simulation studies for normal and Poisson HMMs.

Best: Statisticians are often faced with budget concerns when conducting studies. The collection of some covariates, such as genetic data, is very expensive. Other covariates, such as detailed histories, might be difficult or time-consuming to measure. This helped bring about the invention of the nested case-control study, and its more generalized version, risk-set sampled survival analysis. The literature has a good discussion of the properties of risk-set sampling in standard right-censored survival data. My interest is in extending the methods of risk-set sampling to left-truncated survival data. Left-truncated survival data arise in prevalent longitudinal studies. Since prevalent studies are easier and cheaper to conduct than incident studies, this extension is extremely practical and relevant. I will introduce the partial likelihood in this scenario.

Speaker:

Annaliza McGillivray is an M.Sc. student in our department. She works with Abbas Khalili.

Ana Best is a Ph.D. candidate in our department. She works with David Wolfson.

March 2, 2012
McGill Statistics Seminar
 James O. Ramsay
Estimating a variance-covariance surface for functional and longitudinal data

15:30-16:30

BURN 1205
Abstract:

In functional data analysis, as in its multivariate counterpart, estimates of the bivariate covariance kernel σ(s,t) and its inverse are useful for many things, and we need the inverse of a covariance matrix or kernel especially often. However, the dimensionality of functional observations often exceeds the sample size available to estimate σ(s,t), and then the analogue S of the multivariate sample estimate is singular and non-invertible. Even when this is not the case, the high dimensionality of S often implies unacceptable sample variability and loss of degrees of freedom for model fitting. The common practice of employing low-dimensional principal component approximations to σ(s,t) to achieve invertibility also raises serious issues.


This talk describes a functional estimate of σ(s,t) and its inverse defined by an expansion in terms of finite element basis functions. This strategy permits the user to control the resolution of the estimate, its smoothness, and the time lag over which covariance may be nonzero. It turns out that the matrix resulting from evaluating σ(s,t) at a discrete set of time points is almost never singular, and therefore enables the estimation of S and S⁻¹ as a seamless single problem. These estimates have many applications to classical statistical problems, such as discrete but unequally spaced time and spatial series, as well as to functional and longitudinal data analysis.

Speaker: Jim Ramsay is a leading researcher in the area of Functional Data Analysis. He is a Professor Emeritus at the Department of Psychology, McGill University.
March 9, 2012
CRM-ISM-GERAD Colloque de statistique
Hugh Chipman and Mori Jamshidian

Jamshidian: Using tests of homoscedasticity to test missing completely at random

Chipman: Sequential optimization of a computer model and other "Active Learning" problems

14:00-16:30

UQAM, 201 ave. du Président-Kennedy, salle 5115
Abstract:

Jamshidian: Tests of homogeneity of covariances (or homoscedasticity) among several groups have many applications in statistical analysis. In the context of incomplete data analysis, tests of homoscedasticity among groups of cases with identical missing data patterns have been proposed to test whether data are missing completely at random (MCAR). The proposed tests of MCAR often require large sample sizes n and/or large group sample sizes n_i, and they usually fail when applied to non-normal data. Hawkins (1981) proposed a test of multivariate normality and homoscedasticity that is an exact test for complete data when the n_i are small. In this talk we present a modification of the Hawkins test for complete data to improve its performance, and extend its application to tests of homoscedasticity and MCAR when data are multivariate normal and incomplete. Moreover, we will show that the statistic used in the Hawkins test, in conjunction with a nonparametric k-sample test, can be used to obtain a nonparametric test of MCAR that works well for both normal and non-normal data. It will be explained how a combination of the proposed normal-theory Hawkins test and the nonparametric test can be employed to test for homoscedasticity, MCAR, and multivariate normality. We will present simulation studies that indicate the newly proposed tests generally outperform their existing competitors in terms of Type I error rejection rates. Also, a power study of the proposed tests indicates good power. The newly proposed methods use appropriate methods of imputation to impute missing data. As such, multiple imputation is employed to assess the performance of our tests in light of imputation variability. Moreover, examples will be presented where multiple imputation enables one to identify a group or groups whose covariance matrices differ from the majority of other groups. Finally, an R package that implements these new tests, called MissMech, will be briefly presented.

Chipman: In computer experiments, statistical models are commonly used as surrogates for slow-running codes. In this talk, the usually ubiquitous Gaussian process models are nowhere to be seen, however. Instead, an adaptive nonparametric regression model (BART) is used to deal with nonstationarities in the response surface. By providing both point estimates and uncertainty bounds for prediction, BART provides a basis for sequential design criteria to find optima with few function evaluations. Similar ideas will also be illustrated in other active learning problems, such as identification of active compounds in drug discovery.

Speaker:

Hugh Chipman is Professor of Statistics at Acadia University. He holds the Canada Research Chair in Mathematical Modeling and is the 2009 recipient of the CRM-SSC Award. 

Mori Jamshidian is Professor in the Department of Mathematics at the California State University, Fullerton.

Schedule:

Mori Jamshidian: 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Hugh Chipman: 15:30 -- 16:30

March 16, 2012
McGill Statistics Seminar
 
Azadeh Shohoudi
 
Variable selection in longitudinal data with a change-point

15:30-16:30

BURN 1205
Abstract: Follow-up studies are frequently carried out to investigate the evolution of measurements through time, taken on a set of subjects. These measurements (responses) are bound to be influenced by subject specific covariates and if a regression model is used the data analyst is faced with the problem of selecting those covariates that “best explain” the data. For example, in a clinical trial, subjects may be monitored for a response following the administration of a treatment with a view of selecting the covariates that are best predictive of a treatment response. This variable selection setting is standard. However, more realistically, there will often be an unknown delay from the administration of a treatment before it has a measurable effect. This delay will not be directly observable since it is a property of the distribution of responses rather than of any particular trajectory of responses. Briefly, each subject will have an unobservable change-point. With a change-point component added, the variable selection problem necessitates the use of penalized likelihood methods. This is because the number of putative covariates for the responses, as well as the change-point distribution, could be large relative to the follow-up time and/or the number of subjects; variable selection in a change-point setting does not appear to have been studied in the literature. In this talk I will briefly introduce the multi-path change-point problem. I will show how variable selection for the covariates before the change, after the change, as well as for the change-point distribution, reduces to variable selection for a finite mixture of multivariate distributions. I will discuss the performance of my model selection methods using an example on cognitive decline in subjects with Alzheimer’s disease and through simulations.
Speaker:

Azadeh Shohoudi is a PhD student in our department, under the supervision of David Wolfson. The work she will present was done jointly with David Wolfson, Masoud Asgharian and Abbas Khalili.

March 23, 2012
McGill Statistics Seminar
 
Jinchi Lv
 
Model selection principles in misspecified models

15:30-16:30

BURN 1205
Abstract: Model selection is of fundamental importance to high-dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Bayesian principle and the Kullback-Leibler divergence principle, which lead to the Bayesian information criterion and Akaike information criterion, respectively, when models are correctly specified. Yet model misspecification is unavoidable in practice. We derive novel asymptotic expansions of the two well-known principles in misspecified generalized linear models, which give the generalized BIC (GBIC) and generalized AIC. A specific form of prior probabilities motivated by the Kullback-Leibler divergence principle leads to the generalized BIC with prior probability (GBIC_p), which can be naturally decomposed as the sum of the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty on model misspecification. Numerical studies demonstrate the advantage of the new methods for model selection in both correctly specified and misspecified models.
Speaker:

Jinchi Lv is an Assistant Professor in the Marshall School of Business, University of Southern California. He is interested in high dimensional inference, variable selection, machine learning and financial econometrics.

March 30, 2012
McGill Statistics Seminar
 
Julian Wolfson
 
A matching-based approach to assessing the surrogate value of a biomarker

15:30-16:30

BURN 1205
Abstract: Statisticians have developed a number of frameworks which can be used to assess the surrogate value of a biomarker, i.e. establish whether treatment effects on a biological quantity measured shortly after administration of treatment predict treatment effects on the clinical endpoint of interest. The most commonly applied of these frameworks is due to Prentice (1989), who proposed a set of criteria which a surrogate marker should satisfy. However, verifying these criteria using observed data can be challenging due to the presence of unmeasured simultaneous predictors (i.e. confounders) which influence both the potential surrogate and the outcome. In this work, we adapt a technique proposed by Rosenbaum (2002) for observational studies, in which observations are matched and the odds of treatment within each matched pair is bounded. This yields a straightforward and interpretable sensitivity analysis which can be performed particularly efficiently for certain types of test statistics. In this talk, I will introduce the surrogate endpoint problem, discuss the details of my proposed technique for assessing surrogate value, and illustrate with some simulated examples inspired by the problem of identifying immune surrogates in HIV vaccine trials.
Speaker:

Julian Wolfson is an Assistant Professor in the Division of Biostatistics at the University of Minnesota School of Public Health.

April 5, 2012

McGill Statistics Seminar
 
Pengfei Li
 
Hypothesis testing in finite mixture models: from the likelihood ratio test to EM-test

15:30-16:30

ARTS W-215
Abstract: In the presence of heterogeneity, a mixture model is most natural to characterize the random behavior of samples taken from such populations. Such a strategy has been widely employed in applications ranging from genetics, information technology, and marketing to finance. Studying the mixing structure behind a random sample from the population allows us to infer the degree of heterogeneity, with important implications in applications such as the presence of disease subgroups in genetics. The statistical problem is to test hypotheses on the order of the finite mixture model. There has been continued interest in the limiting behavior of likelihood ratio tests. The non-regularity of finite mixture models has provided statisticians with ample examples of unusual limiting distributions. Yet many such results are not convenient for conducting hypothesis tests. Motivated by the goal of overcoming such difficulties, we have developed a number of strategies to obtain tests with high efficiency and easy-to-use limiting distributions. The latest development is a class of EM-tests which are advantageous in many respects: their limiting distributions are easier to derive mathematically, simple to implement in data analysis, and valid for a more general class of mixture models without restrictions on the space of the mixing distribution. Simulations indicate that the limiting distributions have good precision in approximating the finite-sample distributions in the examples investigated.
Speaker:

Pengfei Li is an Assistant Professor at the University of Waterloo. He obtained his Ph.D. from the University of Waterloo in 2007, under the supervision of Jiahua Chen.

April 13, 2012
CRM-ISM-GERAD Colloque de statistique
 
Longhai Li and Sunil Rao
 

Li: High-dimensional feature selection using hierarchical Bayesian logistic regression with heavy-tailed priors

Rao: Best predictive estimation for linear mixed models with applications to small area estimation

14:00-16:30

McGill

MAASS 217

Abstract:

Li: The problem of selecting the most useful features from a great many (e.g., thousands of) candidates arises in many areas of modern science. An interesting problem from genomic research is that, from thousands of genes that are active (expressed) in certain tissue cells, we want to find the genes that can be used to separate tissues of different classes (e.g., cancer and normal). In this paper, we report a Bayesian logistic regression method based on heavy-tailed priors with moderately small degrees of freedom (such as 1) and small scale (such as 0.01), using Gibbs sampling to do the computation. We show that it can distinctively separate a couple of useful features from a large number of useless ones, and discriminate many redundant correlated features. We also show that this method is very stable to the choice of scale. We apply our method to a microarray data set related to prostate cancer, and identify only 3 genes out of 6,033 candidates that separate cancer and normal tissues very well in leave-one-out cross-validation.

Rao: We derive the best predictive estimator (BPE) of the fixed parameters for a linear mixed model.  This leads to a new prediction procedure called observed best prediction (OBP), which is different from the empirical best linear unbiased prediction (EBLUP).  We show that BPE is more reasonable than the traditional estimators derived from estimation considerations, such as maximum likelihood (ML) and restricted maximum likelihood (REML), if the main interest is the prediction of the mixed effect.  We show how the OBP can significantly outperform the EBLUP in terms of mean squared prediction error (MSPE) if the underlying model is misspecified.  On the other hand, when the underlying model is correctly specified, the overall predictive performance of the OBP can be very similar to the EBLUP.  The well known Fay-Herriot small area model is used as an illustration of the methodology.  In addition, simulations and analysis of a data set on graft failure rates from kidney transplant operations will be used to show empirical performance. This is joint work with Jiming Jiang of UC-Davis and Thuan Nguyen of Oregon Health and Science University.

Speaker:

Longhai Li is an Assistant Professor of Statistics at the University of Saskatchewan

Sunil Rao is a Professor and Director of the Division of Biostatistics at the Department of Epidemiology and Public Health, University of Miami

Schedule:

Longhai Li: 14:00 -- 15:00

Coffee Break 15:00 -- 15:30

Sunil Rao: 15:30 -- 16:30


Fall Term 2012

September 21, 2012
CRM-ISM-GERAD Colloque de statistique
Fang Yao
Regularized semiparametric functional linear regression

14:30-15:30

McGill, Burnside Hall 1214
Abstract:

In many scientific experiments we are faced with the analysis of functional data, where the observations are sampled from a random process, together with a potentially large number of non-functional covariates. The complex nature of functional data makes it difficult to directly apply existing methods to model selection and estimation. We propose and study a new class of penalized semiparametric functional linear regression models to characterize the regression relation between a scalar response and multiple covariates, including both functional covariates and scalar covariates. The resulting method provides a unified and flexible framework to jointly model functional and non-functional predictors, identify important covariates, and improve efficiency and interpretability of the estimates. Featuring two types of regularization, shrinkage on the effects of scalar covariates and truncation of the principal components of the functional predictor, the new approach is flexible and effective in dimension reduction. One key contribution of this work is to study the theoretical properties of the regularized semiparametric functional linear model. We establish oracle and consistency properties under mild conditions by allowing a possibly diverging number of scalar covariates and simultaneously taking the infinite-dimensional functional predictor into account. We illustrate the new estimator with extensive simulation studies, and then apply it to an image data analysis.

Speaker: 

Fang Yao (http://www.utstat.utoronto.ca/fyao/) is an Associate Professor in the Department of Statistics, University of Toronto. His research interests include functional and longitudinal data analysis, nonparametric regression and smoothing methods, and statistical modeling of high-dimensional and complex data, with applications involving functional objects (evolutionary biology, human genetics, finance and e-commerce, chemical engineering).

September 28, 2012

McGill Statistics Seminar
Erica Moodie
The current state of Q-learning for personalized medicine

14:30-15:30

BURN 1205
Abstract:

In this talk, I will provide an introduction to dynamic treatment regimes (DTRs) and an overview of the state of the art (and science) of Q-learning, a popular tool in reinforcement learning. The use of Q-learning and its variance in randomized and non-randomized studies will be discussed, as well as issues concerning inference, as the resulting estimators are not always regular. Current and future directions of interest will also be considered.
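As background on the Q-learning algorithm itself, here is a minimal two-stage, regression-based sketch in Python. The linear working models, variable names, and design are assumptions made for illustration and are not taken from the talk:

    import numpy as np

    def two_stage_q_learning(H1, A1, H2, A2, Y):
        # H1, H2: stage-specific history matrices (n x p); A1, A2: binary
        # treatments in {0, 1}; Y: final outcome. Working models are
        # Q_j(H, A) = [1, H, A, A*H] beta_j, fitted by ordinary least squares.
        def design(H, A):
            return np.column_stack([np.ones(len(A)), H, A, A[:, None] * H])

        # Stage 2: regress the outcome on history, treatment and interaction.
        beta2, *_ = np.linalg.lstsq(design(H2, A2), Y, rcond=None)

        # Pseudo-outcome: predicted value under the best stage-2 decision.
        n = len(Y)
        V2 = np.maximum(design(H2, np.zeros(n)) @ beta2,
                        design(H2, np.ones(n)) @ beta2)

        # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
        beta1, *_ = np.linalg.lstsq(design(H1, A1), V2, rcond=None)
        return beta1, beta2

The estimated optimal regime then recommends, at each stage, the treatment that maximizes the fitted Q-function given the current history.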

Speaker: Erica Moodie is an Associate Professor in the Department of Epidemiology, Biostatistics and Occupational Health at McGill.

October 5, 2012

McGill Statistics Seminar
Jacob Stöber
Markov switching regular vine copulas

14:30-15:30

BURN 1205
Abstract:

Using only bivariate copulas as building blocks, regular vines (R-vines) constitute a flexible class of high-dimensional dependence models. In this talk we introduce a Markov switching R-vine copula model, combining the flexibility of general R-vine copulas with the possibility for dependence structures to change over time. Frequentist as well as Bayesian parameter estimation is discussed. Further, we apply the newly proposed model to examine the dependence of exchange rates as well as stock and stock index returns. We show that changes in dependence are usually closely interrelated with periods of market stress. In such times, the Value at Risk of an asset portfolio is significantly underestimated when changes in the dependence structure are ignored.

Speaker: Jacob Stöber is a PhD candidate at the Technische Universität München. He is currently visiting Duke University.

October 12, 2012

McGill Statistics Seminar
Elena Rivera Mancia
Modeling operational risk using a Bayesian approach to EVT

14:30-15:30

BURN 1205
Abstract:

Extreme Value Theory has been widely used for assessing risk for highly unusual events, either by using block maxima or peaks-over-threshold (POT) methods. However, one of the main drawbacks of the POT method is the choice of a threshold, which plays an important role in the estimation since the parameter estimates strongly depend on this value. Bayesian inference is an alternative way to handle these difficulties; the threshold can be treated as another parameter in the estimation, avoiding the classical empirical approach. In addition, it is possible to incorporate internal and external observations in combination with expert opinion, providing a natural, probabilistic framework in which to evaluate risk models. In this talk, we analyze operational risk data using a mixture model which combines a parametric form for the center and a GPD for the tail of the distribution, using all observations for inference about the unknown parameters of both distributions, the threshold included. A Bayesian analysis is performed and inference is carried out through Markov chain Monte Carlo (MCMC) methods in order to determine the minimum capital requirement for operational risk.
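For concreteness, one generic way to write such a spliced bulk-and-tail density (the notation is assumed here and is not taken from the talk) is

    f(x) = f_0(x; \theta)                                  for x \le u,
    f(x) = \{1 - F_0(u; \theta)\}\, g_{\xi,\sigma}(x - u)  for x > u,

where f_0 and F_0 are the density and distribution function of the parametric bulk model, g_{\xi,\sigma} is the generalized Pareto density with shape \xi and scale \sigma, and the threshold u is treated as an additional unknown parameter in the Bayesian formulation.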

Speaker: Elena Rivera Mancia is a PhD candidate in our department. Her main supervisor is David A. Stephens, her co-supervisor is Johanna Nešlehová.
October 19, 2012
CRM-ISM-GERAD Colloque de statistique
David Madigan

Observational studies in healthcare: are they any good?

14:30-15:30

Université de Montréal
Abstract:

Observational healthcare data, such as administrative claims and electronic health records, play an increasingly prominent role in healthcare.  Pharmacoepidemiologic studies in particular routinely estimate temporal associations between medical product exposure and subsequent health outcomes of interest, and such studies influence prescribing patterns and healthcare policy more generally.  Some authors have questioned the reliability and accuracy of such studies, but few previous efforts have attempted to measure their performance.

The Observational Medical Outcomes Partnership (OMOP, http://omop.fnih.org) has conducted a series of experiments to empirically measure the performance of various observational study designs with regard to predictive accuracy for discriminating between true drug effects and negative controls.  In this talk, I describe the past work of the Partnership, explore opportunities to expand the use of observational data to further our understanding of medical products, and highlight areas for future research and development.

(on behalf of the OMOP investigators)

Speaker: David Madigan (http://www.stat.columbia.edu/~madigan/)  is Professor and Chair, Department of Statistics, Columbia University, New York. An ASA (1999) and IMS (2006) Fellow, he is a recognized authority in data mining; he has just been appointed as Editor for the ASA's journal "Statistical Analysis and Data Mining". He recently served as Editor-in-chief of "Statistical Science".

October 26, 2012

McGill Statistics Seminar
Derek Bingham Simulation model calibration and prediction using outputs from multi-fidelity simulators

14:30-15:30

BURN 1205
Abstract:

Computer simulators are used widely to describe physical processes in lieu of physical observations. In some cases, more than one computer code can be used to explore the same physical system - each with different degrees of fidelity. In this work, we combine field observations and model runs from deterministic multi-fidelity computer simulators to build a predictive model for the real process. The resulting model can be used to perform sensitivity analysis for the system and make predictions with associated measures of uncertainty. Our approach is Bayesian and will be illustrated through a simple example, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan.

Speaker: Derek Bingham is an Associate Professor in the Department of Statistics and Actuarial Science at Simon Fraser University. He holds a Canada Research Chair in Industrial Statistics.

November 2, 2012

McGill Statistics Seminar
Anne-Laure Fougères Multivariate extremal dependence: Estimation with bias correction

14:30-15:30

BURN 1205
Abstract:

Estimating extreme risks in a multivariate framework is highly connected with the estimation of the extremal dependence structure. This structure can be described via the stable tail dependence function L, for which several estimators have been introduced. Asymptotic normality is available for empirical estimates of L, with rate of convergence $k^{1/2}$, where k denotes the number of high order statistics used in the estimation. Choosing a larger k improves the accuracy of the estimation but may also lead to an increased asymptotic bias. We provide a bias correction procedure for the estimation of L. Combining estimators of L is done in such a way that the asymptotic bias term disappears. The new estimator of L is shown to allow more flexibility in the choice of k. Its asymptotic behavior is examined, and a simulation study is provided to assess its small sample behavior. This is joint work with Cécile Mercadier (Université Lyon 1) and Laurens de Haan (Erasmus University Rotterdam).
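
For orientation, one standard empirical estimator of L in the bivariate case is $\hat{L}_k(x, y) = k^{-1} \sum_{i=1}^{n} \mathbf{1}\{R_{i1} > n + 1/2 - kx \text{ or } R_{i2} > n + 1/2 - ky\}$, where $R_{ij}$ denotes the rank of the $j$th component of the $i$th observation; this is one of several variants in the literature and is stated only to fix ideas. The bias correction discussed in the talk combines estimators of this type in such a way that the leading bias term cancels.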

Speaker:  Anne-Laure Fougères is Professor of Statistics at Université Claude-Bernard, in Lyon, France.

November 9, 2012

McGill Statistics Seminar
Sidney Resnick The multidimensional edge: Seeking hidden risks

14:30-15:30

BURN 1205
Abstract:

Assessing tail risks using the asymptotic models provided by multivariate extreme value theory has the danger that, when asymptotic independence is present (as with the Gaussian copula model), the asymptotic model provides estimates of probabilities of joint tail regions that are zero. In diverse applications such as finance, telecommunications, insurance and environmental science, it may be difficult to believe in the absence of risk contagion. This problem can be partly ameliorated by using hidden regular variation, which assumes a lower order asymptotic behavior on a subcone of the state space, and this theory can be made more flexible by extensions in the following directions: (i) higher dimensions than two; (ii) where the lower order variation on a subcone is of extreme value type different from regular variation; and (iii) where the concept is extended to searching for lower order behavior on the complement of the support of the limit measure of regular variation. We discuss some challenges and potential applications to this ongoing effort.

Speaker: Sidney Resnick is the Lee Teng Hui Professor in Engineering at the School of Operations Research and Information Engineering, Cornell University. He is the author of several well-known textbooks in probability and extreme-value theory.

November 16, 2012

McGill Statistics Seminar
Taoufik Bouezmarni Copula-based regression estimation and inference

14:30-15:30

BURN 1205
Abstract:

In this paper we investigate a new approach of estimating a regression function based on copulas. The main idea behind this approach is to write the regression function in terms of a copula and marginal distributions. Once the copula and the marginal distributions are estimated we use the plug-in method to construct the new estimator. Because various methods are available in the literature for estimating both a copula and a distribution, this idea provides a rich and flexible alternative to many existing regression estimators. We provide some asymptotic results related to this copula-based regression modeling when the copula is estimated via profile likelihood and the marginals are estimated nonparametrically. We also study the finite sample performance of the estimator and illustrate its usefulness by analyzing data from air pollution studies.
 
Joint work with H. Noh and A. El Ghouch from the Université catholique de Louvain.
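
To make the plug-in idea concrete: if $(X, Y)$ has copula density $c$ and marginals $F_X$ and $F_Y$, with $f_Y$ the density of $Y$, the regression function can be written as $m(x) = E(Y \mid X = x) = \int y\, c\{F_X(x), F_Y(y)\}\, f_Y(y)\, dy$, so that replacing $c$, $F_X$, $F_Y$ and $f_Y$ by estimates yields a plug-in estimator of the type studied in the talk. This is a generic sketch of the construction; the estimator analyzed in the paper may differ in its details.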

Speaker: Taoufik Bouezmarni is an Assistant Professor of Statistics at the Université de Sherbrooke.
November 23, 2012
CRM-ISM-GERAD Colloque de statistique
Peter Mueller

A nonparametric Bayesian model for local clustering

14:30-15:30

McGill, Burnside Hall 107

Abstract:

We propose a nonparametric Bayesian local clustering (NoB-LoC) approach for heterogeneous data.  Using genomics data as an example, the NoB-LoC clusters genes into gene sets and simultaneously creates multiple partitions of samples, one for each gene set. In other words, the sample partitions are nested within the gene sets.  Inference is guided by a joint probability model on all random elements. Biologically, the model formalizes the notion that biological samples cluster differently with respect to different genetic processes, and that each process is related to only a small subset of genes. These local features are importantly different from global clustering approaches such as hierarchical clustering, which create one partition of samples that applies for all genes in the data set. Furthermore, the NoB-LoC includes a special cluster of genes that do not give rise to any meaningful partition of samples. These genes could be irrelevant to the disease conditions under investigation. Similarly, for a given gene set, the NoB-LoC includes a subset of samples that do not co-cluster with other samples. The samples in this special cluster could, for example, be those whose disease subtype is not characterized by the particular gene set.

This is joint work with Juhee Lee and Yuan Ji.

Speaker: Peter Mueller (http://www.math.utexas.edu/users/pmueller/) is Professor, Department of Mathematics, University of Texas at Austin.  His research interests include theory and applications of Bayesian nonparametric inference, with applications in genomics, medicine and health sciences.

November 30, 2012

McGill Statistics Seminar
Anne-Sophie Charest Sharing confidential datasets using differential privacy

14:30-15:30

BURN 1205
Abstract:

While statistical agencies would like to share their data with researchers, they must also protect the confidentiality of the data provided by their respondents. To satisfy these two conflicting objectives, agencies use various techniques to restrict and modify the data before publication. Most of these techniques, however, share a common flaw: their confidentiality protection cannot be rigorously measured. In this talk, I will present the criterion of differential privacy, a rigorous measure of the protection offered by such methods. Designed to guarantee confidentiality even in a worst-case scenario, differential privacy protects the information of any individual in the database against an adversary with complete knowledge of the rest of the dataset. I will first give a brief overview of recent and current research on the topic of differential privacy. I will then focus on the publication of differentially-private synthetic contingency tables and present some of my results on methods for the generation and proper analysis of such datasets.
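
For reference, the criterion requires that a randomized release mechanism $M$ satisfy $\Pr\{M(D) \in S\} \le e^{\varepsilon}\, \Pr\{M(D') \in S\}$ for every set $S$ of possible outputs and every pair of datasets $D$ and $D'$ differing in a single record; the privacy parameter $\varepsilon$ quantifies the protection, with smaller values giving stronger guarantees.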

Speaker: Anne-Sophie Charest is a newly hired Assistant Professor of Statistics at Université Laval, Québec. A McGill graduate, she recently completed her PhD at Carnegie Mellon University, Pittsburgh.

December 7, 2012

McGill Statistics Seminar
Pierre Lafaye de Micheaux
Sample size and power determination for multiple comparison procedures aiming at rejecting at least r among m false hypotheses

14:30-15:30

BURN 1205
Abstract:

Multiple testing problems arise in a variety of situations, notably in clinical trials with multiple endpoints. In such cases, it is often of interest to reject either all hypotheses or at least one of them. More generally, the question arises as to whether one can reject at least r out of m hypotheses. Statistical tools addressing this issue are rare in the literature. In this talk, I will recall well-known hypothesis testing concepts, both in a single- and in a multiple-hypothesis context. I will then present general power formulas for three important multiple comparison procedures: the Bonferroni and Hochberg procedures, as well as Holm’s sequential procedure. Next, I will describe an R package that we developed for sample size calculations in multiple endpoints trials where it is desired to reject at least r out of m hypotheses. This package covers the case where all the variables are continuous and four common variance-covariance patterns. I will show how to use this package to compute the sample size needed in a real-life application.
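
As a rough illustration of the quantity being computed, the following Monte Carlo sketch approximates the probability of rejecting at least r of m false hypotheses under a Bonferroni adjustment applied to correlated one-sided z-statistics. It is a generic sketch in Python rather than the R package presented in the talk, and the effect sizes and equicorrelation structure are hypothetical; the package itself relies on exact formulas and covers other procedures and covariance patterns.

  import numpy as np
  from scipy.stats import norm

  def power_at_least_r(delta, rho, n, r, alpha=0.05, nsim=20000, seed=1):
      # Probability of rejecting at least r of the m hypotheses when the
      # standardized effects are delta and the z-statistics are equicorrelated.
      m = len(delta)
      rng = np.random.default_rng(seed)
      cov = np.full((m, m), rho) + (1 - rho) * np.eye(m)  # equicorrelation
      crit = norm.ppf(1 - alpha / m)                      # Bonferroni cutoff
      z = rng.multivariate_normal(np.sqrt(n) * np.asarray(delta), cov, size=nsim)
      return np.mean((z > crit).sum(axis=1) >= r)

  # Example: m = 3 endpoints, standardized effects of 0.3, correlation 0.4,
  # n = 100, and at least r = 2 rejections required.
  print(power_at_least_r([0.3, 0.3, 0.3], rho=0.4, n=100, r=2))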

Speaker: Pierre Lafaye de Micheaux is an Associate Professor of Statistics at the Université de Montréal.
December 14, 2012
CRM-ISM-GERAD Colloque de statistique
Raymond J. Carroll


What percentage of children in the U.S. are eating a healthy diet? A statistical approach

14:30-15:30

Concordia,  Room LB 921-04
Abstract:

In the United States the preferred method of obtaining dietary intake data is the 24-hour dietary recall, yet the measure of most interest is usual or long-term average daily intake, which is impossible to measure. Thus, usual dietary intake is assessed with considerable measurement error. Also, diet represents numerous foods, nutrients and other components, each of which have distinctive attributes. Sometimes, it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture overall dietary patterns and their effect on various diseases. Consumption of these components varies widely: some are consumed daily by almost everyone on every day, while others are episodically consumed so that 24-hour recall data are zero-inflated. In addition, they are often correlated with each other. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in a diet because dietary recommendations often vary with energy level.

We propose the first model appropriate for this type of data, and give the first workable solution to fit such a model. After describing the model, we use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication. The methodology is illustrated through an application to estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index involving ratios of interrelated dietary components to energy, among children aged 2-8 in the United States. We pose a number of interesting questions about the HEI-2005, and show that it is a powerful predictor of the risk of developing colorectal cancer.

Speaker: Raymond J. Carroll is a Professor of Statistics at Texas A&M University.

 

Go to top

Winter Term 2013

Date Event Speaker(s) Title Time Location

January 11, 2013

McGill Statistics Seminar
Ana Best Risk-set sampling, left truncation, and Bayesian methods in survival analysis

14:30-15:30

BURN 1205
Abstract:

Statisticians are often faced with budget concerns when conducting studies. The collection of some covariates, such as genetic data, is very expensive. Other covariates, such as detailed histories, might be difficult or time-consuming to measure. This helped bring about the invention of the nested case-control study, and its more generalized version, risk-set sampled survival analysis. The literature has a good discussion of the properties of risk-set sampling in standard right-censored survival data. My interest is in extending the methods of risk-set sampling to left-truncated survival data, which arise in prevalent longitudinal studies. Since prevalent studies are easier and cheaper to conduct than incident studies, this extension is extremely practical and relevant. I will introduce the partial likelihood in this scenario, and briefly discuss the asymptotic properties of my estimator. I will also introduce Bayesian methods for standard survival analysis, and discuss methods for analyzing risk-set-sampled survival data using Bayesian methods.

Speaker: Ana Best is a PhD candidate in our department. She works with David Wolfson.
January 18, 2013
CRM-ISM-GERAD Colloque de statistique
Victor Chernozhukov

Inference on treatment effects after selection amongst high-dimensional controls

14:30-15:30

McGill, Burnside Hall, Room 306.
Abstract:

We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances. Our analysis allows the number of controls to be much larger than the sample size. To make informative inference feasible, we require the model to be approximately sparse; that is, we require that the effect of confounding factors can be controlled for up to a small approximation error by conditioning on a relatively small number of controls whose identities are unknown. The latter condition makes it possible to estimate the treatment effect by selecting approximately the right set of controls. We develop a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the "post-double-selection" method. Our results apply to Lasso-type methods used for covariate selection as well as to any other model selection method that is able to find a sparse model with good approximation properties.

The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We illustrate the use of the developed methods with numerical simulations and an application to the effect of abortion on crime rates.
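
In schematic terms, the three steps of post-double selection can be sketched as follows with simulated data and scikit-learn's cross-validated lasso; the actual method uses theoretically justified penalty levels and heteroscedasticity-robust inference, so this is only an illustration of the logic, and the data-generating process is hypothetical.

  import numpy as np
  from sklearn.linear_model import LassoCV, LinearRegression

  rng = np.random.default_rng(0)
  n, p = 200, 300
  X = rng.normal(size=(n, p))                            # many potential controls
  d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)       # treatment
  y = 1.0 * d + X[:, 0] - X[:, 2] + rng.normal(size=n)   # true effect = 1.0

  sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)  # step 1: controls predicting y
  sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)  # step 2: controls predicting d
  controls = np.union1d(sel_y, sel_d)                    # union of the two selected sets

  Z = np.column_stack([d, X[:, controls]])               # step 3: OLS of y on d and union
  fit = LinearRegression().fit(Z, y)
  print("estimated treatment effect:", fit.coef_[0])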
 
This is joint work with Alexandre Belloni and Christian Hansen.

Speaker: Victor Chernozhukov is a Professor in the Department of Economics at the Massachusetts Institute of Technology (http://www.mit.edu/~vchern/).

January 25, 2013

McGill Statistics Seminar
Mylène Bédard On the empirical efficiency of local MCMC algorithms with pools of proposals

14:30-15:30

BURN 1205
Abstract:

In an attempt to improve on the Metropolis algorithm, various MCMC methods with auxiliary variables, such as the multiple-try and delayed rejection Metropolis algorithms, have been proposed. These methods generate several candidates in a single iteration; accordingly, they are computationally more intensive than the Metropolis algorithm. It is usually difficult to provide a general estimate of the computational cost of a method without being overly conservative; potentially efficient methods could thus be overlooked by relying on such estimates. In this talk, we describe three algorithms with auxiliary variables - the multiple-try Metropolis (MTM) algorithm, the multiple-try Metropolis hit-and-run (MTM-HR) algorithm, and the delayed rejection Metropolis algorithm with antithetic proposals (DR-A) - and investigate the net performance of these algorithms in various contexts. To allow for a fair comparison, the study is carried out under optimal mixing conditions for each of these algorithms. The DR-A algorithm, whose proposal scheme introduces correlation in the pool of candidates, seems particularly promising. The algorithms are used in the contexts of Bayesian logistic regression and classical inference for a linear regression model. This talk is based on work in collaboration with M. Mireuta, E. Moulines, and R. Douc.

Speaker: Mylène Bédard is an Associate Professor of Statistics at the Université de Montréal.

February 1, 2013

McGill Statistics Seminar
Daniela Witten Structured learning of multiple Gaussian graphical models

14:30-15:30

BURN 1205
Abstract:

I will consider the task of estimating high-dimensional Gaussian graphical models (or networks) corresponding to a single set of features under several distinct conditions. In other words, I wish to estimate several distinct but related networks. I assume that most aspects of the networks are shared, but that there are some structured differences between them. The goal is to exploit the similarity among the networks in order to obtain more accurate estimates of each individual network, as well as to identify the differences between the networks.

To begin, I will assume that network differences arise from edge perturbations. In this case, estimating the networks by maximizing the log likelihood subject to fused lasso or group lasso penalties on the differences between the precision matrices can lead to very good results. Next, I will discuss a more structured type of network difference that arises from node (rather than edge) perturbations. In order to estimate networks in this setting, I will present the "row-column overlap norm penalty", a type of overlapping group lasso penalty.
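
The fused-penalty formulation mentioned above corresponds, roughly, to maximizing over positive-definite precision matrices $\Theta^{(1)}, \ldots, \Theta^{(K)}$ a penalized joint log-likelihood of the form $\sum_{k} \{\log \det \Theta^{(k)} - \mathrm{tr}(S^{(k)} \Theta^{(k)})\} - \lambda_1 \sum_{k} \sum_{i \ne j} |\theta_{ij}^{(k)}| - \lambda_2 \sum_{k < k'} \sum_{i,j} |\theta_{ij}^{(k)} - \theta_{ij}^{(k')}|$, where $S^{(k)}$ is the sample covariance matrix under condition $k$. The notation is generic; sample-size weights and the exact penalties used in the talk may differ.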

Finally, I will present an application of these network estimation techniques to a gene expression data set, in which the goal is to identify genes whose regulatory patterns are perturbed across various subtypes of brain cancer.

This is joint work with Pei Wang, Su-In Lee, Maryam Fazel, and others.

Speaker: Daniela Witten is an Assistant Professor of Biostatistics at the University of Washington.

February 8, 2013

McGill Statistics Seminar
Celia Greenwood Multiple testing and region-based tests of rare genetic variation

14:30-15:30

BURN 1205
Abstract:

In the context of univariate association tests between a trait of interest and common genetic variants (SNPs) across the whole genome, corrections for multiple testing have been well-studied. Due to the patterns of correlation (i.e., linkage disequilibrium), the number of independent tests remains close to 1 million, even when many more common genetic markers are available. With the advent of the DNA sequencing era, however, newly-identified genetic variants tend to be rare or even unique, and consequently single-variant tests of association have little power. As a result, region-based tests of association are being developed that examine associations between the trait and all the genetic variability in a small pre-defined region of the genome. However, coping with multiple testing in this situation has had little attention. I will discuss two aspects of multiple testing for region-based tests. First, I will describe a method for estimating the effective number of independent tests, and second, I will discuss an approach for controlling type I error that is based on stratified false discovery rates, where strata are defined by external information such as genomic annotation.

Speaker: Celia Greenwood is an Associate Professor in the Department of Oncology, Faculty of Medicine, McGill University.

February 15, 2013

McGill Statistics Seminar
Eric Cormier Data-driven nonparametric inference for bivariate extreme-value copulas

14:30-15:30

BURN 1205
Abstract:

It is often crucial to know whether the dependence structure of a bivariate distribution belongs to the class of extreme-value copulas. In this talk, I will describe a graphical tool that allows judgment regarding the existence of extreme-value dependence. I will also present a data-driven nonparametric estimator of the Pickands dependence function. This estimator, which is constructed from constrained B-splines, is intrinsic and differentiable, thereby enabling sampling from the fitted model. I will illustrate its properties via simulation. This will lead me to highlight some of the limitations associated with currently available tests of extremeness.
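
As background, recall that a bivariate extreme-value copula is characterized by its Pickands dependence function $A$ through $C(u, v) = \exp\{\log(uv)\, A(\log(v)/\log(uv))\}$ for $u, v \in (0, 1)$, where $A$ is convex on $[0, 1]$ and satisfies $\max(t, 1 - t) \le A(t) \le 1$. The constrained B-spline estimator presented in the talk targets $A$ directly, and the shape constraints imposed on the splines are presumably of this type.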

Speaker: Eric Cormier is a PhD candidate in our department. He works with Christian Genest and Johanna Nešlehová.
February 22, 2013
CRM-ISM-GERAD Colloque de statistique
CRM-SSC Prize 2012 Colloque
Changbao Wu

Analysis of complex survey data with missing observations

14:30-15:30

CRM, Université de Montréal, Pav. André-Aisenstadt, salle 1360
Abstract:

In this talk, we first provide an overview of issues arising from and methods dealing with complex survey data in the presence of missing observations, with a major focus on the estimating equation approach for analysis and imputation methods for missing data. We then propose a semiparametric fractional imputation method for handling item nonresponses, assuming certain baseline auxiliary variables can be observed for all units in the sample. The proposed strategy combines the strengths of conventional single imputation and multiple imputation methods, and is easy to implement even with a large number of auxiliary variables available, which is typically the case for large scale complex surveys. Simulation results and some general discussion on related issues will also be presented.

This talk is based partially on joint work with Jiahua Chen of the University of British Columbia and Jaekwang Kim of Iowa State University.

Speaker: Changbao Wu, University of Waterloo

March 1, 2013

McGill Statistics Seminar
Natalia Stepanova On asymptotic efficiency of some nonparametric tests for testing multivariate independence

14:30-15:30

BURN 1205
Abstract:

Some problems of statistics can be reduced to extremal problems of minimizing functionals of smooth functions defined on the cube $[0,1]^m$, $m\geq 2$. In this talk, we consider a class of extremal problems that is closely connected to the problem of testing multivariate independence. By solving the extremal problem, we provide a unified approach to establishing weak convergence for a wide class of empirical processes which emerge in connection with testing multivariate independence. The use of our result will also be illustrated by describing the domain of local asymptotic optimality of some nonparametric tests of independence.

This is joint work with Alexander Nazarov (St. Petersburg State University, Russia).

Speaker: Natalia Stepanova is an Associate Professor in the School of Mathematics and Statistics at Carleton University.

March 15, 2013

McGill Statistics Seminar
Jiahua Chen Quantile and quantile function estimation under the density ratio model

14:30-15:30

BURN 1205
Abstract:

Joint work with Yukun Liu (East China Normal University).

Population quantiles and their functions are important parameters in many applications. For example, lower-level quantiles often serve as crucial quality indices of forestry and other products. In the presence of several independent samples from populations satisfying the density ratio model, we investigate the properties of empirical likelihood (EL) based inferences for quantiles and their functions. We first establish the consistency and asymptotic normality of the estimators of the parameters and cumulative distributions. The induced EL quantile estimators are then shown to admit a Bahadur representation. The results are used to construct asymptotically valid confidence intervals for functions of quantiles. In addition, we rigorously prove that the EL quantiles based on all samples are more efficient than the empirical quantiles, which can only utilize information from individual samples. A simulation study shows that the EL quantiles and their functions have superior performance both when the density ratio model assumption is satisfied and when it is mildly violated. An application example is used to demonstrate the new methods and potential cost savings.

Speaker: Jiahua Chen is a Professor of statistics and Canada Research Chair at the University of British Columbia.
March 22, 2013
CRM-ISM-GERAD Colloque de statistique
Hélène Massam

The hyper Dirichlet revisited: a characterization

14:30-15:30

McGill, Burnside Hall 107
Abstract:

We give a characterization of the hyper Dirichlet distribution hyper Markov with respect to a decomposable graph $G$ (or equivalently a moral directed acyclic graph). For $X=(X_1,\ldots,X_d)$ following the hyper Dirichlet distribution, our characterization is through the so-called "local and global independence properties" for a carefully designed family of orders of the variables $X_1,\ldots,X_d$.

The hyper Dirichlet for general directed acyclic graphs was derived from a characterization of the Dirichlet distribution given by Geiger and Heckerman (1997). This characterization of the Dirichlet for $X=(X_1,\ldots,X_d)$ is obtained through a functional equation derived from the local and global independence properties for two different orders of the variables. These two orders are seemingly chosen haphazardly but, as our results show, this is not so. Our results generalize those of Geiger and Heckerman (1997) and are given without the assumption of existence of a positive density for $X$.

Speaker: Hélène Massam, York University

April 5, 2013

McGill Statistics Seminar
Éric Marchand On improved predictive density estimation with parametric constraints

14:30-15:30

BURN 1205
Abstract:

We consider the problem of predictive density estimation under Kullback-Leibler loss when the parameter space is restricted to a convex subset. The principal situation analyzed relates to the estimation of an unknown predictive p-variate normal density based on an observation generated by another p-variate normal density. The means of the densities are assumed to coincide, and the covariance matrices are a known multiple of the identity matrix. We obtain sharp results concerning plug-in estimators, we show that the best unrestricted invariant predictive density estimator is dominated by the Bayes estimator associated with a uniform prior on the restricted parameter space, and we obtain minimax results for cases where the parameter space is (i) a cone, and (ii) a ball. A key feature, which we will describe, is a correspondence between the predictive density estimation problem and a collection of point estimation problems. Finally, if time permits, we describe recent work concerning (i) non-normal models and (ii) analyses relative to other loss functions such as reverse Kullback-Leibler and integrated $L_2$.

References.

1)    Dominique Fourdrinier, Éric Marchand, Ali Righi, William E. Strawderman. On improved predictive density estimation with parametric constraints,  Electronic Journal of Statistics 2011, Vol. 5, 172-191.
2)    Tatsuya Kubokawa, Éric Marchand, William E. Strawderman, Jean-Philippe Turcotte. Minimaxity in predictive density estimation with parametric constraints.  Journal of Multivariate Analysis, 2013, Vol. 116, 382-397.

Speaker: Éric Marchand is a Professor of Statistics at the Université de Sherbrooke.
April 12, 2013
CRM-ISM-GERAD Colloque de statistique
Arup Bose

Consistency of large dimensional sample covariance matrix under weak dependence

14:30-15:30

Concordia
Abstract:

Estimation of large dimensional covariance matrix has been of interest recently. One model assumes that there are  $p$ dimensional independent identically distributed Gaussian observations $X_1, \ldots , X_n$ with dispersion matrix $\Sigma_p$ and $p$ grows much faster than $n$. Appropriate convergence rate results have been established in the literature for tapered and banded estimators of $\Sigma_p$ which are based on the sample variance covariance matrix of $n$ observations.
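
For concreteness, a banded estimator of the type referred to above is typically of the form $\hat{\Sigma}^{\mathrm{band}}_{ij} = \hat{\sigma}_{ij}\, \mathbf{1}\{|i - j| \le k\}$, where $\hat{\sigma}_{ij}$ are the entries of the sample covariance matrix and the bandwidth $k$ is allowed to grow slowly with $n$; tapered estimators replace the indicator by weights that decay smoothly in $|i - j|$. These are standard forms from the literature, stated here only for orientation.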

However, the assumption of independence has been questioned in applications. As a first step towards general results for the dependent case, we introduce and investigate one class of dependent models. Our model can accommodate suitable patterned cross-covariance matrices, and the tapered and banded estimators remain consistent in operator norm with appropriate rates of convergence.

A related problem is the estimation of the matrix parameters of a stationary vector ARMA time series model with increasing dimension where we have one realisation of the process. We shall exhibit some preliminary results in this area.

This work is joint with Ms. Monika Bhattacharjee.

Speaker: Arup Bose (http://www.isical.ac.in/~abose/) is Professor of Theoretical Statistics and Mathematics, Indian Statistical Institute, Kolkata.

Go to top

Fall Term 2013

 
Date Event Speaker(s) Title Time Location
September 13, 2013
McGill Statistics Seminar
Theodoros Nicoleris Bayesian nonparametric density estimation under length bias sampling

15:30-16:30

BURN 1205
Abstract:

A new density estimation method in a Bayesian nonparametric framework is presented when recorded data are not coming directly from the distribution of interest, but from a length biased version. From a Bayesian perspective, efforts to computationally evaluate posterior quantities conditionally on length biased data were hindered by the inability to circumvent the problem of a normalizing constant. In this talk a novel Bayesian nonparametric approach to the length bias sampling problem is presented which circumvents the issue of the normalizing constant. Numerical illustrations as well as a real data example are presented, and the estimator is compared against its frequentist counterpart, the kernel density estimator for indirect data. This is joint work with (a) Spyridon J. Hatjispyros, University of the Aegean, Greece, and (b) Stephen G. Walker, University of Texas at Austin, U.S.A.

Speaker: 

Theodoros Nicoleris is a Professor in the Department of Economics at the National and Kapodistrian University of Athens, Greece.

September 20, 2013
McGill Statistics Seminar
Orla A. Murphy Tests of independence for sparse contingency tables and beyond

15:30-16:30

BURN 1205
Abstract:

In this talk, a new and consistent statistic is proposed to test whether two discrete random variables are independent. The test is based on a statistic of the Cramér–von Mises type constructed from the so-called empirical checkerboard copula. The test can be used even for sparse contingency tables or tables whose dimension changes with the sample size. Because the limiting distribution of the test statistic is not tractable, a valid bootstrap procedure for the computation of p-values will be discussed. The new statistic is compared through a power study to standard procedures for testing independence, such as Pearson's chi-squared, the likelihood ratio, and Zelterman statistics. The new test turns out to be considerably more powerful than all its competitors in all scenarios considered.

Speaker:

Orla A. Murphy is a PhD student in the Department of Mathematics and Statistics at McGill University, Montréal. She works with Christian Genest and Johanna G. Nešlehová.

September 27, 2013
CRM-ISM-GERAD Colloque de statistique
Len Stefanski Measurement error and variable selection in parametric and nonparametric models

15:30-16:30

McGill RPHYS 114
Abstract:

This talk will start with a discussion of the relationships between LASSO estimation, ridge regression, and attenuation due to measurement error as motivation for, and introduction to, a new generalizable approach to variable selection in parametric and nonparametric regression and discriminant analysis. The approach transcends the boundaries of parametric/nonparametric models. It will first be described in the familiar context of linear regression where its relationship to the LASSO will be described in detail. The latter part of the talk will focus on implementation of the approach to nonparametric modeling where sparse dependence on covariates is desired. Applications to two- and multi-category classification problems will be discussed in detail.

Speaker:

Len Stefanski is a Professor of Statistics at North Carolina State University, Raleigh, NC.

October 4, 2013
McGill Statistics Seminar
Farhad Shokoohi Some recent developments in likelihood-based small area estimation

15:30-16:30

BURN 1205
Abstract:

Mixed models are commonly used for the analysis of data in small area estimation. In particular, small area estimation has been extensively studied under linear mixed models. However, in practice there are many situations in which we have counts or proportions in small area estimation; for example, a (monthly) dataset on the number of incidences in small areas. Recently, small area estimation under the linear mixed model with a penalized spline model for the fixed part of the model was studied. In this talk, small area estimation under generalized linear mixed models, combining time series and cross-sectional data, is proposed, together with an extension of these models to include penalized spline regression models. A likelihood-based approach is used to predict small area parameters and also to provide prediction intervals. The performance of the proposed models and approach is evaluated through simulation studies and also with real datasets.

Speaker: 

Farhad Shokoohi is a Postdoctoral Fellow in our department. He works with Masoud Asgharian and Abbas Khalili.

October 11, 2013
McGill Statistics Seminar
Hela Romdhani An exchangeable Kendall's tau for clustered data CANCELLED
Abstract:

I'll introduce the exchangeable Kendall's tau as a nonparametric intraclass association measure in a clustered data frame and provide an estimator for this measure. The asymptotic properties of this estimator are investigated under a multivariate exchangeable copula model. Two applications of the proposed statistic are considered. The first is an estimator of the intraclass correlation coefficient for data drawn from an elliptical distribution. The second is a semi-parametric intraclass independence test based on the exchangeable Kendall's tau.

Speaker: 

Hela Romdhani is a Postdoctoral Fellow in the Department of Biostatistics, Epidemiology and Occupational Health at McGill.

October 18, 2013
McGill Statistics Seminar
Shili Lin Whole genome 3D architecture of chromatin and regulation

15:30-16:30

BURN 1205
Abstract:

The expression of a gene is usually controlled by the regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes, such as the human genome, a gene may be controlled by distant enhancers and repressors. A recent molecular technique, 3C (chromosome conformation capture), that uses formaldehyde cross-linking and locus-specific PCR, was able to detect physical contacts between distant genomic loci. Such communication is achieved through spatial organization (looping) of chromosomes to bring genes and their regulatory elements into close proximity. Several adaptations of the 3C assay to study genome-wide spatial interactions, including Hi-C and ChIA-PET, have been developed. The availability of such data makes it possible to reconstruct the underlying three-dimensional spatial chromatin structure. In this talk, I will first describe a Bayesian statistical model for building spatial estrogen receptor regulation, focusing on reducing false positive interactions. A random effect model, PRAM, will then be presented to make inference on the locations of genomic loci in a 3D Euclidean space. Results from ChIA-PET and Hi-C data will be visualized to illustrate the regulation and spatial proximity of genomic loci that are far apart in their linear chromosomal locations.

Speaker: 

Shili Lin is a Professor of Statistics at Ohio State University, Columbus, OH.

October 25, 2013
CRM-ISM-GERAD Colloque de statistique
Luke Bornn XY - Basketball meets Big Data

15:30-16:30

HEC Montréal
Salle CIBC 1er étage

Abstract:

In this talk, I will explore the state of the art in the analysis and modeling of player tracking data in the NBA.  In the past, player tracking data has been used primarily for visualization, such as understanding the spatial distribution of a player’s shooting characteristics, or to extract summary statistics, such as the distance traveled by a player in a given game.  In this talk, I will present how we're using advanced statistics and machine learning tools to answer previously unanswerable questions about the NBA.  Examples include “How should teams configure their defensive matchups to minimize a player’s effectiveness?”, “Who are the best decision makers in the NBA?”, and “Who was responsible for the most points against in the NBA last season?”

Speaker:

Luke Bornn is an Assistant Professor of Statistics at Harvard University, Cambridge, MA. He is also the winner of the 2012 Pierre Robillard Award for the best PhD thesis defended at a Canadian university in a given year.

November 1, 2013
McGill Statistics Seminar
Radu Craiu Bayesian latent variable modelling of longitudinal family data for genetic pleiotropy studies

15:30-16:30

BURN 1205
Abstract:

Motivated by genetic association studies of pleiotropy, we propose  a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters and we discuss some of the model misspecification effects.  Central to the analysis is a novel MCMC algorithm that builds upon hierarchical centering and  parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors.

Speaker: 

Radu Craiu is a Professor of Statistics at the University of Toronto, Toronto, ON.

November 8, 2013
McGill Statistics Seminar
Daphna Harel The inadequacy of the summed score (and how you can fix it!)

15:30-16:30

BURN 1205
Abstract:

Health researchers often use patient and physician questionnaires to assess certain aspects of health status. Item Response Theory (IRT) provides a set of tools for examining the properties of the instrument and for estimation of the latent trait for each individual. In my research, I critically examine the usefulness of the summed score over items and an alternative weighted summed score (using weights computed from the IRT model) as an alternative to both the empirical Bayes estimator and maximum likelihood estimator for the Generalized Partial Credit Model. First, I will talk about two useful theoretical properties of the weighted summed score that I have proven as part of my work. Then I will relate the weighted summed score to other commonly used estimators of the latent trait. I will demonstrate the importance of these results in the context of both simulated and real data on the Center for Epidemiological Studies Depression Scale.

Speaker: 

Daphna Harel is a PhD candidate in our department. She works with Russ Steele.

November 15, 2013
McGill Statistics Seminar
Syed Ejaz Ahmed Submodel selection and post estimation: Making sense or folly

15:30-16:30

BURN 1205
Abstract:

In this talk, we consider estimation in generalized linear models when there are many potential predictors and some of them may not have influence on the response of interest. In the context of two competing models, where one model includes all predictors and the other restricts variable coefficients to a candidate linear subspace based on subject matter or prior knowledge, we investigate the relative performances of Stein-type shrinkage, pretest, and penalty estimators (L1GLM, adaptive L1GLM, and SCAD) with respect to the full model estimator. The asymptotic properties of the pretest and shrinkage estimators, including the derivation of asymptotic distributional biases and risks, are established. A Monte Carlo simulation study shows that the mean squared error (MSE) of an adaptive shrinkage estimator is comparable to the MSE of the penalty estimators in many situations and, in particular, that it performs better than the penalty estimators when the model is sparse. A real data set analysis is also presented to compare the suggested methods.

Speaker: 

Syed Ejaz Ahmed is a Professor of Statistics and Dean of the Faculty of Mathematics and Science at Brock University, St. Catharines, ON.

November 22, 2013
McGill Statistics Seminar
Lei Hua Tail order and its applications

15:30-16:30

BURN 1205
Abstract:

Tail order is a notion for quantifying the strength of dependence in the tail of a joint distribution. It can account for a wide range of dependence, ranging from tail positive dependence to tail negative dependence. We will introduce the theory and applications of tail order. Conditions for tail orders of copula families will be discussed; they are helpful in guiding us to find suitable copula families for statistical inference. As an application of tail order, a regression analysis will be demonstrated, using appropriately constructed copulas, that can capture the unique tail dependence patterns appearing in data from a medical expenditure panel survey.
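
As a reminder of the definition, the lower tail order $\kappa$ of a $d$-variate copula $C$ is the constant for which $C(u, \ldots, u) \sim u^{\kappa} \ell(u)$ as $u \downarrow 0$ for some slowly varying function $\ell$; $\kappa = 1$ corresponds to the usual tail dependence, $\kappa = d$ is the value attained under independence, and $\kappa > d$ indicates tail negative dependence. The upper tail order is defined analogously through the survival copula.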

Speaker:

Lei Hua is an Assistant Professor of Statistics at Northern Illinois University, DeKalb, IL.

November 29, 2013
CRM-ISM-GERAD Colloque de statistique
Marc Hallin Signal detection in high dimension: Testing sphericity against spiked alternatives

15:30-16:30

Concordia

MB-2.270

Abstract:

We consider the problem of testing the null hypothesis of sphericity for a high-dimensional covariance matrix against the alternative of a finite (unspecified) number of symmetry-breaking directions (multispiked alternatives) from the point of view of the asymptotic theory of statistical experiments. The region lying below the so-called phase transition or impossibility threshold is shown to be a contiguity region. Simple analytical expressions are derived for the asymptotic power envelope and the asymptotic powers of existing tests. These asymptotic powers are shown to lie very substantially below the power envelope; some of them even trivially coincide with the size of the test. In contrast, the asymptotic power of the likelihood ratio test is shown to be uniformly close to the power envelope.

Speaker: 

Marc Hallin is a Professor of Statistics at the European Center for Advanced Research in Economics and Statistics, Université Libre de Bruxelles, Belgium. He is currently visiting the ORFE Department at Princeton.

December 6, 2013
CRM-ISM-GERAD Colloque de statistique
Stephen M. Stigler Great probabilists publish posthumously

15:30-16:30

UQAM 
Salle SH-3420
Abstract:

Jacob Bernoulli died in 1705. His great book Ars Conjectandi was published in 1713, 300 years ago. Thomas Bayes died in 1761. His great paper was read to the Royal Society of London in December 1763, 250 years ago, and published in 1764. These anniversaries are noted by discussing new evidence regarding the circumstances of publication, which in turn can lead to a better understanding of the works themselves. As to whether or not these examples of posthumous publication suggest a career move for any modern probabilist; that question is left to the audience.

Speaker:

Stephen M. Stigler is Ernest DeWitt Burton Distinguished Service Professor at the University of Chicago, Chicago, IL.

Go to top

Winter Term 2014

 
Date Event Speaker(s) Title Time Location
January 10, 2014
McGill Statistics Seminar
Raluca Balan An introduction to stochastic partial differential equations and intermittency

15:30-16:30

BURN 1205
Abstract:

In a seminal article in 1944, Itô introduced the stochastic integral with respect to Brownian motion, which turned out to be one of the most fruitful ideas in mathematics in the 20th century. This led to the development of stochastic analysis, a field which includes the study of stochastic partial differential equations (SPDEs). One of the approaches to the study of SPDEs was initiated by Walsh (1986) and relies on the concept of a random-field solution for equations perturbed by a space-time white noise (or Brownian sheet). This concept allows us to investigate the dynamical changes in the probabilistic behavior of the solution, simultaneously in time and space. These developments will be reviewed in the first part of the talk. The second part of the talk will be dedicated to some recent advances in this area, related to the existence of a random-field solution for some classical SPDEs (like the stochastic heat equation) perturbed by a ``colored'' noise, which behaves in time like the fractional Brownian motion. When this solution exists, it exhibits a strong form of ``intermittency,'' a property which was originally introduced in the physics literature for describing random fields whose values develop very large peaks. This talk is based on some recent joint work with Daniel Conus (Lehigh University).

Speaker: 

Raluca Balan is a Professor at the University of Ottawa.

January 24, 2014
CRM-ISM-GERAD Colloque de statistique
Derek Bingham Calibration of computer experiments with large data structures

15:30-16:30

Salle 1355, pavillon André-Aisenstadt (CRM)
Abstract:

Statistical calibration of computer models is commonly done in a wide variety of scientific endeavours. In the end, this exercise amounts to solving an inverse problem and a form of regression. Gaussian process models are very convenient in this setting as non-parametric regression estimators and provide sensible inference properties. However, when the data structures are large, fitting the model becomes difficult. In this work, new methodology for calibrating large computer experiments is presented. We propose to perform the calibration exercise by modularizing a hierarchical statistical model with approximate emulation via local Gaussian processes. The approach is motivated by an application to radiative shock hydrodynamics.

Speaker: 

 Derek Bingham is a Professor of Statistics at Simon Fraser University, Burnaby, BC. He is the winner of the 2012 CRM-SSC Award.

January 31, 2014
McGill Statistics Seminar
 Hela Romdhani An exchangeable Kendall's tau for clustered data

15:30-16:30

BURN 1205
Abstract:

I'll introduce the exchangeable Kendall's tau as a nonparametric intra class association measure in a clustered data frame and provide an estimator for this measure. The asymptotic properties of this estimator are investigated under a multivariate exchangeable cdf. Two applications of the proposed statistic are considered. The first is an estimator of the intraclass correlation coefficient for data drawn from an elliptical distribution. The second is a semi-parametric intraclass independence test based on the exchangeable Kendall's tau.

Speaker: Hela Romdhani is a Postdoctoral Fellow in the Department of Biostatistics, Epidemiology and Occupational Health at McGill.
February 7, 2014
McGill Statistics Seminar
Taki Shinohara Statistical techniques for the normalization and segmentation of structural MRI

15:30-16:30

BURN 1205
Abstract:

While computed tomography and other imaging techniques are measured in absolute units with physical meaning, magnetic resonance images are expressed in arbitrary units that are difficult to interpret and differ between study visits and subjects. Much work in the image processing literature has centered on histogram matching and other histogram mapping techniques, but little focus has been on normalizing images to have biologically interpretable units. We explore this key goal for statistical analysis and the impact of normalization on cross-sectional and longitudinal segmentation of pathology.

Speaker: 

Taki Shinohara is an Assistant Professor of Biostatistics at the University of Pennsylvania.

February 14, 2014
McGill Statistics Seminar
Anand N. Vidyashankar Divergence based inference for general estimating equations

15:30-16:30

BURN 1205
Abstract:

Hellinger distance and its variants have long been used in the theory of robust statistics to develop inferential tools that are more robust than maximum likelihood but as efficient as the MLE when the posited model holds. A key aspect of this alternative approach requires specification of a parametric family, which is usually not feasible in the context of problems involving complex data structures wherein estimating equations are typically used for inference. In this presentation, we describe how to extend the scope of divergence theory to inferential problems involving estimating equations and describe useful algorithms for their computation. Additionally, we theoretically study the robustness properties of the methods and establish the semi-parametric efficiency of the new divergence-based estimators under suitable technical conditions. Finally, we use the proposed methods to develop robust sure screening methods for ultra-high dimensional problems. Theory of large deviations, convexity theory, and concentration inequalities play an essential role in the theoretical analysis and numerical development. Applications from equine parasitology, stochastic optimization, and antimicrobial resistance will be used to describe various aspects of the proposed methods.

Speaker: 

 Anand N. Vidyashankar is an Associate Professor in the Department of Statistics at George Mason University.

February 21, 2014
McGill Statistics Seminar
Reza Ramezan On the multivariate analysis of neural spike trains: Skellam process with resetting and its applications

15:30-16:30

BURN 1205
Abstract:

Nerve cells (a.k.a. neurons) communicate via electrochemical waves (action potentials), which are usually called spikes as they are very localized in time. A sequence of consecutive spikes from one neuron is called a spike train. The exact mechanism of information coding in spike trains is still an open problem; however, one popular approach is to model spikes as realizations of an inhomogeneous Poisson process. In this talk, the limitations of the Poisson model are highlighted, and the Skellam Process with Resetting (SPR) is introduced as an alternative model for the analysis of neural spike trains. SPR is biologically justified, and the parameter estimation algorithm developed for it is computationally efficient. To allow for the modelling of neural ensembles, this process is generalized to the multivariate case, where the Multivariate Skellam Process with Resetting (MSPR), as well as the multivariate Skellam distribution, are introduced. Simulation and real data studies confirm the promising results of the Skellam model in the statistical analysis of neural spike trains.

Speaker: 

Reza Ramezan is a recent PhD graduate from the University of Waterloo.

February 28, 2014
CRM-ISM-GERAD Colloque de statistique
Christian Robert ABC as the new empirical Bayes approach?

13:30-14:30

UdM, Pav. Roger-Gaudry, Salle S-116

Abstract:

Approximate Bayesian computation (ABC) has now become an essential tool for the analysis of complex stochastic models when the likelihood function is unavailable. The approximation is seen as a nuisance from a computational statistic point of view but we argue here it is also a blessing from an inferential perspective. We illustrate this paradoxical stand in the case of dynamic models and population genetics models. There are also major inference difficulties, as detailed in the case of Bayesian model choice.

Speaker:

Christian P. Robert is a Professor of Statistics at Université Paris-Dauphine, France.

March 14, 2014
McGill Statistics Seminar
Denis Larocque Mixed effects trees and forests for clustered data

15:30-16:30

BURN 1205
Abstract:

In this talk, I will present extensions of tree-based and random forest methods for the case of clustered data. The proposed methods can handle unbalanced clusters, allow observations within clusters to be split, and can incorporate random effects and observation-level covariates. The basic tree-building algorithm for a continuous outcome is implemented using standard algorithms within the framework of the EM algorithm. The extension to other types of outcomes (e.g., binary, count) uses the penalized quasi-likelihood (PQL) method for the estimation and the EM algorithm for the computation. Simulation results show that the proposed methods provide substantial improvements over standard trees and forests when the random effects are non-negligible. The use of the methods will be illustrated with real data sets.

Speaker: 

Denis Larocque is a Professor of Statistics at HEC Montréal.

March 21, 2014
CRM-ISM-GERAD Colloque de statistique
Jed Frees Insurance company operations and dependence modeling

15:30-16:30

McGill
Burnside Hall 107
Abstract:

Actuaries and other analysts have long had the responsibility in insurance company operations for various financial functions including  (i) ratemaking, the process of setting premiums, (ii) loss reserving, the process of predicting obligations that arise from policies, and (iii) claims management, including fraud detection. With the advent of modern computing capabilities and detailed and novel data sources, new  opportunities to make an impact on insurance company operations are extensive.

I focus on models at the "micro" level of risk, corresponding to an individual contract or individual claim. By understanding individual risks and considering a portfolio of risks, this level of detail allows, for example, the analyst to specifically consider the effects of a changing mix of business on a company's overall financial picture. Micro-level modeling smooths the path for the introduction of economic models of behavior. Economic models allow one to think about how a micro-level change (e.g., to a policy deductible) might affect risk outcomes. Actuaries and other analysts can provide useful advice if they can present a sensible projection of how a micro-level change can alter the risk position of a company's portfolio of policies.

This presentation focuses on interactions among risks at a micro-level, where for example, a policy may include several coverage types or face  several causes of loss. Dependence modeling among risks are important  for understanding a company's total risk obligation.

Speaker: 

Jed Frees is a Professor of Actuarial Science, Risk Management and Insurance at the Wisconsin School of Business, Madison, WI.

March 28, 2014
McGill Statistics Seminar
Ruodu Wang How much does the dependence structure matter?

15:30-16:30

BURN 1205
Abstract:

In this talk, we will look at some classical problems from an anti-traditional perspective. We will consider two problems regarding a sequence of random variables with a given common marginal distribution. First, we will introduce the notion of extreme negative dependence (END), a new benchmark for negative dependence, which is comparable to comonotonicity and independence. Second, we will study the compatibility of the marginal distribution and the limiting distribution when the dependence structure in the sequence is allowed to vary among all possibilities. The results are somewhat simple, yet surprising. We will provide some interpretation and applications of the theoretical results in financial risk management, with the hope to deliver the following message: with the common marginal distribution known and dependence structure unknown, we know essentially nothing about the asymptotic shape of the sum of random variables.

Speaker: 

Ruodu Wang is an Assistant Professor in the Department of Statistics and Actuarial Science, University of Waterloo.

April 4, 2014
McGill Statistics Seminar
Bimal Sinha Some aspects of data analysis under confidentiality protection

15:30-16:30

BURN 1205
Abstract:

Statisticians working in most federal agencies are often faced with two conflicting objectives: (1) collect and publish useful datasets for designing public policies and building scientific theories, and (2) protect confidentiality of data respondents which is essential to uphold public trust, leading to better response rates and data accuracy. In this talk I will provide a survey of two statistical methods currently used at the U.S. Census Bureau: synthetic data and noise perturbed data.

Speaker:

Bimal Sinha is a Presidential Research Professor at the University of Maryland, Baltimore County, MD.

 

April 11, 2014
CRM-ISM-GERAD Colloque de statistique
Ryan Tibshirani Adaptive piecewise polynomial estimation via trend filtering

15:30-16:30

Salle KPMG, 1er étage
HEC Montréal
Abstract:

We will discuss trend filtering, a recently proposed tool of Kim et al. (2009) for nonparametric regression. The trend filtering estimate is defined as the minimizer of a penalized least squares criterion, in which the penalty term sums the absolute kth order discrete derivatives over the input points. Perhaps not surprisingly, trend filtering estimates appear to have the structure of kth degree spline functions, with adaptively chosen knot points (we say “appear” here as trend filtering estimates are not really functions over continuous domains, and are only defined over the discrete set of inputs). This brings to mind comparisons to other nonparametric regression tools that also produce adaptive splines; in particular, we will compare trend filtering to smoothing splines, which penalize the sum of squared derivatives across input points, and to locally adaptive regression splines (Mammen & van de Geer 1997), which penalize the total variation of the kth derivative.

Empirically, trend filtering estimates adapt to the local level of smoothness much better than smoothing splines, and further, they exhibit a remarkable similarity to locally adaptive regression splines. Theoretically, (suitably tuned) trend filtering estimates converge to the true underlying function at the minimax rate over the class of functions whose kth derivative is of bounded variation. The proof of this result follows from an asymptotic pairing of trend filtering and locally adaptive regression splines, which have already been shown to converge at the minimax rate (Mammen & van de Geer 1997). At the core of this argument is a new result tying together the fitted values of two lasso problems that share the same outcome vector, but have different predictor matrices.
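In symbols (a generic formulation consistent with the abstract, not necessarily the speaker's exact notation), the trend filtering estimate solves

\[
\hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^n} \; \tfrac{1}{2}\sum_{i=1}^n (y_i - \beta_i)^2 \;+\; \lambda \, \| D\beta \|_1,
\]

where the vector Dβ collects the discrete derivatives of β of the relevant order across the sorted input points and λ ≥ 0 controls the number of effective knots. For example, in the piecewise-linear case of Kim et al. (2009), the penalty is λ Σ_i |β_{i-1} − 2β_i + β_{i+1}|.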

Speaker:

Ryan Tibshirani is an Assistant Professor of Statistics at Carnegie Mellon University, Pittsburgh, PA.

Go to top

Fall Term 2014

 
Date Event Speaker(s) Title Time Location

September 12, 2014

McGill Statistics Seminar
Fateh Chebana Hydrological applications with the functional data analysis framework

15:30-16:30

BURN 1205
Abstract: River flow records are an essential data source for a variety of hydrological applications, including the prevention of flood risks as well as the planning and management of water resources. A hydrograph is a graphical representation of the temporal variation of flow over a period of time (continuously measured, usually over a year). A flood hydrograph is commonly characterized by a number of features, mainly its peak, volume and duration. Classical and recent multivariate approaches considered in hydrological applications treat these features jointly in order to take into account their dependence structure or their relationship. However, all these approaches are based on the analysis of a limited number of characteristics and do not make use of the full information provided by the hydrograph. Even though these approaches provided good results, they present some drawbacks and limitations. The objective of the present talk is to introduce a new framework for hydrological applications where data, such as hydrographs, are employed as continuous curves: functional data. In this context, the whole hydrograph is considered as one infinite-dimensional observation. This context contributes to addressing the problem of lack of data commonly encountered in hydrology. A number of functional data analysis tools and methods are presented and adapted.
Speaker: Fateh Chebana is a professor of statistics and hydrology at the Institut national de la recherche scientifique (INRS), Centre eau, terre et environnement, in Québec City.
September 19, 2014
McGill Statistics Seminar
Michael McIsaac

Covariates missing by design

15:30-16:30

BURN 1205
Abstract: Incomplete data can arise in many different situations for many different reasons. Sometimes the data may be incomplete for reasons beyond the control of the experimenter. However, it is also possible that this missingness is part of the study design. By using a two-phase sampling approach where only a small sub-sample gives complete information, it is possible to greatly reduce the cost of a study and still obtain precise estimates. This talk will introduce the concepts of incomplete data and two-phase sampling designs and will discuss adaptive two-phase designs which exploit information from an internal pilot study to approximate the optimal sampling scheme for an analysis based on mean score estimating equations.
Speaker: Michael McIsaac is an Assistant Professor in the Department of Public Health Sciences at Queen's University, Kingston, Ontario.

September 26, 2014

McGill Statistics Seminar
Tor Tosteson
Analysis of palliative care studies with joint models for quality-of-life measures and survival

15:30-16:30

BURN 1205
Abstract: In palliative care studies, the primary outcomes are often health-related quality of life (HRQL) measures. Randomized trials and prospective cohorts typically recruit patients with advanced stage of disease and follow them until death or the end of the study. An important feature of such studies is that, by design, some patients, but not all, are likely to die during the course of the study. This affects the interpretation of the conventional analysis of palliative care trials and suggests the need for specialized methods of analysis. We have developed a “terminal decline model” for palliative care trials that, by jointly modeling the time until death and the HRQL measures, leads to flexible interpretation and efficient analysis of the trial data (Li, Tosteson, Bakitas, STMED 2012).
Speaker: Tor Tosteson is a professor in the Department of Community and Family Medicine, Director of the Biostatistics Shared Resource at the Norris Cotton Cancer Center, and Director of the Biostatistics Consulting Core of the Dartmouth Clinical and Translational Institute, Lebanon, NH.

October 3, 2014

McGill Statistics Seminar
Susan R. Wilson
Statistical exploratory data analysis in the modern era

15:30-16:30

BURN 1205
Abstract: Major challenges arising from today's "data deluge" include how to handle the commonly occurring situation of different types of variables (say, continuous and categorical) being simultaneously measured, as well as how to assess the accompanying flood of questions. Based on information theory, a bias-corrected mutual information (BCMI) measure of association that is valid and estimable between all basic types of variables has been proposed. It has the advantage of being able to identify non-linear as well as linear relationships. Based on the BCMI measure, a novel exploratory approach to finding associations in data sets having a large number of variables of different types has been developed. These associations can be used as a basis for downstream analyses such as finding clusters and networks. The application of this exploratory approach is very general. Comparisons also will be made with other measures. Illustrative examples include exploring relationships (i) in clinical and genomic (say, gene expression and genotypic) data, and (ii) between social, economic, health and political indicators from the World Health Organisation.
Speaker: Susan R. Wilson is a retired Professor of Statistics in the Mathematical Sciences Institute, Australian National University, Canberra.

October 10, 2014

McGill Statistics Seminar
Eric Cormier A margin-free clustering algorithm appropriate for dependent maxima in the domain of attraction of an extreme-value copula

15:30-16:30

BURN 1205
Abstract: Extracting relevant information in complex spatial-temporal data sets is of paramount importance in statistical climatology. This is especially true when identifying spatial dependencies between quantitative extremes like heavy rainfall. The paper of Bernard et al. (2013) develops a fast and simple clustering algorithm for finding spatial patterns appropriate for extremes. They develop their algorithm by adapting multivariate extreme-value theory to the context of spatial clustering. This is done by relating the variogram, a well-known distance used in geostatistics, to the extremal coefficient of a pair of joint maxima. This gives rise to a straightforward nonparametric estimator of this distance using the empirical distribution function. Their clustering approach is used to analyze weekly maxima of hourly precipitation recorded in France and a spatial pattern consistent with existing weather models arises. This applied talk is devoted to the validation and extension of this clustering approach. A simulation study using the multivariate logistic distribution as well as max-stable random fields shows that this approach provides accurate clustering when the maxima belong to an extreme-value distribution. Furthermore this clustering distance can be viewed as an average absolute rank difference, implying that it is appropriate for margin-free clustering of dependent variables. In particular it is appropriate for dependent maxima in the domain of attraction of an extreme-value copula.
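For concreteness, the link between the rank-based distance and the extremal coefficient mentioned above is usually written through the F-madogram (standard notation from the geostatistics-of-extremes literature, stated here as background rather than as the exact formulation of Bernard et al.):

\[
\nu(x,y) \;=\; \tfrac{1}{2}\,\mathbb{E}\bigl|F(M_x) - F(M_y)\bigr|, \qquad
\theta(x,y) \;=\; \frac{1 + 2\nu(x,y)}{1 - 2\nu(x,y)},
\]

where M_x and M_y are the block maxima at sites x and y, F is their common marginal distribution function, and θ(x,y) ∈ [1,2] is the pairwise extremal coefficient (1 for complete dependence, 2 for independence). Replacing F by the empirical distribution function turns ν into the average absolute rank difference used for margin-free clustering.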
Speaker: Eric Cormier is a PhD student in the Department of Mathematics and Statistics at McGill University.
October 17, 2014
McGill Statistics Seminar
Paramita S. Chaudhuri

Patient privacy, big data, and specimen pooling: Using an old tool for new challenges

15:30-16:30

BURN 1205
Abstract: In the recent past, electronic health records and distributed data networks emerged as a viable resource for medical and scientific research. As the use of confidential patient information from such sources becomes more common, maintaining the privacy of patients is of utmost importance. For a binary disease outcome of interest, we show that the techniques of specimen pooling can be applied for the analysis of large and/or distributed data while respecting patient privacy. I will review the pooled analysis for a binary outcome and then show how it can be used for distributed data. Aggregate-level data are passed from the nodes of the network to the analysis center and can be used very easily with logistic regression for estimation of the disease odds ratio associated with a set of categorical or continuous covariates. The pooling approach allows for consistent estimation of the parameters of a logistic regression that can include confounders. Additionally, since the individual covariate values can be accessed within a network, effect modifiers can be accommodated and consistently estimated. Since pooling effectively reduces the size of the dataset by creating pools or sets of individuals, the resulting dataset can be analyzed much more quickly than the original dataset, which may be too large for the available computing environment.
Speaker: Paramita S. Chaudhuri was recently hired as an Assistant Professor in the Department of Epidemiology, Biostatistics and Occupational Health at McGill.

October 24, 2014

McGill Statistics Seminar
Karin S. Dorman PREMIER: Probabilistic error-correction using Markov inference in error reads

15:30-16:30

BURN 1205
Abstract: Next generation sequencing (NGS) is a technology revolutionizing genetics and biology. Compared with the old Sanger sequencing method, the throughput is astounding and has fostered a slew of innovative sequencing applications.  Unfortunately, the error rates are also higher, complicating many downstream analyses.  For example, de novo assembly of genomes is less accurate and slower when reads include many errors.  We develop a probabilistic model for NGS reads that can detect and correct errors without a reference genome and while flexibly modeling and estimating the error properties of the sequencing machine.  It uses a penalized likelihood to enforce our prior belief that the kmer spectrum (collection of k-length strings observed in the reads) generated from a genome is sparse when k is sufficiently large.  The model formalizes core ideas that are used in many ad hoc algorithmic approaches to error correction.  We show our method can detect and remove more errors from sequencing reads than existing methods. Though our method carries a higher computational burden than the best algorithmic approaches, the probabilistic approach is extensible, flexible, and well-positioned to support downstream statistical analysis of the increasing volume of sequence data.
Speaker: Karin Dorman is an Associate Professor in the Statistics and Genetics, Development and Cell Biology departments and part of the Bioinformatics & Computational Biology interdepartmental program at Iowa State University, Ames, Iowa.

October 31, 2014

McGill Statistics Seminar
Marie-Pier Côté A copula-based model for risk aggregation

15:30-16:30

BURN 1205
Abstract: A flexible approach is proposed for risk aggregation. The model consists of a tree structure, bivariate copulas, and marginal distributions. The construction relies on a conditional independence assumption whose implications are studied. Selection of the tree structure, estimation, and model validation are illustrated using data from a Canadian property and casualty insurance company.
Speaker: Marie-Pier Côté is a PhD student in the Department of Mathematics and Statistics at McGill University.

November 7, 2014

McGill Statistics Seminar
Khader Khadraoui Bayesian regression with B-splines under combinations of shape constraints and smoothness properties

15:30-16:30

BURN 1205
Abstract: We approach the problem of shape-constrained regression from a Bayesian perspective. A B-spline basis is used to model the regression function. The smoothness of the regression function is controlled by the order of the B-splines and the shape is controlled by the shape of an associated control polygon. Controlling the shape of the control polygon reduces to imposing inequality constraints on the spline coefficients. Our approach enables us to take into account combinations of shape constraints and to localize each shape constraint on a given interval. The performance of our method is investigated through a simulation study. Applications to real data sets from the food industry and global warming are provided.
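A small worked example of how control-polygon constraints become coefficient constraints (a standard B-spline fact, included here for orientation rather than taken from the talk): writing the regression function in a B-spline basis,

\[
f(x) \;=\; \sum_{j=1}^{m} \beta_j B_j(x), \qquad
\beta_1 \le \beta_2 \le \cdots \le \beta_m \;\Longrightarrow\; f \text{ is nondecreasing},
\]

so a monotonicity constraint on f reduces to linear inequalities among the coefficients, which the prior can be restricted to satisfy; convexity and other shape constraints lead to analogous linear inequalities on higher-order differences of the β_j.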
Speaker: Khader Khadraoui is an Assistant Professor in the Department of Mathematics and Statistics at Université Laval. He obtained his PhD at the Université de Montpellier 2 in 2011.

November 14, 2014

McGill Statistics Seminar
Mayer Alvo Bridging the gap: A likelihood function approach for the analysis of ranking data

15:30-16:30

BURN 1205
Abstract: In the parametric setting, the notion of a likelihood function forms the basis for the development of tests of hypotheses and estimation of parameters. Tests in connection with the analysis of variance stem entirely from considerations of the likelihood function. On the other hand, nonparametric procedures have generally been derived without any formal mechanism and are often the result of clever intuition. In this talk, we propose a more formal approach for deriving tests involving the use of ranks. Specifically, we define a likelihood function motivated by characteristics of the ranks of the data and demonstrate that this leads to well-known tests of hypotheses. We also point to various areas of further exploration.
Speaker: Mayer Alvo is a Professor of Statistics at the University of Ottawa.
November 20, 2014
Colloque de mathématiques et de statistique de Montréal
Martin J. Wainwright

High-dimensional phenomena in mathematical statistics and convex analysis

16:00-17:00

CRM 1360 (U. de Montréal)
Abstract: Statistical models in which the ambient dimension is of the same order or larger than the sample size arise frequently in different areas of science and engineering. Although high-dimensional models of this type date back to the work of Kolmogorov, they have been the subject of intensive study over the past decade, and have interesting connections to many branches of mathematics (including concentration of measure, random matrix theory, convex geometry, and information theory). In this talk, we provide a broad overview of the general area, including vignettes on phase transitions in high-dimensional graph recovery, and randomized approximations of convex programs.
Speaker: Martin J. Wainwright is a Professor at the University of California at Berkeley and the 2014 winner of the prestigious COPSS Presidents' Award.
November 21, 2014
McGill Statistics Seminar
Martin J. Wainwright

Estimating by solving nonconvex programs: Statistical and computational guarantees

15:30-16:30

BURN 1205
Abstract: Many statistical estimators are based on solving nonconvex programs. Although the practical performance of such methods is often excellent, the associated theory is frequently incomplete, due to the potential gaps between global and local optima. In this talk, we present theoretical results that apply to all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for error-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. Under suitable regularity conditions, our theory guarantees that any local optimum of the composite objective function lies within statistical precision of the true parameter vector. This result closes the gap between theory and practice for these methods.
Speaker: Martin J. Wainwright is a Professor at the University of California at Berkeley and the 2014 winner of the prestigious COPSS Presidents' Award.

November 28, 2014

McGill Statistics Seminar
Irene Vrbik Model-based methods of classification with applications

15:30-16:30

BURN 1205
Abstract: Model-based clustering via finite mixture models is a popular clustering method for finding hidden structures in data. The model is often assumed to be a finite mixture of multivariate normal distributions; however, flexible extensions have been developed over recent years. This talk demonstrates some methods employed in unsupervised, semi-supervised, and supervised classification that include skew-normal and skew-t mixture models. Both real and simulated data sets are used to demonstrate the efficacy of these techniques.
Speaker: Irene Vrbik is a postdoctoral fellow in the Department of Mathematics and Statistics at McGill. She holds a PhD from the University of Guelph.

December 5, 2014

McGill Statistics Seminar
Julien Roger Copula model selection: A statistical approach

15:30-16:30

BURN 1205
Abstract: Copula model selection is an important problem because similar but differing copula models can offer different conclusions surrounding the dependence structure of random variables. Chen & Fan (2005) proposed a model selection method involving a statistical hypothesis test. The hypothesis test attempts to take into account the randomness of the AIC and other likelihood-based model selection methods for finite samples. Performance of the test compared to the more common approach of AIC is illustrated in a series of simulations.
Speaker: Julien Roger is an MSc student in the Department of Mathematics and Statistics at McGill University.

December 12, 2014

McGill Statistics Seminar
James Sharpnack Testing for structured Normal means

15:30-16:30

BURN 708
Abstract: We will discuss the detection of pattern in images and graphs from a high-dimensional Gaussian measurement. This problem is relevant to many applications including detecting anomalies in sensor and computer networks, large-scale surveillance, co-expressions in gene networks, disease outbreaks, etc. Beyond its wide applicability, structured Normal means detection serves as a case study in the difficulty of balancing computational complexity with statistical power. We will begin by discussing the detection of active rectangles in images and sensor grids. We will develop an adaptive scan test and determine its asymptotic distribution. We propose an approximate algorithm that runs in nearly linear time but achieves the same asymptotic distribution as the naive, quadratic run-time algorithm. We will move on to the more general problem of detecting a well-connected active subgraph within a graph in the Normal means context. Because the generalized likelihood ratio test is computationally infeasible, we propose approximate algorithms and study their statistical efficiency. One such algorithm that we develop is the graph Fourier scan statistic, whose statistical performance is characterized by the spectrum of the graph Laplacian. Another relaxation that we have developed is the Lovasz extended scan statistic (LESS), which is based on submodular optimization and the performance is described using electrical network theory. We also introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph. For each of these tests we compare their statistical guarantees to an information theoretic lower bound.
Speaker: James Sharpnack is a Postdoctoral Fellow at the University of California, San Diego.

Go to top

Winter Term 2015

 
Date Event Speaker(s) Title Time Location

January 9, 2015

McGill Statistics Seminar
James O. Ramsay
Space-time data analysis: Out of the Hilbert box

15:30-16:30

BURN 1205
Abstract: Given the discouraging state of current efforts to curb global warming, we can imagine that we will soon turn our attention to mitigation. On a global scale, distressed populations will turn to national and international organizations for solutions to dramatic problems caused by climate change. These institutions in turn will mandate the collection of data on a scale and resolution that will present extraordinary statistical and computational challenges to those of us viewed as having the appropriate expertise. A review of the current state of our space-time data analysis machinery suggests that we have much to do. Most of current spatial modelling methodology is based on concepts translated from time series analysis, is heavily dependent on various kinds of stationarity assumptions, uses the Gaussian distribution to model data and depends on a priori coordinate systems that do not exist in nature. A way forward from this restrictive framework is proposed by modelling data over textured domains using layered coordinate systems.
Speaker: James O. Ramsay is a retired professor of psychology at McGill University. He is also an Adjunct Professor at Carleton University, in Ottawa.

January 13, 2015

McGill Statistics Seminar
Ryan P. Browne
Mixtures of coalesced generalized hyperbolic distributions

15:30-16:30

BURN 1205
Abstract: A mixture of coalesced generalized hyperbolic distributions is developed by joining a finite mixture of generalized hyperbolic distributions with a mixture of multiple scaled generalized hyperbolic distributions. The result is a mixture of mixtures with shared model parameters and a common mode. We begin by discussing the generalized hyperbolic distribution, which has the t, Gaussian and other distributions as special cases. The generalized hyperbolic distribution can be represented as a normal-variance mixture using a generalized inverse Gaussian distribution. This representation makes it a suitable candidate for the expectation-maximization algorithm. Secondly, we discuss the multiple scaled generalized hyperbolic distribution, which arises via implementation of a multi-dimensional weight function. A parameter estimation scheme is developed using the ever-expanding class of MM algorithms and the Bayesian information criterion is used for model selection. Special consideration is given to the contour shape. We use the coalesced distribution for clustering and compare it to finite mixtures of skew-t distributions using simulated and real data sets. Finally, the role of generalized hyperbolic mixtures within the wider model-based clustering, classification, and density estimation literature is discussed.
Speaker: Ryan P. Browne is an Assistant Professor in the Department of Mathematics and Statistics at McMaster University, in Hamilton, Ontario.
January 15, 2015
Colloque de mathématiques et de statistique de Montréal
Fang Yao

Functional data analysis and related topics

16:00-17:00

CRM 1360 (U. de Montréal)
Abstract: Functional data analysis (FDA) has received substantial attention, with applications arising from various disciplines, such as engineering, public health, finance etc. In general, the FDA approaches focus on nonparametric underlying models that assume the data are observed from realizations of stochastic processes satisfying some regularity conditions, e.g., smoothness constraints. The estimation and inference procedures usually do not depend on merely a finite number of parameters, which contrasts with parametric models, and exploit techniques, such as smoothing methods and dimension reduction, that allow data to speak for themselves. In this talk, I will give an overview of FDA methods and related topics developed in recent years.
Speaker: Fang Yao is a Professor in the Department of Statistics at the University of Toronto. He is the 2014 recipient of the CRM-SSC Prize.
January 16, 2015
McGill Statistics Seminar
Fang Yao

Simultaneous white noise models and shrinkage recovery of functional data

15:30-16:30

BURN 1205
Abstract: We consider the white noise representation of functional data taken as i.i.d. realizations of a Gaussian process. The main idea is to establish an asymptotic equivalence in Le Cam’s sense between an experiment which simultaneously describes these realizations and a collection of white noise models. In this context, we project onto an arbitrary basis and apply a novel variant of Stein-type estimation for optimal recovery of the realized trajectories. A key inequality is derived showing that the corresponding risks, conditioned on the underlying curves, are minimax optimal and can be made arbitrarily close to those that an oracle with knowledge of the process would attain. Empirical performance is illustrated through simulated and real data examples.
Speaker: Fang Yao is a Professor in the Department of Statistics at the University of Toronto. He is the 2014 recipient of the CRM-SSC Prize.

January 30, 2015

McGill Statistics Seminar
Yuekai Sun
Distributed estimation and inference for sparse regression

15:30-16:30

BURN 1205
Abstract: We address two outstanding challenges in sparse regression: (i) computationally efficient estimation in distributed settings; (ii) valid inference for the selected coefficients. The main computational challenge in a distributed setting is harnessing the computational capabilities of all the machines while keeping communication costs low. We devise an approach that requires only a single round of communication among the machines. We show the approach recovers the convergence rate of the (centralized) lasso as long as each machine has access to an adequate number of samples. Turning to the second challenge, we devise an approach to post-selection inference by conditioning on the selected model. In a nutshell, our approach gives inferences with the same frequency interpretation as those given by data/sample splitting, but it is more broadly applicable and more powerful. The validity of our approach also does not depend on the correctness of the selected model, i.e., it gives valid inferences even when the selected model is incorrect.

This talk is based on joint work with Jason Lee, Qiang Liu, Dennis Sun, and Jonathan Taylor.
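To fix ideas, a minimal sketch of the one-round communication pattern is given below, assuming a numpy/scikit-learn environment; function names are illustrative only. The debiasing step uses the identity matrix as a crude surrogate for the precision matrix, which is only reasonable for near-orthonormal designs; the speaker's actual estimator and its theory are more involved.

    import numpy as np
    from sklearn.linear_model import Lasso

    def local_debiased_lasso(X, y, lam):
        # fit a lasso on one machine's shard and crudely debias it
        n = X.shape[0]
        beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
        return beta + X.T @ (y - X @ beta) / n   # precision matrix approximated by I

    def one_shot_average(shards, lam):
        # each machine ships a single p-vector; the centre simply averages
        estimates = [local_debiased_lasso(Xk, yk, lam) for Xk, yk in shards]
        return np.mean(estimates, axis=0)

    # toy usage with four simulated shards
    rng = np.random.default_rng(0)
    p = 50
    true_beta = np.zeros(p)
    true_beta[:3] = [2.0, -1.5, 1.0]
    shards = []
    for _ in range(4):
        Xk = rng.standard_normal((200, p))
        yk = Xk @ true_beta + rng.standard_normal(200)
        shards.append((Xk, yk))
    print(np.round(one_shot_average(shards, lam=0.1)[:5], 2))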

Speaker: Yuekai Sun is a PhD candidate at the Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA.

February 2, 2015

McGill Statistics Seminar
Liqun Diao Joint analysis of multiple multi-state processes via copulas

15:30-16:30

BURN 1214
Abstract: A copula-based model is described which enables joint analysis of multiple progressive multi-state processes. Unlike intensity-based or frailty-based approaches to joint modeling, the copula formulation proposed herein ensures that a wide range of marginal multi-state processes can be specified and the joint model will retain these marginal features. The copula formulation also facilitates a variety of approaches to estimation and inference including composite likelihood and two-stage estimation procedures. We consider processes with Markov margins in detail, which are often suitable when chronic diseases are progressive in nature. We give special attention to the setting in which individuals are examined intermittently and transition times are consequently interval-censored. Simulation studies give empirical insight into the different methods of analysis and an application involving progression in joint damage in psoriatic arthritis provides further illustration.
Speaker: Liqun Diao is a postdoctoral fellow in the Department of Statistics and Actuarial Science at the University of Waterloo.

February 5, 2015

McGill Statistics Seminar
Yi Yang A fast unified algorithm for solving group Lasso penalized learning problems

15:30-16:30

BURN 1B39
Abstract: We consider a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss functions satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problem. GMD allows for general design matrices, without requiring the predictors to be group-wise orthonormal. As illustrative examples, we develop concrete algorithms for solving the group-lasso penalized least squares and several group-lasso penalized large margin classifiers. These group-lasso models have been implemented in an R package gglasso publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/gglasso. On simulated and real data, gglasso consistently outperforms the existing software for computing the group-lasso that implements either the classical groupwise descent algorithm or Nesterov's method. An application in risk segmentation of insurance business is illustrated by analysis of an auto insurance claim dataset.
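As an illustration of the groupwise-majorization-descent idea, here is a bare-bones sketch for the least-squares loss, written in Python rather than taken from the gglasso implementation; function and variable names are our own, and none of the package's refinements (strong rules, warm starts along a path) are shown.

    import numpy as np

    def group_soft_threshold(z, t):
        # multivariate soft-thresholding of the vector z at level t
        norm = np.linalg.norm(z)
        return np.zeros_like(z) if norm <= t else (1.0 - t / norm) * z

    def gmd_group_lasso(X, y, groups, lam, n_iter=200):
        # minimize (1/2n)||y - X beta||^2 + lam * sum_k sqrt(p_k) * ||beta_k||_2
        n, p = X.shape
        groups = np.asarray(groups)
        beta = np.zeros(p)
        blocks = [np.where(groups == g)[0] for g in np.unique(groups)]
        # majorization constant per group: largest eigenvalue of X_k' X_k / n
        gammas = [np.linalg.eigvalsh(X[:, k].T @ X[:, k] / n)[-1] for k in blocks]
        resid = y - X @ beta
        for _ in range(n_iter):
            for k, gam in zip(blocks, gammas):
                grad_k = -X[:, k].T @ resid / n
                u = gam * beta[k] - grad_k
                new_k = group_soft_threshold(u, lam * np.sqrt(len(k))) / gam
                resid -= X[:, k] @ (new_k - beta[k])
                beta[k] = new_k
        return beta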
Speaker: Yi Yang is a PhD student in the School of Statistics at the University of Minnesota.

February 13, 2015

McGill Statistics Seminar
Johannes Lederer Tuning parameters in high-dimensional statistics

15:30-16:30

BURN 1205
Abstract: High-dimensional statistics is the basis for analyzing large and complex data sets that are generated by cutting-edge technologies in genetics, neuroscience, astronomy, and many other fields. However, Lasso, Ridge Regression, Graphical Lasso, and other standard methods in high-dimensional statistics depend on tuning parameters that are difficult to calibrate in practice. In this talk, I present two novel approaches to overcome this difficulty. My first approach is based on a novel testing scheme that is inspired by Lepski’s idea for bandwidth selection in non-parametric statistics. This approach provides tuning parameter calibration for estimation and prediction with the Lasso and other standard methods and is to date the only way to ensure high performance, fast computations, and optimal finite sample guarantees. My second approach is based on the minimization of an objective function that avoids tuning parameters altogether. This approach provides accurate variable selection in regression settings and, additionally, opens up new possibilities for the estimation of gene regulation networks, microbial ecosystems, and many other network structures.
Speaker: Johannes Lederer is Jacob Wolfowitz Visiting Assistant Professor in the Department for Statistical Science at Cornell University, Ithaca, NY.

February 20, 2015

McGill Statistics Seminar
Martin Lysy Comparison and assessment of particle diffusion models in biological fluids

15:30-16:30

BURN 1205
Abstract: Rapidly progressing particle tracking techniques have revealed that foreign particles in biological fluids exhibit rich and at times unexpected behavior, with important consequences for disease diagnosis and drug delivery. Yet, there remains a frustrating lack of coherence in the description of these particles' motion. Largely this is due to a reliance on functional statistics (e.g., mean-squared displacement) to perform model selection and assess goodness-of-fit. However, not only are such functional characteristics typically estimated with substantial variability, but also they may fail to distinguish between a number of stochastic processes --- each making fundamentally different predictions for relevant quantities of scientific interest. In this talk, I will describe a detailed Bayesian analysis of leading candidate models for subdiffusive particle trajectories in human pulmonary mucus. Efficient and scalable computational strategies will be proposed. Model selection will be achieved by way of intrinsic Bayes factors, which avoid both non-informative priors and "using the data twice". Goodness-of-fit will be evaluated via second-order criteria along with exact model residuals. Our findings suggest that a simple model of fractional Brownian motion describes the data just as well as a first-principles physical model of visco-elastic subdiffusion.
Speaker: Martin Lysy is an Assistant Professor in the Department of Statistics and Actuarial Science at the University of Waterloo.

February 27, 2015

McGill Statistics Seminar
Lynn Lin A novel statistical framework to characterize antigen-specific T-cell functional diversity in single-cell expression data

15:30-16:30

BURN 1205
Abstract: I will talk about COMPASS, a new Bayesian hierarchical framework for characterizing functional differences in antigen-specific T cells by leveraging high-throughput, single-cell flow cytometry data. In particular, I will illustrate, using a variety of data sets, how COMPASS can reveal subtle and complex changes in antigen-specific T-cell activation profiles that correlate with biological endpoints. When applied to data from the RV144 (“the Thai trial”) HIV clinical trial, COMPASS identified novel T-cell subsets that were inverse correlates of HIV infection risk. I also developed intuitive metrics for summarizing multivariate antigen-specific T-cell activation profiles for endpoints analysis. In addition, COMPASS identified correlates of latent infection in an immune study of tuberculosis among South African adolescents. COMPASS is available as an R package and is sufficiently general that it can be adapted to new high-throughput data types, such as Mass Cytometry (CyTOF) and single-cell gene expression, enabling interdisciplinary collaboration, which I will also highlight in my talk.
Speaker: Lynn Lin is currently a Postdoctoral Fellow at the Fred Hutchinson Cancer Research Center, Seattle, WA.

March 13, 2015

McGill Statistics Seminar
David A. Stephens Bayesian approaches to causal inference: A lack-of-success story

15:30-16:30

BURN 1205
Abstract: Despite almost universal acceptance across most fields of statistics, Bayesian inferential methods have yet to breakthrough to widespread use in causal inference, despite Bayesian arguments being a core component of early developments in the field. Some quasi-Bayesian procedures have been proposed, but often these approaches rely on heuristic, sometimes flawed, arguments. In this talk I will discuss some formulations of classical causal inference problems from the perspective of standard Bayesian representations, and propose some inferential solutions. This is joint work with Olli Saarela, Dalla Lana School of Public Health, University of Toronto, Erica Moodie, Department of Epidemiology, Biostatistics and Occupational Health, McGill University, and Marina Klein, Division of Infectious Diseases, Faculty of Medicine, McGill University.
Speaker: David A. Stephens is a James McGill Professor in the Department of Mathematics and Statistics at McGill University.

March 20, 2015

McGill Statistics Seminar
Beate Franke Testing for network community structure

15:30-16:30

BURN 1205
Abstract: Networks provide a useful means to summarize sparse yet structured massive datasets, and so are an important aspect of the theory of big data. A key question in this setting is to test for the significance of community structure or what in social networks is termed homophily, the tendency of nodes to be connected based on similar characteristics. Network models where a single parameter per node governs the propensity of connection are popular in practice, because they are simple to understand and analyze. They frequently arise as null models to indicate a lack of community structure, since they cannot readily describe the division of a network into groups of nodes whose aggregate links behave in a block-like manner. Here we discuss asymptotic regimes under families of such models, and show their potential for enabling hypothesis tests in this setting. As an important special case, we treat network modularity, which summarizes the difference between observed and expected within-community edges under such null models, and which has seen much success in practical applications of large-scale network analysis. Our focus here is on statistical rather than algorithmic properties, however, in order to yield new insights into the canonical problem of testing for network community structure.
Speaker: Beate Franke is a PhD student in the Department of Statistical Science at University College London.

May 7, 2015

Colloque de mathématiques et de statistique du Québec
Robert Lund A statistical view of some recent climate controversies

15:30-16:30

Université de Sherbrooke
Abstract: This talk looks at some recent climate controversies from a statistical standpoint. The issues are motivated via changepoints and their detection. Changepoints are ubiquitous features in climatic time series, occurring whenever stations relocate or gauges are changed. Ignoring changepoints can produce spurious trend conclusions. Changepoint tests involving cumulative sums, likelihood ratio, and maximums of F-statistics are introduced; the asymptotic distributions of these statistics are quantified under the changepoint-free null hypothesis. The case of multiple changepoints is considered. The methods are used to study several controversies, including extreme temperature trends in the United States and Atlantic Basin tropical cyclone counts and strengths.
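A minimal sketch of the classical CUSUM statistic for a single mean shift, as invoked in the abstract; the function name is our own, the 1.358 cutoff is the usual approximate 5% critical value for the supremum of a Brownian bridge, and variance estimation under the alternative as well as multiple changepoints are deliberately ignored here.

    import numpy as np

    def cusum_statistic(x):
        # max over k of |S_k - (k/n) S_n| / (sigma * sqrt(n)), with a naive scale estimate
        x = np.asarray(x, dtype=float)
        n = len(x)
        sigma = x.std(ddof=1)
        csum = np.cumsum(x - x.mean())
        stats = np.abs(csum[:-1]) / (sigma * np.sqrt(n))
        k_hat = int(np.argmax(stats)) + 1      # most likely changepoint location
        return stats.max(), k_hat

    rng = np.random.default_rng(1)
    series = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(1.2, 1.0, 60)])
    stat, k_hat = cusum_statistic(series)
    print(stat, k_hat, stat > 1.358)           # compare to the approximate 5% cutoff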
Speaker: Robert Lund is a Professor of Statistics in the Department of Mathematical Sciences at Clemson University in South Carolina.

May 14, 2015

McGill Statistics Seminar
José María Sarabia Some new classes of bivariate distributions based on conditional specification

15:30-16:30

BURN 1205
Abstract: A bivariate distribution can sometimes be characterized completely by properties of its conditional distributions. In this talk, we will discuss models of bivariate distributions whose conditionals are members of prescribed parametric families of distributions. Some relevant models with specified conditionals will be discussed, including the normal and lognormal cases, the skew-normal and other families of distributions. Finally, some conditionally specified densities will be shown to provide convenient flexible conjugate prior families in certain multiparameter Bayesian settings.
Speaker: José María Sarabia is a Professor of Statistics in the Department of Economics at the University of Cantabria, Santander, Spain.

Go to top

Fall Term 2015

Date Event Speaker(s) Title Time Location

September 11, 2015

McGill Statistics Seminar
Anne-Laure Fougères Bias correction in multivariate extremes

15:30-16:30

BURN 1205
Abstract: The estimation of the extremal dependence structure of a multivariate extreme-value distribution is spoiled by the impact of the bias, which increases with the number of observations used for the estimation. Already known in the univariate setting, the bias correction procedure is studied in this talk under the multivariate framework. New families of estimators of the stable tail dependence function are obtained. They are asymptotically unbiased versions of the empirical estimator introduced by Huang (1992). Given that the new estimators have a regular behavior with respect to the number of observations, it is possible to deduce aggregated versions so that the choice of threshold is substantially simplified. An extensive simulation study is provided as well as an application on real data.
Speaker: Anne-Laure Fougères is a Professor in the Institut Camille-Jordan, Université Claude-Bernard, Lyon, France. Her main research interests are in extreme-value theory, multivariate data modeling, and functional estimation under shape constraints.
September 18, 2015
McGill Statistics Seminar
Yi Yang
A unified algorithm for fitting penalized models with high-dimensional data

15:30-16:30

BURN 1205
Abstract: In light of high-dimensional problems, research on penalized models has received much interest. Correspondingly, several algorithms have been developed for solving penalized high-dimensional models. I will describe fast and efficient unified algorithms for computing the solution path for a collection of penalized models. In particular, we will look at an algorithm for solving L1-penalized learning problems and an algorithm for solving group-lasso learning problems. These algorithms take advantage of a majorization-minimization trick to make each update simple and efficient. The algorithms also enjoy a proven convergence property. To demonstrate the generality of these algorithms, I extend them to a class of elastic net penalized large margin classification methods and to elastic net penalized Cox proportional hazards models. These algorithms have been implemented in three R packages gglasso, gcdnet and fastcox, which are publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages. On simulated and real data, our algorithms consistently outperform the existing software in speed for computing penalized models and often deliver better-quality solutions.
Speaker: Yi Yang is a newly hired Assistant Professor in the Department of Mathematics and Statistics at McGill University, Montréal.

September 25, 2015

McGill Statistics Seminar
Yue Zhao
Topics in statistical inference for the semiparametric elliptical copula model

15:30-16:30

BURN 1205
Abstract: This talk addresses aspects of the statistical inference problem for the semiparametric elliptical copula model. The semiparametric elliptical copula model is the family of distributions whose dependence structures are specified by parametric elliptical copulas but whose marginal distributions are left unspecified. An elliptical copula is uniquely characterized by a characteristic generator and a copula correlation matrix Sigma. In the first part of this talk, I will consider the estimation of Sigma. A natural estimate for Sigma is the plug-in estimator Sigmahat with Kendall's tau statistic. I will first exhibit a sharp bound on the operator norm of Sigmahat - Sigma. I will then consider a factor model of Sigma, for which I will propose a refined estimator Sigmatilde by fitting a low-rank matrix plus a diagonal matrix to Sigmahat using least squares with a nuclear norm penalty on the low-rank matrix. The bound on the operator norm of Sigmahat - Sigma serves to scale the penalty term, and we obtain finite-sample oracle inequalities for Sigmatilde that I will present. In the second part of this talk, we will look at the classification of two distributions that have the same Gaussian copula but that are otherwise arbitrary in high dimensions. Under this semiparametric Gaussian copula setting, I will give an accurate semiparametric estimator of the log-density ratio, which leads to an empirical decision rule and a bound on its associated excess risk. Our estimation procedure takes advantage of the potential sparsity as well as the low noise condition in the problem, which allows us to achieve a faster convergence rate of the excess risk than is possible in the existing literature on semiparametric Gaussian copula classification. I will demonstrate the efficiency of our semiparametric empirical decision rule by showing that the bound on the excess risk nearly achieves a convergence rate of 1 over square-root-n in the simple setting of Gaussian distribution classification.
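The plug-in estimator mentioned above rests on the classical relation Sigma_jk = sin(pi * tau_jk / 2) for elliptical copulas. Below is a minimal Python sketch of that first-stage estimate; the function name is illustrative, and the refined low-rank-plus-diagonal fit and its oracle inequalities are not reproduced here.

    import numpy as np
    from scipy.stats import kendalltau

    def tau_plugin_correlation(X):
        # X: n x p data matrix with arbitrary continuous marginals
        p = X.shape[1]
        sigma_hat = np.eye(p)
        for j in range(p):
            for k in range(j + 1, p):
                tau, _ = kendalltau(X[:, j], X[:, k])
                sigma_hat[j, k] = sigma_hat[k, j] = np.sin(np.pi * tau / 2.0)
        return sigma_hat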
Speaker: Yue Zhao is a Postdoctoral Fellow in the Department of Mathematics and Statistics at McGill University, Montréal.

October 2, 2015

McGill Statistics Seminar
Didier Chételat
Estimating covariance matrices of intermediate size

15:30-16:30

BURN 1205
Abstract: In finance, the covariance matrix of many assets is a key component of financial portfolio optimization and is usually estimated from historical data. Much research in the past decade has focused on improving estimation by studying the asymptotics of large covariance matrices in the so-called high-dimensional regime, where the dimension p grows at the same pace as the sample size n, and this approach has been very successful. This choice of growth makes sense in part because, based on results for eigenvalues, it appears that there are only two limits: the high-dimensional one when p grows like n, and the classical one, when p grows more slowly than n. In this talk, I will present evidence that this binary view is false, and that there could be hidden intermediate regimes lying in between. In turn, this allows for corrections to the sample covariance matrix that are more appropriate when the dimension is large but moderate with respect to the sample size, as is often the case; this can also lead to better optimization for portfolio volatility in many situations of interest.
Speaker: Didier Chételat is a newly hired Assistant Professor in the Department of Decision Sciences at HEC Montréal.

October 9, 2015

McGill Statistics Seminar
Michelle Carey Parameter estimation of partial differential equations over irregular domains

15:30-16:30

BURN 1205
Abstract: Spatio-temporal data are abundant in many scientific fields; examples include daily satellite images of the earth, hourly temperature readings from multiple weather stations, and the spread of an infectious disease over a particular region. In many instances the spatio-temporal data are accompanied by mathematical models expressed in terms of partial differential equations (PDEs). These PDEs determine the theoretical aspects of the behavior of the physical, chemical or biological phenomena considered. Azzimonti (2013) showed that including the associated PDE as a regularization term, as opposed to the conventional two-dimensional Laplacian, provides a considerable improvement in estimation accuracy. The PDE parameters often have interesting interpretations, although they are typically unknown and must be inferred from expert knowledge of the phenomena considered. In this talk I will discuss extending the profiling with a parameter cascading procedure outlined in Ramsay et al. (2007) to incorporate PDE parameter estimation. I will also show how, following Sangalli et al. (2013), the estimation procedure can be extended to include finite-element methods (FEMs). This allows the proposed method to account for attributes of the geometry of the physical problem such as irregularly shaped domains, external and internal boundary features, as well as strong concavities. Thus this talk will introduce a methodology for data-driven estimates of the parameters of PDEs defined over irregular domains.
Speaker: Michelle Carey is a Postdoctoral Fellow in the Department of Mathematics and Statistics at McGill University, Montréal.
October 16, 2015
McGill Statistics Seminar
George Michailidis Estimating high-dimensional multi-layered networks through penalized maximum likelihood

15:30-16:30

BURN 1205
Abstract: Gaussian graphical models represent a good tool for capturing interactions between nodes that represent the underlying random variables. However, in many applications in biology one is interested in modeling associations both between and within molecular compartments (e.g., interactions between genes and proteins/metabolites). To this end, inferring multi-layered network structures from high-dimensional data provides insight into understanding the conditional relationships among nodes within layers, after adjusting for and quantifying the effects of nodes from other layers. We propose an integrated algorithmic approach for estimating multi-layered networks that incorporates a screening step for significant variables, an optimization algorithm for estimating the key model parameters and a stability selection step for selecting the most stable effects. The proposed methodology offers an efficient way of estimating the edges within and across layers iteratively, by solving an optimization problem constructed based on penalized maximum likelihood (under a Gaussianity assumption). The optimization is solved on a reduced parameter space that is identified through screening, which remedies the instability in high dimensions. Theoretical properties are considered to ensure identifiability and consistent estimation of the parameters and convergence of the optimization algorithm, despite the lack of global convexity. The performance of the methodology is illustrated on synthetic data sets and on an application on gene and metabolic expression data for patients with renal disease.
Speaker: George Michailidis is Professor and Director of the Informatics Institute at the University of Florida, Gainesville. His research interests include multivariate analysis and machine learning; computational statistics; change-point estimation; stochastic processing networks; bioinformatics; network tomography; visual analytics; as well as statistical methodology with applications to computer, communications and sensor networks.

October 23, 2015

McGill Statistics Seminar
Weixin Yao Robust mixture regression and outlier detection via penalized likelihood

15:30-16:30

BURN 1205
Abstract: Finite mixture regression models have been widely used for modeling mixed regression relationships arising from a clustered and thus heterogeneous population. The classical normal mixture model, despite its simplicity and wide applicability, may fail dramatically in the presence of severe outliers. We propose a robust mixture regression approach based on a sparse, case-specific, and scale-dependent mean-shift parameterization, for simultaneously conducting outlier detection and robust parameter estimation. A penalized likelihood approach is adopted to induce sparsity among the mean-shift parameters so that the outliers are distinguished from the good observations, and a thresholding-embedded Expectation-Maximization (EM) algorithm is developed to enable stable and efficient computation. The proposed penalized estimation approach is shown to have strong connections with other robust methods including the trimmed likelihood and the M-estimation methods. Compared with several existing methods, the proposed methods show outstanding performance in numerical studies.
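To convey the flavour of the mean-shift device in the simplest single-component setting, and in our own notation rather than the speaker's: augment the regression with one shift parameter per observation and penalize the shifts,

\[
\min_{\beta,\,\gamma} \; \frac{1}{2}\sum_{i=1}^n \bigl(y_i - x_i^{\top}\beta - \gamma_i\bigr)^2 \;+\; \lambda \sum_{i=1}^n |\gamma_i|,
\]

so that observations with nonzero estimated γ_i are flagged as outliers while the remaining cases drive the estimate of β. The talk embeds a scale-dependent version of this idea inside a normal mixture and fits it with the thresholding-embedded EM algorithm described above.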
Speaker: Weixin Yao is an Associate Professor of Statistics at the University of California, Riverside. His research interests include mixture models, nonparametric and semiparametric modeling, longitudinal data analysis, robust estimation, high-dimensional modeling, variable selection, and dimension reduction.

October 30, 2015

Colloque de mathématiques de Montréal
Emmanuel Candès A knockoff filter for controlling the false discovery rate

16:00-17:00

Salle 1360, Pavillon André-Aisenstadt, Université de Montréal
Abstract: The big data era has created a new scientific paradigm: collect data first, ask questions later. Imagine that we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. We introduce the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method works by constructing fake variables, knockoffs, which can then be used as controls for the true variables; the method achieves exact FDR control in finite-sample settings no matter the design or covariates, the number of variables in the model, and the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. This is joint work with Rina Foygel Barber.
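A minimal sketch of the selection step of the knockoff filter, assuming one has already constructed knockoff variables and computed statistics W_j that are large and positive when variable j looks important and symmetrically signed for nulls; the function names are illustrative, and the construction of the knockoffs themselves, the substantive part of the method, is not shown.

    import numpy as np

    def knockoff_threshold(W, q=0.10, offset=1):
        # offset=1 corresponds to the "knockoff+" variant with exact FDR control
        W = np.asarray(W, dtype=float)
        candidates = np.sort(np.abs(W[W != 0]))
        for t in candidates:
            fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
            if fdp_hat <= q:
                return t
        return np.inf                          # nothing is selected

    def knockoff_select(W, q=0.10):
        t = knockoff_threshold(W, q)
        return np.where(W >= t)[0]             # indices of selected variables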
Speaker: Emmanuel Candès is a professor of mathematics, statistics, and electrical engineering at Stanford University, where he is also the Barnum-Simons Chair in Mathematics and Statistics.

November 6, 2015

McGill Statistics Seminar
Lawrence McCandless Bayesian analysis of non-identifiable models, with an example from epidemiology and biostatistics

15:30-16:30

BURN 1205
Abstract: Most regression models in biostatistics assume identifiability, which means that each point in the parameter space corresponds to a unique likelihood function for the observable data. Recently there has been interest in Bayesian inference for non-identifiable models, which can better represent uncertainty in some contexts. One example is in the field of epidemiology, where the investigator is concerned with bias due to unmeasured confounders (omitted variables). In this talk, I will illustrate Bayesian analysis of a non-identifiable model from epidemiology using government administrative data from British Columbia. I will show how to use the software STAN, which is new software developed by Andrew Gelman and others in the USA. STAN allows the careful study of posterior distributions in a vast collection of Bayesian models, including non-identifiable models for bias in epidemiology, which are poorly suited to conventional Gibbs sampling.
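A toy example of the phenomenon (not the epidemiological model from the talk): suppose

\[
y_i \mid \theta_1, \theta_2 \;\sim\; \mathcal{N}(\theta_1 + \theta_2,\, 1), \qquad
\theta_1, \theta_2 \;\overset{\text{iid}}{\sim}\; \mathcal{N}(0, 1).
\]

The likelihood depends on (θ_1, θ_2) only through the sum θ_1 + θ_2, so the model is non-identifiable; the data sharpen the posterior of θ_1 + θ_2 but leave the posterior of θ_1 − θ_2 exactly equal to its prior. Bayesian inference remains coherent, but conclusions about the individual parameters are driven by the prior, which is precisely the situation that arises with unmeasured confounding.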
Speaker: Lawrence McCandless is an Associate Professor in the Faculty of Health Sciences at Simon Fraser University, Burnaby, BC. His broad research interests include Bayesian inference, causal inference, mediation analysis, meta-analysis, and survival analysis.

November 13, 2015

McGill Statistics Seminar
Masoud Asgharian Prevalent cohort studies: Length-biased sampling with right censoring

15:30-16:30

BURN 1205
Abstract: Logistic or other constraints often preclude the possibility of conducting incident cohort studies. A feasible alternative in such cases is to conduct a cross-sectional prevalent cohort study for which we recruit prevalent cases, i.e., subjects who have already experienced the initiating event, say the onset of a disease. When the interest lies in estimating the lifespan between the initiating event and a terminating event, say death for instance, such subjects may be followed prospectively until the terminating event or loss to follow-up, whichever happens first. It is well known that prevalent cases have, on average, longer lifespans. As such, they do not form a representative random sample from the target population; they comprise a biased sample. If the initiating events are generated from a stationary Poisson process, the so-called stationarity assumption, this bias is called length bias. I present the basics of nonparametric inference using length-biased right censored failure time data. I'll then discuss some recent progress and current challenges. Our study is mainly motivated by challenges and questions raised in analyzing survival data collected on patients with dementia as part of a nationwide study in Canada, called the Canadian Study of Health and Aging (CSHA). I'll use these data throughout the talk to discuss and motivate our methodology and its applications.
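The core bias can be written down in one line (standard textbook form, in our own notation): under the stationarity assumption, the survival times of prevalent cases are drawn not from the incident density f but from its length-biased version

\[
f_{LB}(t) \;=\; \frac{t\, f(t)}{\mu}, \qquad \mu = \int_0^\infty u\, f(u)\, du,
\]

so longer survivors are over-represented in proportion to their survival time, and naïve estimators based on the observed lifespans of prevalent cases overestimate survival in the incident population; right censoring of the prospective follow-up compounds the difficulty.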
Speaker: Masoud Asgharian is a Professor in the Department of Mathematics and Statistics at McGill University, Montréal.
November 26, 2015
Colloque de statistique de Montréal
Richard J. Cook

Inference regarding within-family association in disease onset times under biased sampling schemes

15:30-16:30

BURN 306
Abstract: In preliminary studies of the genetic basis for chronic conditions, interest routinely lies in the within-family dependence in disease status. When probands are selected from disease registries and their respective families are recruited, a variety of ascertainment bias-corrected methods of inference are available which are typically based on models for correlated binary data. This approach ignores the age that family members are at the time of assessment. We consider copula-based models for assessing the within-family dependence in the disease onset time and disease progression, based on right-censored and current status observation of the non-probands. Inferences based on likelihood, composite likelihood and estimating functions are each discussed and compared in terms of asymptotic and empirical  relative efficiency. This is joint work with Yujie Zhong.
Speaker: Richard J. Cook is professor in the Department of Statistics and Actuarial Science at the University of Waterloo, Ontario, where he holds a Canada Research Chair (Tier I) in Statistical Methods for Health Research.

December 10, 2015

Colloque de statistique de Montréal
Nicolai Meinshausen Causal discovery with confidence using invariance principles

15:30-16:30

UdeM, Pav. Roger-Gaudry, salle S-116
Abstract: What is interesting about causal inference? One of the most compelling aspects is that any prediction under a causal model is valid in environments that are possibly very different to the environment used for inference. For example, variables can be actively changed and predictions will still be valid and useful. This invariance is very useful but still leaves open the difficult question of inference. We propose to turn this invariance principle around and exploit the invariance for inference. If we observe a system in different environments (or under different but possibly not well specified interventions) we can identify all models that are invariant. We know that any causal model has to be in this subset of invariant models. This allows causal inference with valid confidence intervals. We propose different estimators, depending on the nature of the interventions and depending on whether hidden variables and feedbacks are present. Some empirical examples demonstrate the power and possible pitfalls of this approach.
Speaker: Nicolai Meinshausen is a Professor in the Department of Statistics of ETH Zürich. He was, among other distinctions, the Medallion Lecturer at the Joint Statistical Meetings (JSM) in Seattle, WA, in August 2015.

 

Go to top

Winter Term 2016

 

Date Event Speaker(s) Title Time Location

January 22, 2016

McGill Statistics Seminar
David Haziza
Robust estimation in the presence of influential units in surveys

15:30-16:30

BURN 1205
Abstract:

Influential units are those that make classical estimators (e.g., the Horvitz-Thompson estimator or calibration estimators) very unstable. The problem of influential units is particularly important in business surveys, which collect economic variables whose distributions are highly skewed (heavy right tails). In this talk, we will attempt to answer the following questions:
 
(1)   What is an influential value in surveys?
(2)   How can the influence of a unit be measured?
(3)   How can the impact of influential units be reduced at the estimation stage?

To measure the influence of a unit, we use the concept of conditional bias and argue that it is an appropriate influence measure since it accounts for the sampling design, the type of parameter to be estimated, and the type of estimator. Using the conditional bias, we propose a general robust estimator, which possesses the desirable feature of being applicable with arbitrary sampling designs. For stratified simple random sampling, it is essentially equivalent to the estimator of Kokic and Bell (1994). The proposed robust estimator involves a $\Psi$-function, which depends on a tuning constant. We propose a method for determining the tuning constant, which consists of minimizing the maximum estimated conditional bias. We show that the resulting robust estimator is design-consistent. The implementation of the estimator will also be discussed. Finally, the results of an empirical study, which compares several estimators in terms of bias and relative efficiency, will be presented.
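The conditional-bias machinery is beyond a short example, but the following toy sketch (my own illustration, not the estimator proposed in the talk) conveys the basic idea of tempering large weighted contributions before summing, compared with the plain Horvitz-Thompson estimator; the cutoff c plays the role of the tuning constant mentioned above.

```python
import numpy as np

def horvitz_thompson(y, pi):
    """Plain Horvitz-Thompson estimator of the population total."""
    return np.sum(y / pi)

def winsorized_total(y, pi, c):
    """Toy robust total: cap each weighted contribution y_i / pi_i at c."""
    return np.sum(np.minimum(y / pi, c))

rng = np.random.default_rng(0)
pi = np.full(100, 0.1)                    # equal inclusion probabilities
y = rng.lognormal(mean=0.0, sigma=1.0, size=100)
y[0] = 500.0                              # one influential unit
print(horvitz_thompson(y, pi))            # blown up by the influential unit
print(winsorized_total(y, pi, c=100.0))   # its contribution is capped at c
```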

Speaker: David Haziza is a professor of statistics at the Université de Montréal. His main research area is survey sampling.
January 29, 2016
McGill Statistics Seminar
Annaliza McGillivray

Estimating high-dimensional networks with hubs with an application to microbiome data

15:30-16:30

BURN 1205
Abstract: In this talk, we investigate the problem of estimating high-dimensional networks in which there are a few highly connected “hub” nodes. Methods based on L1-regularization have been widely used for performing sparse selection in the graphical modelling context. However, the L1 penalty penalizes each edge equally and independently, without taking any structural information into account. We introduce a new method for estimating undirected graphical models with hubs, called the hubs weighted graphical lasso (HWGL). This is a two-step procedure with a hub screening step, followed by network reconstruction in the second step using a weighted lasso approach that incorporates the inferred network topology. Empirically, we show that the HWGL outperforms competing methods and illustrate the methodology with an application to microbiome data.
Speaker: Annaliza McGillivray is a PhD candidate in our department. She works with Abbas Khalili and David Stephens.

February 5, 2016

McGill Statistics Seminar
Denis Talbot The Bayesian causal effect estimation algorithm

15:30-16:30

BURN 1214
Abstract: Estimating causal exposure effects in observational studies ideally requires the analyst to have a vast knowledge of the domain of application. Investigators often bypass difficulties related to the identification and selection of confounders through the use of fully adjusted outcome regression models. However, since such models likely contain more covariates than required, the variance of the regression coefficient for exposure may be unnecessarily large. Instead of using a fully adjusted model, model selection can be attempted. Most classical statistical model selection approaches, such as Bayesian model averaging, do not readily address causal effect estimation. We present a new model averaged approach to causal inference, Bayesian causal effect estimation (BCEE), which is motivated by the graphical framework for causal inference. BCEE aims to unbiasedly estimate the causal effect of a continuous exposure on a continuous outcome while being more efficient than a fully adjusted approach.
Speaker: Denis Talbot is an Assistant Professor in the Département de médecine sociale et préventive, Université Laval, Québec.

February 11, 2016

CRM-SSC Prize Address
Matías Salibián-Barrera Outlier detection for functional data using principal components

16:00-17:00

CRM 6254 (U. de Montréal)
Abstract: Principal components analysis is a widely used technique that provides an optimal lower-dimensional approximation to multivariate observations. In the functional case, a new characterization of elliptical distributions on separable Hilbert spaces allows us to obtain an equivalent stochastic optimality property for the principal component subspaces of random elements on separable Hilbert spaces. This property holds even when second moments do not exist. These lower-dimensional approximations can be very useful in identifying potential outliers among high-dimensional or functional observations. In this talk we propose a new class of robust estimators for principal components, which is consistent for elliptical random vectors, and Fisher-consistent for elliptically distributed random elements on arbitrary Hilbert spaces. We illustrate our method on two real functional data sets, where the robust estimator is able to discover atypical observations in the data that would have been missed otherwise. This talk is the result of recent collaborations with Graciela Boente (Buenos Aires, Argentina) and David Tyler (Rutgers, USA).
Speaker: Matías Salibián-Barrera is an Associate Professor in the Department of Statistics at The University of British Columbia, and the recipient of the 2015 CRM-SSC Prize.

February 19, 2016

McGill Statistics Seminar
James McVittie An introduction to statistical lattice models and observables

15:30-16:30

BURN 1205
Abstract: The study of convergence of random walks to well defined curves is founded in the fields of complex analysis, probability theory, physics and combinatorics. The foundations of this subject were motivated by physicists interested in the properties of one-dimensional models that represented some form of physical phenomenon. By taking physical models and generalizing them into abstract mathematical terms, macroscopic properties about the model could be determined from the microscopic level. By using model-specific objects known as observables, random walks on particular lattice structures can be proven to converge to continuous curves such as Brownian motion or Stochastic Loewner Evolution as the size of the lattice step approaches 0. This seminar will introduce the field of statistical lattice models, the types of observables that can be used to prove convergence, as well as a proof for the q-state Potts model showing that local non-commutative matrix observables do not exist. No prior physics knowledge is required for this seminar.
Speaker: James McVittie is an MSc student in our Department; his supervisors are David Stephens and David Wolfson.

February 26, 2016

McGill Statistics Seminar
Etienne Marceau Aggregation methods for portfolios of dependent risks with Archimedean copulas

15:30-16:30

BURN 1205
Abstract: In this talk, we will consider a portfolio of dependent risks represented by a vector of dependent random variables whose joint cumulative distribution function (CDF) is defined with an Archimedean copula. Archimedean copulas are very popular and their extensions, nested Archimedean copulas, are well suited for vectors of random vectors in high dimension. I will describe a simple approach which makes it possible to compute the CDF of the sum or a variety of other functions of those random variables. In particular, I will derive the CDF and the TVaR of the sum of those risks using the Frank copula, the Shifted Negative Binomial copula, and the Ali-Mikhail-Haq (AMH) copula. The computation of the contribution of each risk under the TVaR-based allocation rule will also be illustrated. Finally, the links between the Clayton copula, the Shifted Negative Binomial copula, and the AMH copula will be discussed.
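As a rough illustration of the aggregation task (using Monte Carlo and the Clayton copula mentioned at the end of the abstract, rather than the computational approach of the talk), one can sample the dependent risks, map them to their marginals, and read off the distribution and TVaR of the sum:

```python
import numpy as np
from scipy import stats

def sample_clayton(n, d, theta, rng):
    """Sample n points from a d-dimensional Clayton copula (theta > 0)
    via the Marshall-Olkin gamma-frailty algorithm."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))   # gamma frailty
    e = rng.exponential(size=(n, d))
    return (1.0 + e / v) ** (-1.0 / theta)                     # dependent uniforms

rng = np.random.default_rng(42)
n, d, theta = 100_000, 5, 2.0
u = sample_clayton(n, d, theta, rng)
x = stats.lognorm.ppf(u, s=0.5, scale=np.exp(1.0))   # lognormal marginals
s = x.sum(axis=1)                                     # aggregate loss

alpha = 0.99
var_alpha = np.quantile(s, alpha)
tvar_alpha = s[s >= var_alpha].mean()                 # TVaR = mean loss beyond VaR
print("P(S <= 20) ~", (s <= 20).mean(), " VaR_99 ~", var_alpha, " TVaR_99 ~", tvar_alpha)
```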
Speaker: Etienne Marceau is a professor in the School of Actuarial Science at Université Laval, Québec City. His research interests include actuarial mathematics, risk theory, and dependence modeling. He is the author of numerous research articles and of the book « Modélisation et évaluation des risques en actuariat » (Springer, 2013).

March 10, 2016

Colloque de mathématiques de Montréal
Gennady Samorodnitsky Ridges and valleys in the high excursion sets of Gaussian random fields

15:30-16:30

MAASS 217, McGill
Abstract: It is well known that normal random variables do not like taking large values. Therefore, a continuous Gaussian random field on a compact set does not like exceeding a large level. If it does exceed a large level at some point, it tends to go back below the level a short distance away from that point. One, therefore, does not expect the excursion set above a high level for such a field to possess any interesting structure. Nonetheless, if we want to know how likely two points in such an excursion set are to be connected by a path ("a ridge") in the excursion set, how do we figure that out? If we know that a ridge in the excursion set exists (e.g. the field is above a high level on the surface of a sphere), how likely is there to be also a valley (e.g. the field going below a fraction of the level somewhere inside that sphere)?

We use the large deviation approach.  Some surprising results (and pictures) are obtained.
Speaker: Gennady Samorodnitsky is a Professor in the School of Operations Research and Information Engineering at Cornell. He is a distinguished probabilist with broad interests, including stochastic models, long-range dependence, random fields, scale-free random graphs and extreme-value theory.

March 11, 2016

McGill Statistics Seminar
Han Liu Nonparametric graphical models: Foundation and trends

15:30-16:30

BURN 1205
Abstract: We consider the problem of learning the structure of a non-Gaussian graphical model. We introduce two strategies for constructing tractable nonparametric graphical model families. One approach is through semiparametric extension of the Gaussian or exponential family graphical models that allows arbitrary graphs. Another approach is to restrict the family of allowed graphs to be acyclic, enabling the use of fully nonparametric density estimation in high dimensions. These two approaches can both be viewed as adding structural regularization to a general pairwise nonparametric Markov random field and reflect an interesting tradeoff of model flexibility with structural complexity. In terms of graph estimation, these methods achieve the optimal parametric rates of convergence. In terms of computation, these methods are as scalable as the best implemented parametric methods. Such a "free-lunch phenomenon" makes them extremely attractive for large-scale applications. We will also introduce several new research directions along this line of work, including latent-variable extension, model-based nonconvex optimization, graph uncertainty assessment, and nonparametric graph property testing.
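A rough sketch of the first, semiparametric strategy is a nonparanormal (SKEPTIC-style) pipeline: estimate a correlation matrix from pairwise Kendall's tau and feed it to the graphical lasso. This is only one concrete instance of the family of methods surveyed in the talk, with tuning and the positive-definite projection simplified.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.covariance import graphical_lasso

def nonparanormal_precision(X, alpha=0.2):
    """Estimate a sparse precision matrix without assuming Gaussian margins."""
    n, p = X.shape
    rho = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            tau, _ = kendalltau(X[:, i], X[:, j])
            rho[i, j] = rho[j, i] = np.sin(np.pi * tau / 2.0)   # tau -> correlation
    # Project onto the positive-definite cone (the tau-based matrix need not be PD)
    w, v = np.linalg.eigh(rho)
    rho = v @ np.diag(np.clip(w, 1e-3, None)) @ v.T
    _, precision = graphical_lasso(rho, alpha=alpha)
    return precision

rng = np.random.default_rng(0)
p = 4
corr = 0.6 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = np.exp(rng.multivariate_normal(np.zeros(p), corr, size=300))   # non-Gaussian margins
print(np.round(nonparanormal_precision(X), 2))
```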
Speaker: Han Liu is an Assistant Professor of Operations Research and Financial Engineering at Princeton University, Princeton, NJ. He is a member of the Statistical Machine Learning Lab and recipient of the prestigious 2015 IMS Tweedie New Researcher Award.

March 18, 2016

McGill Statistics Seminar
William E. Strawderman Robust minimax shrinkage estimation of location vectors under concave loss

15:30-16:30

BURN 1205
Abstract: We consider the problem of estimating the mean vector θ of a multivariate spherically symmetric distribution under a loss function which is a concave function of squared error. In particular, we find conditions on the shrinkage factor under which Stein-type shrinkage estimators dominate the usual minimax best equivariant estimator. In problems where the scale is known, minimax shrinkage factors which generally depend on both the loss and the sampling distribution are found. When the scale is estimated through the squared norm of a residual vector, for a large subclass of concave losses, we find minimax shrinkage factors which are independent of both the loss and the underlying distribution. Recent applications in predictive density estimation are examples where such losses arise naturally.
Speaker: William E. Strawderman is Professor in the Department of Statistics at Rutgers University, Piscataway, NJ.

April 1, 2016

McGill Statistics Seminar
 Michel Harel Asymptotic behavior of binned kernel density estimators for locally non-stationary random fields

15:30-16:30

BURN 1205
Abstract: In this talk, I will describe the finite- and large-sample behavior of binned kernel density estimators for dependent and locally non-stationary random fields converging to stationary random fields. In addition to looking at the bias and asymptotic normality of the estimators, I will present results from a simulation study which shows that the kernel density estimator and the binned kernel density estimator have the same behavior and both estimate accurately the true density when the number of fields increases. This work finds applications in various fields, including the study of epidemics and mining research. My specific illustration will be concerned with the 2002 incidence rates of tuberculosis in the departments of France.
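For readers unfamiliar with the estimator itself, here is a one-dimensional sketch of a binned kernel density estimator (my own illustration; the talk concerns its behaviour for locally non-stationary random fields): bin the data on a regular grid, then convolve the bin counts with the kernel.

```python
import numpy as np

def binned_kde(x, h, m=400):
    """Binned Gaussian kernel density estimate on an equally spaced grid of m bins."""
    n = len(x)
    lo, hi = x.min() - 3 * h, x.max() + 3 * h
    edges = np.linspace(lo, hi, m + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    delta = centers[1] - centers[0]
    counts, _ = np.histogram(x, bins=edges)                    # simple binning of the data
    L = int(np.ceil(4 * h / delta))                            # truncate the kernel at 4h
    offsets = np.arange(-L, L + 1) * delta
    kern = np.exp(-0.5 * (offsets / h) ** 2) / (h * np.sqrt(2 * np.pi))
    dens = np.convolve(counts, kern, mode="same") / n          # (1/n) sum_j c_j K_h(g_k - g_j)
    return centers, dens

rng = np.random.default_rng(3)
centers, dens = binned_kde(rng.normal(size=1000), h=0.3)
print("integral of the estimate ~", dens.sum() * (centers[1] - centers[0]))   # close to 1
```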
Speaker: Michel Harel is Professor of Statistics at the Université de Limoges and a member of the Institut de mathématiques de Toulouse, France.

April 8, 2016

McGill Statistics Seminar
Ruth Heller Multivariate tests of associations based on univariate tests

15:30-16:30

BURN 1205
Abstract: For testing two random vectors for independence, we consider testing whether the distance of one vector from an arbitrary center point is independent from the distance of the other vector from an arbitrary center point by a univariate test. We provide conditions under which it is enough to have a consistent univariate test of independence on the distances to guarantee that the power to detect dependence between the random vectors increases to one, as the sample size increases. These conditions turn out to be minimal. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved. We suggest a specific aggregation method for which the resulting multivariate test will be distribution-free if the univariate test is distribution-free. We show that several multivariate tests recently proposed in the literature can be viewed as instances of this general approach.
Speaker: Ruth Heller is a Senior Lecturer in the Department of Statistics and Operations Research at Tel-Aviv University.

Go to top

Fall Term 2016

Date Event Speaker(s) Title Time Location

September 9, 2016

McGill Statistics Seminar
Fei Gu
Two-set canonical variate model in multiple populations with invariant loadings

15:30-16:30

BURN 1205
Abstract:

Goria and Flury (Definition 2.1, 1996) proposed the two-set canonical variate model (referred to as the CV-2 model hereafter) and its extension in multiple populations with invariant weight coefficients (Definition 2.2). The equality constraints imposed on the weight coefficients are in line with the approach to interpreting the canonical variates (i.e., the linear combinations of original variables) advocated by Harris (1975, 1989), Rencher (1988, 1992), and Rencher and Christensen (2003). However, the literature in psychology and education shows that the standard approach adopted by most researchers, including Anderson (2003), is to use the canonical loadings (i.e., the correlations between the canonical variates and the original variables in the same set) to interpret the canonical variates. In the case of multicollinearity (giving rise to the so-called suppression effects) among the original variables, it is not uncommon to obtain different interpretations from the two approaches. Therefore, following the standard approach in practice, an alternative (probably more realistic) extension of Goria and Flury’s CV-2 model in multiple populations is to impose the equality constraints on the canonical loadings. The utility of this multiple-population extension is illustrated with two numerical examples.

Speaker: Fei Gu is an Assistant Professor at the Department of Psychology, McGill University.

September 16, 2016

CRM Colloque de statistique
Prakasa Rao
Statistical inference for fractional diffusion processes

16:00-17:00

LB-921.04, Library Building, Concordia Univ.
Abstract:

There are some time series which exhibit long-range dependence, as noticed by Hurst in his investigations of water levels along the Nile river. Long-range dependence is connected with the concept of self-similarity in that increments of a self-similar process with stationary increments exhibit long-range dependence under some conditions. Fractional Brownian motion is an example of such a process. We discuss statistical inference for stochastic processes modeled by stochastic differential equations driven by a fractional Brownian motion. These processes are termed fractional diffusion processes. Since fractional Brownian motion is not a semimartingale, it is not possible to extend the notion of a stochastic integral with respect to a fractional Brownian motion following the ideas of Ito integration. There are other methods of extending integration with respect to a fractional Brownian motion. Suppose a complete path of a fractional diffusion process is observed over a finite time interval. We will present some results on inference problems for such processes.
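As a side note (not part of the talk), simulating fractional Brownian motion is often the first step in numerical work on fractional diffusions; a simple, if not the fastest, route is a Cholesky factorization of its covariance function.

```python
import numpy as np

def fbm_cholesky(n, hurst, T=1.0, rng=None):
    """Simulate fractional Brownian motion on n equally spaced points of (0, T]
    using the Cholesky factor of its covariance (O(n^3); fine for moderate n)."""
    rng = rng or np.random.default_rng()
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    # Cov(B_H(s), B_H(u)) = 0.5 (s^{2H} + u^{2H} - |s - u|^{2H})
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))
    return t, np.linalg.cholesky(cov) @ rng.standard_normal(n)

t, path = fbm_cholesky(500, hurst=0.7, rng=np.random.default_rng(0))
print(path[:5])
```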

Speaker: Dr. B.L.S. Prakasa Rao is the Ramanujan Chair Professor at the CR Rao Advanced Institute, Hyderabad, India.
September 23, 2016
McGill Statistics Seminar
Jean-François Coeurjolly

Stein estimation of the intensity parameter of a stationary spatial Poisson point process

15:30-16:30

BURN 1205
Abstract:

We revisit the problem of estimating the intensity parameter of a homogeneous Poisson point process observed in a bounded window of $R^d$, making use of a (now) old idea going back to James and Stein. For this, we prove an integration by parts formula for functionals defined on the Poisson space. This formula extends the one obtained by Privault and Réveillac (Statistical Inference for Stochastic Processes, 2009) in the one-dimensional case and is well-suited to a notion of derivative of Poisson functionals which satisfies the chain rule. The new estimators can be viewed as biased versions of the MLE with a tailor-made bias designed to reduce the variance of the MLE. We study a large class of examples and show that with a controlled probability the corresponding estimator outperforms the MLE. We illustrate in a simulation study that for very reasonable practical cases (such as an intensity of 10 or 20 for a Poisson point process observed in the d-dimensional Euclidean ball, with d = 1, ..., 5), we can obtain a relative (mean squared error) gain above 20% for the Stein estimator with respect to the maximum likelihood estimator. This is a joint work with M. Clausel and J. Lelong (Univ. Grenoble Alpes, France).

Speaker: Jean-François Coeurjolly is a Professor in the Department of Mathematics at Université du Québec à Montréal (UQÀM).

September 30, 2016

McGill Statistics Seminar
Hui Zou
CoCoLasso for high-dimensional error-in-variables regression

15:30-16:30

BURN 1205
Abstract:

Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright (2012) proposed a non-convex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets, including the cases of additive measurement error and random missing data. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how standard cross-validation techniques can be misleading in the presence of measurement error and develop a novel corrected cross-validation technique by using the basic idea in CoCoLasso. The corrected cross-validation has its own importance. We demonstrate the superior performance of our method over the non-convex approach by simulation studies.

Speaker: Hui Zou is a Professor in the School of Statistics at the University of Minnesota.

October 7, 2016

McGill Statistics Seminar
Luc Devroye
Cellular tree classifiers

15:30-16:30

BURN 1205
Abstract:

Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

Speaker: Luc P. Devroye is a James McGill Professor in the School of Computer Science of McGill University. Since joining the McGill faculty in 1977 he has won numerous awards, including an E.W.R. Steacie Memorial Fellowship (1987), a Humboldt Research Award (2004), the Killam Prize (2005) and the Statistical Society of Canada gold medal (2008). He received an honorary doctorate from the Université catholique de Louvain in 2002, and he received an honorary doctorate from Universiteit Antwerpen on March 29, 2012.

October 14, 2016

McGill Statistics Seminar
Geneviève Lefebvre
A Bayesian finite mixture of bivariate regressions model for causal mediation analyses

15:30-16:30

BURN 1205
Abstract:

Building on the work of Schwartz, Gelfand and Miranda (Statistics in Medicine (2010); 29(16), 1710-23), we propose a Bayesian finite mixture of bivariate regressions model for causal mediation analyses. Using an identifiability condition within each component of the mixture, we express the natural direct and indirect effects of the exposure on the outcome as functions of the component-specific regression coefficients. On the basis of simulated data, we examine the behaviour of the model for estimating these effects in situations where the associations between exposure, mediator and outcome are confounded, or not. Additionally, we demonstrate that this mixture model can be used to account for heterogeneity arising through unmeasured binary mediator-outcome confounders. Finally, we apply our mediation mixture model to estimate the natural direct and indirect effects of exposure to inhaled corticosteroids during pregnancy on birthweight using a cohort of asthmatic women from the province of Québec.

Speaker: Geneviève Lefebvre is an Associate Professor in the Department of Mathematics at the Université du Québec à Montréal (UQAM)
October 21, 2016
McGill Statistics Seminar
Chien-Lin Su

Statistical analysis of two-level hierarchical clustered data

15:30-16:30

BURN 1205
Abstract:

Multi-level hierarchical clustered data are commonly seen in financial and biostatistics applications. In this talk, we introduce several modeling strategies for describing the dependent relationships for members within a cluster or between different clusters (in the same or different levels). In particular we will apply the hierarchical Kendall copula, first proposed by Brechmann (2014), to model two-level hierarchical clustered survival data. This approach provides a clever way of dimension reduction in modeling complicated multivariate data. Based on the model assumptions, we propose statistical inference methods, including parameter estimation and a goodness-of-fit test, suitable for handling censored data. Simulation and data analysis results are also presented.

Speaker: Chien-Lin Su is a postdoctoral fellow under the supervision of Professors Russell Steele (McGill) and Lajmi Lakhal-Chaieb (Laval). He received his Master's degree in mathematics in 2009 and his PhD in statistics from National Chiao Tung University (NCTU), Taiwan, in 2015. His research interests include multivariate survival analysis and copula methods in biomedical and financial applications. He received a grant from the National Science Council (NSC) of Taiwan and conducted research as a research trainee under the supervision of Professor Johanna G. Nešlehová from July 2013 to February 2014.

October 28, 2016

CRM Colloque de statistique
Jerry Lawless
Efficient tests of covariate effects in two-phase failure time studies

15:30-16:30

BURN 1205
Abstract:

Two-phase studies are frequently used when observations on certain variables are expensive or difficult to obtain. One such situation is when a cohort exists for which certain variables have been measured (phase 1 data); then, a sub-sample of individuals is selected, and additional data are collected on them (phase 2). Efficiency for tests and estimators can be increased by basing the selection of phase 2 individuals on data collected at phase 1. For example, in large cohorts, expensive genomic measurements are often collected at phase 2, with oversampling of persons with “extreme” phenotypic responses. A second example is case-cohort or nested case-control studies involving times to rare events, where phase 2 oversamples persons who have experienced the event by a certain time. In this talk I will describe two-phase studies of failure times and present efficient methods for testing covariate effects. Some extensions to more complex outcomes and areas needing further development will be discussed.

Speaker: Jerry Lawless is a Distinguished Professor Emeritus in the Department of Statistics and Actuarial Science at the University of Waterloo. He has been a consultant to industry and government, is a past editor of Technometrics and a past president of the Statistical Society of Canada. He is a Fellow of the American Statistical Association (1983) and of the Institute of Mathematical Statistics (1990), and a recipient of the Gold Medal of the Statistical Society of Canada (1999). He was elected a Fellow of the Royal Society of Canada in 2000.

November 2, 2016

McGill Statistics Seminar
Tim Hesterberg

First talk: Bootstrap in practice

Second talk: Statistics and Big Data at Google

1. 15:00-16:00
2. 17:35-18:25

1st: BURN 306
2nd: ADAMS AUD
Abstract:

First talk: This talk focuses on three practical aspects of resampling: communication, accuracy, and software. I'll introduce the bootstrap and permutation tests, and discuss how they may be used to help clients understand statistical results. I'll talk about accuracy -- there are dramatic differences in how accurate different bootstrap methods are. Surprisingly, the most common bootstrap methods are less accurate than classical methods for small samples, and more accurate for larger samples. There are simple variations that dramatically improve the accuracy. Finally, I'll compare two R packages: the easy-to-use "resample" package, and the more powerful "boot" package.

Second talk: Google lives on data. Search, Ads, YouTube, Maps, ... - they all live on data. I'll tell stories about how we use data, how we're always experimenting to make improvements (yes, this includes your searches), and how we adapt statistical ideas to do things that have never been done before.
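Relating to the first talk, here is a minimal percentile-bootstrap sketch in Python (the talk itself discusses the R packages "resample" and "boot"); simple percentile intervals like this one are among the methods whose small-sample accuracy the talk examines, and variants such as bootstrap-t or BCa can improve on them.

```python
import numpy as np

def percentile_boot_ci(x, stat=np.mean, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of one sample."""
    rng = np.random.default_rng(seed)
    boots = np.array([stat(rng.choice(x, size=len(x), replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.quantile(boots, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

x = np.random.default_rng(1).exponential(size=40)   # small, skewed sample
print(percentile_boot_ci(x))
```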

Speaker: Tim Hesterberg is a Senior Statistician at Google. He received his PhD in Statistics from Stanford University, under Brad Efron. He is on the executive boards of the National Institute of Statistical Sciences and the Interface Foundation of North America (Interface between Computing Science and Statistics).

November 4, 2016

McGill Statistics Seminar
Sean Lawlor and Alexandre Piché


Lawlor: Time-varying mixtures of Markov chains: An application to traffic modeling

Piché: Bayesian nonparametric modeling of heterogeneous groups of censored data

 

15:30-16:30

BURN 1205
Abstract:

Piché: Analysis of survival data arising from different groups, whereby the data in each group is scarce, but abundant overall, is a common issue in applied statistics. Bayesian nonparametrics are tools of choice to handle such datasets given their ability to share information across groups. In this presentation, we will compare three popular Bayesian nonparametric methods on the modeling of survival functions coming from related heterogeneous groups. Specifically, we will first compare the modeling accuracy of the Dirichlet process, the hierarchical Dirichlet process, and the nested Dirichlet process on simulated datasets of different sizes, where groups differ in shape or in expectation, and finally we will compare the models on real world injury datasets.

Lawlor: Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disappear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model, especially when the model order varies with time, with computational guarantees which can meet real-time requirements is difficult with existing algorithms. Multiple issues exist with existing approximate inference methods, ranging from estimation of the model order to random restarts caused by convergence to different local minima. Monte-Carlo methods can be used to estimate the parameters of the generating distribution and estimate the model order, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and can take far too long to converge. This work proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation. The method is guaranteed to converge to a local maximum of the posterior likelihood at each time step with a computational complexity which is low enough for real-time applications. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the parameters of the algorithm. The experiments also investigate the predictive power of the fitted model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.

Speaker:

Sean Lawlor is a doctoral candidate in electrical engineering in the Department of Electrical and Computer Engineering, McGill University.

Alexandre Piché is an MSc student in our Department. His supervisor is Russell Steele.

November 11, 2016

McGill Statistics Seminar
Teng Zhang
Tyler's M-estimator: Subspace recovery and high-dimensional regime

15:30-16:30

BURN 1205
Abstract:

Given a data set, Tyler's M-estimator is a widely used covariance matrix estimator with robustness to outliers or heavy-tailed distributions. We will discuss two recent results on this estimator. First, we show that when a certain percentage of the data points are sampled from a low-dimensional subspace, Tyler's M-estimator can be used to recover the subspace exactly. Second, in the high-dimensional regime where the number of samples n and the dimension p both go to infinity and p/n converges to a constant y between 0 and 1, and when the data samples are identically and independently generated from the Gaussian distribution N(0,I), we show that the difference between the sample covariance matrix and a scaled version of Tyler's M-estimator tends to zero in spectral norm, and that the empirical spectral densities of both estimators converge to the Marcenko-Pastur distribution. We also prove that when the data samples are generated from an elliptical distribution, the limiting distribution of Tyler's M-estimator converges to a Marcenko-Pastur-type distribution. The second part is joint work with Xiuyuan Cheng and Amit Singer.
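For reference, the estimator itself is defined by a simple fixed-point iteration; the sketch below (assuming centered data) computes Tyler's M-estimator but does not reproduce the subspace-recovery or high-dimensional results of the talk.

```python
import numpy as np

def tyler_m_estimator(X, n_iter=100, tol=1e-8):
    """Fixed-point iteration for Tyler's M-estimator of scatter (rows of X are
    centered observations). The estimate is normalized to have trace p."""
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        d = np.einsum("ij,jk,ik->i", X, inv, X)        # x_i' Sigma^{-1} x_i
        new = (p / n) * (X.T * (1.0 / d)) @ X          # sum_i x_i x_i' / d_i
        new *= p / np.trace(new)                       # fix the scale
        if np.linalg.norm(new - sigma, ord="fro") < tol:
            return new
        sigma = new
    return sigma

rng = np.random.default_rng(0)
X = rng.standard_t(df=2, size=(2000, 5))               # heavy-tailed data
print(np.round(tyler_m_estimator(X), 2))                # roughly the identity here
```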

Speaker: Teng Zhang is an Assistant Professor in the Department of Mathematics at the University of Central Florida.

November 18, 2016

McGill Statistics Seminar
Yoshua Bengio
Progress in theoretical understanding of deep learning

15:30-16:30

BURN 1205
Abstract:

Deep learning arose around 2006 as a renewal of neural networks research, allowing such models to have more layers. Theoretical investigations have shown that functions obtained as deep compositions of simpler functions (which includes both deep and recurrent nets) can express highly varying functions (with many ups and downs and different input regions that can be distinguished) much more efficiently (with fewer parameters) than otherwise, under a prior which seems to work well for artificial intelligence tasks. Empirical work in a variety of applications has demonstrated that, when well trained, such deep architectures can be highly successful, remarkably breaking through previous state-of-the-art results in many areas, including speech recognition, object recognition, language models, machine translation and transfer learning. Although neural networks have long been considered lacking in theory and much remains to be done, theoretical advances have been made and will be discussed, to support distributed representations, depth of representation, the non-convexity of the training objective, and the probabilistic interpretation of learning algorithms (especially of the auto-encoder type, which previously lacked one). The talk will focus on the intuitions behind these theoretical results.

Speaker: Yoshua Bengio is a Professor in the Department of Computer Science and Operations Research at the University of Montreal, head of the Montreal Institute for Learning Algorithms (MILA), co-director of the CIFAR Neural Computation and Adaptive Perception program, and Canada Research Chair in Statistical Learning Algorithms.
November 25, 2016
McGill Statistics Seminar
Alexandra Schmidt

Spatio-temporal models for skewed processes

15:30-16:30

BURN 1205
Abstract:

In the analysis of most spatio-temporal processes in environmental studies, observations present skewed distributions. Usually, a single transformation of the data is used to approximate normality, and stationary Gaussian processes are assumed to model the transformed data. The choice of transformation is key for spatial interpolation and temporal prediction. We propose a spatio-temporal model for skewed data that does not require the use of data transformation. The process is decomposed as the sum of a purely temporal structure with two independent components that are considered to be partial realizations from independent spatial Gaussian processes, for each time t. The model has an asymmetry parameter that might vary with location and time; if this is equal to zero, the usual Gaussian model results. The inference procedure is performed under the Bayesian paradigm, and uncertainty in parameter estimation is naturally accounted for. We fit our model to different synthetic data and to monthly average temperature observed between 2001 and 2011 at monitoring locations located in the south of Brazil. Different model comparison criteria, and analysis of the posterior distribution of some parameters, suggest that the proposed model outperforms standard ones used in the literature. This is joint work with Kelly Gonçalves (UFRJ, Brazil) and Patricia L. Velozo (UFF, Brazil).

Speaker: Alexandra M. Schmidt is an Associate Professor of Biostatistics in the Department of Epidemiology, Biostatistics, and Occupational Health at McGill University.

December 1, 2016

CRM Colloque de statistique
Richard Samworth
High-dimensional changepoint estimation via sparse projection

15:30-16:30

BURN 708
Abstract:

Changepoints are a very common feature of Big Data that arrive in the form of a data stream. We study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called 'inspect' for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint detection algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms.
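A simplified sketch of the projection idea follows (my own illustration, using the plain leading left singular vector of the CUSUM matrix rather than the sparse vector obtained from the convex relaxation in 'inspect', and assuming a single changepoint).

```python
import numpy as np

def cusum_matrix(X):
    """CUSUM transformation of a p x n data matrix; column t compares the means
    of the first t and the last n - t observations."""
    p, n = X.shape
    T = np.empty((p, n - 1))
    csum = np.cumsum(X, axis=1)
    total = csum[:, -1]
    for t in range(1, n):
        scale = np.sqrt(t * (n - t) / n)
        T[:, t - 1] = scale * (csum[:, t - 1] / t - (total - csum[:, t - 1]) / (n - t))
    return T

def estimate_changepoint(X):
    T = cusum_matrix(X)
    u, _, _ = np.linalg.svd(T, full_matrices=False)
    v = u[:, 0]                                   # projection direction (not sparse here)
    proj = v @ X                                  # projected univariate series
    return int(np.argmax(np.abs(cusum_matrix(proj[None, :])[0]))) + 1

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
X[:5, 120:] += 1.5                                # mean change in 5 of 50 coordinates at t = 120
print(estimate_changepoint(X))                    # should be close to 120
```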

Speaker: Richard Samworth is a Professor of Statistics in the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge. He is a Fellow of the American Statistical Association (2015) and of the Institute of Mathematical Statistics (2014), and a recipient of the Philip Leverhulme Prize, Leverhulme Trust (2014) and the Guy Medal in Bronze, Royal Statistical Society (2012).

December 2, 2016

McGill Statistics Seminar
Andrea Giussani
Modeling dependence in bivariate multi-state processes: A frailty approach

15:30-16:30

BURN 1205
Abstract:

The aim of this talk is to present a statistical framework for the analysis of dependent bivariate multistate processes, allowing one to study the dependence both across subjects in a pair and among individual-specific events. As for the latter, copula-based models are employed, whereas dependence between multi-state models can be accomplished by means of frailties. The well-known Marshall-Olkin Bivariate Exponential Distribution (MOBVE) is considered for the joint distribution of frailties. The reason is twofold: on the one hand, it allows one to model shocks that affect the two individual-specific frailties; on the other hand, the MOBVE is the only bivariate exponential distribution with exponential marginals, which allows for the modeling of each multi-state process as a shared frailty model. We first discuss a frailty bivariate survival model with some new results, and then move to the construction of the frailty bivariate multi-state model, with the corresponding observed data likelihood maximization estimating procedure in the presence of right censoring. The last part of the talk will be dedicated to some open problems related to the modeling of multiple multi-state processes in the presence of Marshall-Olkin type copulas.

Speaker: Andrea Giussani is a PhD candidate in Statistics at Bocconi University, Milan (Italy). His current scientific research is focused on event-history and survival data analysis.

Go to top

Winter Term 2017

Date Event Speaker(s) Title Time Location
January 13, 2017
McGill Statistics Seminar
Victor Veitch

(Sparse) exchangeable graphs

15:30-16:30

BURN 1205
Abstract:

Many popular statistical models for network-valued datasets fall under the remit of the graphon framework, which (implicitly) assumes the networks are densely connected. However, this assumption rarely holds for the real-world networks of practical interest. We introduce a new class of models for random graphs that generalises the dense graphon models to the sparse graph regime, and we argue that this meets many of the desiderata one would demand of a model to serve as the foundation for a statistical analysis of real-world networks. The key insight is to define the models by way of a novel notion of exchangeability; this is analogous to the specification of conditionally i.i.d. models by way of de Finetti's representation theorem. We further develop this model class by explaining the foundations of sampling and estimation of network models in this setting. The latter result can be understood as the (sparse) graph analogue of estimation via the empirical distribution in the i.i.d. sequence setting.

Speaker: Victor Veitch is a PhD candidate in the Department of Statistical Sciences at the University of Toronto working in the group of Daniel Roy. He is interested in the theory and application of machine learning and statistical inference, with a particular focus on Bayesian non-parametrics and random networks.

January 20, 2017

McGill Statistics Seminar
Tudor Manole
Order selection in multidimensional finite mixture models

15:30-16:30

BURN 1205
Abstract:

Finite mixture models provide a natural framework for analyzing data from heterogeneous populations. In practice, however, the number of hidden subpopulations in the data may be unknown. The problem of estimating the order of a mixture model, namely the number of subpopulations, is thus crucial for many applications. In this talk, we present a new penalized likelihood solution to this problem, which is applicable to models with a multidimensional parameter space. The order of the model is estimated by starting with a large number of mixture components, which are clustered and then merged via two penalty functions. Doing so estimates the unknown parameters of the mixture, at the same time as the order. We will present extensive simulation studies, showing our approach outperforms many of the most common methods for this problem, such as the Bayesian Information Criterion. Real data examples involving normal and multinomial mixtures further illustrate its performance.

Speaker: Tudor Manole is currently a student in our Honours undergraduate mathematics program, working with Abbas Khalili.

January 27, 2017

CRM-SSC Prize 2012 Colloque
Radu Craiu
Bayesian inference for conditional copula models

15:30-16:30

ROOM 6254, Pavillon André-Aisenstadt 2920, UdeM

Abstract:

Conditional copula models describe dynamic changes in dependence and are useful in establishing high dimensional dependence structures or in joint modelling of response vectors in regression settings. We describe some of the methods developed for estimating the calibration function when multiple predictors are needed and for resolving some of the model choice questions concerning the selection of copula families and the shape of the calibration function. This is joint work with Evgeny Levi, Avideh Sabeti and Mian Wei.

Speaker: Radu Craiu is Professor of Statistics in the Department of Statistical Sciences at University of Toronto.

February 3, 2017

McGill Statistics Seminar
Hua Zhou
MM algorithms for variance component models

15:30-16:30

BURN 1205
Abstract:

Variance components estimation and mixed model analysis are central themes in statistics with applications in numerous scientific disciplines. Despite the best efforts of generations of statisticians and numerical analysts, maximum likelihood estimation and restricted maximum likelihood estimation of variance component models remain numerically challenging. In this talk, we present a novel iterative algorithm for variance components estimation based on the minorization-maximization (MM) principle. The MM algorithm is trivial to implement and competitive on large data problems. The algorithm readily extends to more complicated problems such as linear mixed models, multivariate response models possibly with missing data, maximum a posteriori estimation, and penalized estimation. We demonstrate, both numerically and theoretically, that it converges faster than the classical EM algorithm when the number of variance components is greater than two.

Speaker: Hua Zhou is an Associate Professor of Biostatistics in the Department of Biostatistics, UCLA School of Public Health.
February 10, 2017
McGill Statistics Seminar
Zhihua Su

Sparse envelope model: Efficient estimation and response variable selection in multivariate linear regression

15:30-16:30

BURN 1205
Abstract:

The envelope model is a method for efficient estimation in multivariate linear regression. In this article, we propose the sparse envelope model, which is motivated by applications where some response variables are invariant to changes of the predictors and have zero regression coefficients. The envelope estimator is consistent but not sparse, and in many situations it is important to identify the response variables for which the regression coefficients are zero. The sparse envelope model performs variable selection on the responses and preserves the efficiency gains offered by the envelope model. Response variable selection arises naturally in many applications, but has not been studied as thoroughly as predictor variable selection. In this article, we discuss response variable selection in both the standard multivariate linear regression and the envelope contexts. In response variable selection, even if a response has zero coefficients, it still should be retained to improve the estimation efficiency of the nonzero coefficients. This is different from the practice in predictor variable selection. We establish consistency, the oracle property and obtain the asymptotic distribution of the sparse envelope estimator.

Speaker: Zhihua Su is an Assistant Professor of Statistics in the Department of Statistics at the University of Florida.

February 17, 2017

McGill Statistics Seminar
Joelle Pineau
Building end-to-end dialogue systems using deep neural architectures

15:30-16:30

BURN 1205
Abstract:

The ability for a computer to converse in a natural and coherent manner with a human has long been held as one of the important steps towards solving artificial intelligence. In this talk I will present recent results on building dialogue systems from large corpora using deep neural architectures. I will highlight several challenges related to data acquisition, algorithmic development, and performance evaluation.

Speaker: Joelle Pineau is an associate professor of Computer Science at McGill University, where she co-directs the Reasoning and Learning Lab.

February 24, 2017

McGill Statistics Seminar
James A. Hanley
The first pillar of statistical wisdom

15:30-16:30

BURN 1205
Abstract:

This talk will provide an introduction to the first of the pillars in Stephen Stigler's 2016 book The Seven Pillars of Statistical Wisdom, namely “Aggregation.” It will focus on early instances of the sample mean in scientific work, on the early error distributions, and on how their “centres” were fitted.

Speaker: James A. Hanley is a Professor in the Department of Epidemiology, Biostatistics and Occupational Health, at McGill University.

March 10, 2017

McGill Statistics Seminar
Nima Aghaeepour
High-throughput single-cell biology: The challenges and opportunities for machine learning scientists

15:30-16:30

BURN 1205
Abstract:

The immune system does a lot more than killing “foreign” invaders. It’s a powerful sensory system that can detect stress levels, infections, wounds, and even cancer tumors. However, due to the complex interplay between different cell types and signaling pathways, the amount of data produced to characterize all different aspects of the immune system (tens of thousands of genes measured and hundreds of millions of cells, just from a single patient) completely overwhelms existing bioinformatics tools. My laboratory specializes in the development of machine learning techniques that address the unique challenges of high-throughput single-cell immunology. Sharing our lab space with a clinical and an immunological research laboratory, my students and fellows are directly exposed to the real-world challenges and opportunities of bringing machine learning and immunology to the (literal) bedside.

Speaker: Nima Aghaeepour is a CIHR Fellow, an ISAC Scholar, and an OCRF Ann Schreiber Investigator with Garry Nolan at Stanford University.
March 17, 2017
CRM Colloque de statistique
Sayan Mukherjee

Inference in dynamical systems

15:30-16:30

BURN 1205
Abstract:

We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures. We also relate Bayesian inference to the thermodynamic formalism in tracking dynamical systems.

Speaker: Sayan Mukherjee is a Professor in the Department of Statistical Science at Duke University. His research interests include geometry and topology in probabilistic modeling, statistical and computational biology, and the modeling of massive data.

March 24, 2017

McGill Statistics Seminar
Hamid Pezeshk
Bayesian sample size determination for clinical trials

15:30-16:30

BURN 1205
Abstract:

The sample size determination problem is an important task in the planning of clinical trials. The problem may be formulated formally in statistical terms. The most frequently used methods are based on the required size and power of the trial for a specified treatment effect. In contrast to the Bayesian decision-theoretic approach, there is no explicit balancing of the cost of a possible increase in the size of the trial against the benefit of the more accurate information which it would give. In this talk a fully Bayesian approach to the sample size determination problem is discussed. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which is deciding on whether or not to grant a licence to a new treatment, uses a frequentist approach. The optimal sample size for the trial is then found by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.

Speaker: Hamid Pezeshk is a Professor from the School of Mathematics, Statistics and Computer Science University of Tehran and the School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran.
March 31, 2017
McGill Statistics Seminar
Chen Xu

Distributed kernel regression for large-scale data

15:30-16:30

BURN 1205
Abstract:

In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then aggregated from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown whether such a distributed strategy provides valid theoretical inference for the original data and, if so, how efficiently it works. In this talk, I address these fundamental issues for nonparametric distributed kernel regression, where accurate prediction is the main learning task. I will begin with the naive simple averaging algorithm and then talk about an improved approach via ADMM. The promising performance of these methods is supported by both simulation and real data examples.
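A toy sketch of the simple-averaging strategy with kernel ridge regression as the base learner is given below (my own illustration; the ADMM refinement discussed in the talk is not shown).

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def distributed_krr_predict(X, y, X_new, n_splits=10, alpha=1e-2, gamma=1.0):
    """Fit kernel ridge regression on each data segment and average the predictions."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(y))
    preds = []
    for part in np.array_split(idx, n_splits):
        model = KernelRidge(alpha=alpha, kernel="rbf", gamma=gamma)
        model.fit(X[part], y[part])
        preds.append(model.predict(X_new))
    return np.mean(preds, axis=0)              # simple averaging across segments

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=5000)
X_new = np.linspace(-3, 3, 5)[:, None]
print(distributed_krr_predict(X, y, X_new).round(2))
print(np.sin(X_new[:, 0]).round(2))            # compare with the true regression function
```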

Speaker: Chen Xu is an Assistant Professor in the Department of Mathematics and Statistics, University of Ottawa.
April 6, 2017
CRM Colloque de statistique
Jason Fine

Instrumental Variable Regression with Survival Outcomes

15:30-16:30

Universite Laval, Pavillon Vachon, Salle 3840
Abstract:

Instrumental variable (IV) methods are popular in non-experimental studies to estimate the causal effects of medical interventions or exposures. These approaches allow for the consistent estimation of such effects even if important confounding factors are unobserved. Despite the increasing use of these methods, there have been few extensions of IV methods to censored data regression problems. We discuss challenges in applying IV structural equation modelling techniques to the proportional hazards model and suggest alternative modelling frameworks. We demonstrate the utility of the accelerated lifetime and additive hazards models for IV analyses with censored data. Assuming linear structural equation models for either the event time or the hazard function, we propose closed-form, two-stage estimators for the causal effect in the structural models for the failure time outcomes. The asymptotic properties of the estimators are derived and the resulting inferences are shown to perform well in simulation studies and in an application to a data set on the effectiveness of a novel chemotherapeutic agent for colon cancer.
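As background only, the two-stage idea in its simplest linear form looks as follows; this toy sketch uses uncensored log event times, whereas the estimators in the talk handle censoring within accelerated failure time and additive hazards models.

```python
import numpy as np

def two_stage_least_squares(z, x, y):
    """Toy 2SLS: instrument z, endogenous exposure x, outcome y (1-d arrays)."""
    Z = np.column_stack([np.ones_like(z), z])
    # Stage 1: project the exposure onto the instrument
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress the outcome on the projected exposure
    X1 = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1]    # causal effect estimate

rng = np.random.default_rng(0)
n = 50_000
z = rng.binomial(1, 0.5, n)                  # instrument
u = rng.normal(size=n)                       # unmeasured confounder
x = 0.8 * z + u + rng.normal(size=n)         # exposure driven by instrument and confounder
log_t = -0.5 * x + u + rng.normal(size=n)    # log event time (no censoring in this toy)
print(two_stage_least_squares(z, x, log_t))  # close to the true effect -0.5
```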

Speaker: Jason Fine is a professor with tenure jointly appointed in the Department of Biostatistics and the Department of Statistics and Operations Research at UNC-Chapel Hill.

 

Website design: Dr Johanna Nešlehová

 

 
