Current statistics seminars

McGill Statistics Seminar Series 2016-2017

 

Fall Term 2016

Date Event Speaker(s) Title Time Location

September 9, 2016

McGill Statistics Seminar
Fei Gu
Two-set canonical variate model in multiple populations with invariant loadings

15:30-16:30

BURN 1205
Abstract:

Goria and Flury (Definition 2.1, 1996) proposed the two-set canonical variate model (referred to as the CV-2 model hereafter) and its extension in multiple populations with invariant weight coefficients (Definition 2.2). The equality constraints imposed on the weight coefficients are in line with the approach to interpreting the canonical variates (i.e., the linear combinations of original variables) advocated by Harris (1975, 1989), Rencher (1988, 1992), and Rencher and Christensen (2003). However, the literature in psychology and education shows that the standard approach adopted by most researchers, including Anderson (2003), is to use the canonical loadings (i.e., the correlations between the canonical variates and the original variables in the same set) to interpret the canonical variates. In the case of multicollinearity (giving rise to so-called suppression effects) among the original variables, it is not uncommon to obtain different interpretations from the two approaches. Therefore, following the standard approach in practice, an alternative (probably more realistic) extension of Goria and Flury's CV-2 model in multiple populations is to impose the equality constraints on the canonical loadings. The utility of this multiple-population extension is illustrated with two numerical examples.

Speaker: Fei Gu is an Assistant Professor in the Department of Psychology, McGill University.

September 16, 2016

CRM Colloque de statistique
Prakasa Rao
Statistical inference for fractional diffusion processes

16:00-17:00

LB-921.04, Library Building, Concordia Univ.
Abstract:

There are some time series which exhibit long-range dependence, as noticed by Hurst in his investigations of water levels along the Nile river. Long-range dependence is connected with the concept of self-similarity in that increments of a self-similar process with stationary increments exhibit long-range dependence under some conditions. Fractional Brownian motion is an example of such a process. We discuss statistical inference for stochastic processes modeled by stochastic differential equations driven by a fractional Brownian motion. These processes are termed fractional diffusion processes. Since fractional Brownian motion is not a semimartingale, it is not possible to extend the notion of a stochastic integral with respect to a fractional Brownian motion following the ideas of Itô integration. There are other methods of extending integration with respect to a fractional Brownian motion. Suppose a complete path of a fractional diffusion process is observed over a finite time interval. We will present some results on inference problems for such processes.

Speaker: Dr. B.L.S. Prakasa Rao is the Ramanujan Chair Professor at the CR Rao Advanced Institute, Hyderabad, India.
September 23, 2016
McGill Statistics Seminar
Jean-François Coeurjolly

Stein estimation of the intensity parameter of a stationary spatial Poisson point process

15:30-16:30

BURN 1205
Abstract:

We revisit the problem of estimating the intensity parameter of a homogeneous Poisson point process observed in a bounded window of $R^d$, making use of a (now) old idea going back to James and Stein. For this, we prove an integration by parts formula for functionals defined on the Poisson space. This formula extends the one obtained by Privault and Réveillac (Statistical Inference for Stochastic Processes, 2009) in the one-dimensional case and is well-suited to a notion of derivative of Poisson functionals which satisfies the chain rule. The new estimators can be viewed as biased versions of the MLE with a tailor-made bias designed to reduce the variance of the MLE. We study a large class of examples and show that with a controlled probability the corresponding estimator outperforms the MLE. We illustrate in a simulation study that for very reasonable practical cases (like an intensity of 10 or 20 for a Poisson point process observed in the d-dimensional Euclidean ball, with d = 1, ..., 5), we can obtain a relative (mean squared error) gain above 20% for the Stein estimator with respect to the maximum likelihood estimator. This is joint work with M. Clausel and J. Lelong (Univ. Grenoble Alpes, France).
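As a point of reference for the comparison in the abstract, the MLE of a homogeneous Poisson intensity is simply the observed point count divided by the window volume, with mean squared error equal to intensity/volume. A minimal numpy sketch of this baseline (an illustration only; the Stein correction from the talk is not reproduced here):

```python
import numpy as np

# Monte Carlo check of the MLE of a homogeneous Poisson intensity:
# given N points in a window W, the MLE is N / |W|, and since the MLE
# is unbiased its MSE equals its variance, intensity / |W|.
rng = np.random.default_rng(0)
intensity, volume, reps = 10.0, 1.0, 50000
counts = rng.poisson(intensity * volume, size=reps)  # N ~ Poisson(lambda |W|)
mle = counts / volume
mse_mle = np.mean((mle - intensity) ** 2)  # should be close to 10.0
```

The Stein-type estimators discussed in the talk trade a small bias against a reduction of this variance.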

Speaker: Jean-François Coeurjolly is a Professor in the Department of Mathematics at Université du Québec à Montréal (UQÀM).

September 30, 2016

McGill Statistics Seminar
Hui Zou
CoCoLasso for high-dimensional error-in-variables regression

15:30-16:30

BURN 1205
Abstract:

Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright (2012) proposed a non-convex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets including the cases of additive measurement error and random missing data. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how the standard cross-validation techniques can be misleading in the presence of measurement error and develop a novel corrected cross-validation technique by using the basic idea in CoCoLasso. The corrected cross-validation technique is of independent interest. We demonstrate the superior performance of our method over the non-convex approach by simulation studies.

Speaker: Hui Zou is a Professor in the School of Statistics at the University of Minnesota.

October 7, 2016

McGill Statistics Seminar
Luc Devroye
Cellular tree classifiers

15:30-16:30

BURN 1205
Abstract:

Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

Speaker: Luc P. Devroye is a James McGill Professor in the School of Computer Science of McGill University. Since joining the McGill faculty in 1977 he has won numerous awards, including an E.W.R. Steacie Memorial Fellowship (1987), a Humboldt Research Award (2004), the Killam Prize (2005) and the Statistical Society of Canada gold medal (2008). He received an honorary doctorate from the Université catholique de Louvain in 2002, and he received an honorary doctorate from Universiteit Antwerpen on March 29, 2012.

October 14, 2016

McGill Statistics Seminar
Geneviève Lefebvre
A Bayesian finite mixture of bivariate regressions model for causal mediation analyses

15:30-16:30

BURN 1205
Abstract:

Building on the work of Schwartz, Gelfand and Miranda (Statistics in Medicine (2010); 29(16), 1710-23), we propose a Bayesian finite mixture of bivariate regressions model for causal mediation analyses. Using an identifiability condition within each component of the mixture, we express the natural direct and indirect effects of the exposure on the outcome as functions of the component-specific regression coefficients. On the basis of simulated data, we examine the behaviour of the model for estimating these effects in situations where the associations between exposure, mediator and outcome are confounded, or not. Additionally, we demonstrate that this mixture model can be used to account for heterogeneity arising through unmeasured binary mediator-outcome confounders. Finally, we apply our mediation mixture model to estimate the natural direct and indirect effects of exposure to inhaled corticosteroids during pregnancy on birthweight using a cohort of asthmatic women from the province of Québec.

Speaker: Geneviève Lefebvre is an Associate Professor in the Department of Mathematics at the Université du Québec à Montréal (UQAM).
October 21, 2016
McGill Statistics Seminar
Chien-Lin Su

Statistical analysis of two-level hierarchical clustered data

15:30-16:30

BURN 1205
Abstract:

Multi-level hierarchical clustered data are commonly seen in financial and biostatistics applications. In this talk, we introduce several modeling strategies for describing the dependent relationships for members within a cluster or between different clusters (in the same or different levels). In particular we will apply the hierarchical Kendall copula, first proposed by Brechmann (2014), to model two-level hierarchical clustered survival data. This approach provides a clever way of dimension reduction in modeling complicated multivariate data. Based on the model assumptions, we propose statistical inference methods, including parameter estimation and a goodness-of-fit test, suitable for handling censored data. Simulation and data analysis results are also presented.

Speaker: Chien-Lin Su is a postdoctoral fellow under the supervision of Professors Russell Steele (McGill) and Lajmi Lakhal-Chaieb (Laval). He received his Master's degree in mathematics in 2009 and his PhD in statistics from National Chiao Tung University (NCTU), Taiwan, in 2015. His research interests include multivariate survival analysis and copula research in biomedical and financial applications. He received a grant from the National Science Council (NSC) of Taiwan and conducted research as a research trainee under the supervision of Professor Johanna G. Nešlehová from July 2013 to February 2014.

October 28, 2016

CRM Colloque de statistique
Jerry Lawless
Efficient tests of covariate effects in two-phase failure time studies

15:30-16:30

BURN 1205
Abstract:

Two-phase studies are frequently used when observations on certain variables are expensive or difficult to obtain. One such situation is when a cohort exists for which certain variables have been measured (phase 1 data); then, a sub-sample of individuals is selected, and additional data are collected on them (phase 2). Efficiency for tests and estimators can be increased by basing the selection of phase 2 individuals on data collected at phase 1. For example, in large cohorts, expensive genomic measurements are often collected at phase 2, with oversampling of persons with “extreme” phenotypic responses. A second example is case-cohort or nested case-control studies involving times to rare events, where phase 2 oversamples persons who have experienced the event by a certain time. In this talk I will describe two-phase studies on failure times and present efficient methods for testing covariate effects. Some extensions to more complex outcomes and areas needing further development will be discussed.

Speaker: Jerry Lawless is a Distinguished Professor Emeritus in the Department of Statistics and Actuarial Science at the University of Waterloo. He has been a consultant to industry and government, is a past editor of Technometrics and a past president of the Statistical Society of Canada. He is a Fellow of the American Statistical Association (1983) and of the Institute of Mathematical Statistics (1990), and a recipient of the Gold Medal of the Statistical Society of Canada (1999). He was elected a Fellow of the Royal Society of Canada in 2000.

November 2, 2016

McGill Statistics Seminar
Tim Hesterberg

First talk: Bootstrap in practice

Second talk: Statistics and Big Data at Google

1. 15:00-16:00
2. 17:35-18:25

1st: BURN 306
2nd: ADAMS AUD
Abstract:

First talk: This talk focuses on three practical aspects of resampling: communication, accuracy, and software. I'll introduce the bootstrap and permutation tests, and discuss how they may be used to help clients understand statistical results. I'll talk about accuracy -- there are dramatic differences in how accurate different bootstrap methods are. Surprisingly, the most common bootstrap methods are less accurate than classical methods for small samples, and more accurate for larger samples. There are simple variations that dramatically improve the accuracy. Finally, I'll compare two R packages: the easy-to-use "resample" package and the more powerful "boot" package.
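For readers unfamiliar with the basic method referred to above, here is a minimal numpy sketch of a percentile bootstrap confidence interval for a mean (the simple variant; the more accurate variations the talk alludes to, such as bootstrap-t or BCa, are not shown):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=50)  # a small, skewed sample

def percentile_bootstrap_ci(x, stat=np.mean, n_boot=5000, alpha=0.05, rng=rng):
    # Resample with replacement, recompute the statistic each time,
    # and take empirical quantiles of the bootstrap distribution.
    boot = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

lo, hi = percentile_bootstrap_ci(data)  # 95% interval for the mean
```

The same resampling loop works for any plug-in statistic, which is what makes the method easy to explain to clients.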

Second talk: Google lives on data. Search, Ads, YouTube, Maps, ... - they all live on data. I'll tell stories about how we use data, how we're always experimenting to make improvements (yes, this includes your searches), and how we adapt statistical ideas to do things that have never been done before.

Speaker: Tim Hesterberg is a Senior Statistician at Google. He received his PhD in Statistics from Stanford University, under Brad Efron. He is on the executive boards of the National Institute of Statistical Sciences and the Interface Foundation of North America (Interface between Computing Science and Statistics).

November 4, 2016

McGill Statistics Seminar
Sean Lawlor and Alexandre Piché


Lawlor: Time-varying mixtures of Markov chains: An application to traffic modeling

Piché: Bayesian nonparametric modeling of heterogeneous groups of censored data

 

15:30-16:30

BURN 1205
Abstract:

Piché: Analysis of survival data arising from different groups, whereby the data in each group is scarce, but abundant overall, is a common issue in applied statistics. Bayesian nonparametrics are tools of choice to handle such datasets given their ability to share information across groups. In this presentation, we will compare three popular Bayesian nonparametric methods on the modeling of survival functions coming from related heterogeneous groups. Specifically, we will first compare the modeling accuracy of the Dirichlet process, the hierarchical Dirichlet process, and the nested Dirichlet process on simulated datasets of different sizes, where groups differ in shape or in expectation, and finally we will compare the models on real world injury datasets.

Lawlor: Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disappear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model, especially when the model order varies with time, with computational guarantees which can meet real-time requirements is difficult with existing algorithms. Multiple issues exist with existing approximate inference methods ranging from estimation of the model order to random restarts due to the ability to converge to different local minima. Monte-Carlo methods can be used to estimate the parameters of the generating distribution and estimate the model order, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and can take far too long to converge. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation. The method is guaranteed to converge to a local maximum of the posterior likelihood at each time step with a computational complexity which is low enough for real-time applications. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the parameters of the algorithm. The experiments also investigate the predictive power of the fitted model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.

Speaker:

Sean Lawlor is a doctoral candidate in Electrical Engineering in the Department of Electrical and Computer Engineering, McGill University.

Alexandre Piché is an MSc student in our Department. His supervisor is Russell Steele.

November 11, 2016

McGill Statistics Seminar
Teng Zhang
Tyler's M-estimator: Subspace recovery and high-dimensional regime

15:30-16:30

BURN 1205
Abstract:

Given a data set, Tyler's M-estimator is a widely used covariance matrix estimator with robustness to outliers or heavy-tailed distributions. We will discuss two recent results on this estimator. First, we show that when a certain percentage of the data points are sampled from a low-dimensional subspace, Tyler's M-estimator can be used to recover the subspace exactly. Second, in the high-dimensional regime in which the number of samples n and the dimension p both go to infinity and p/n converges to a constant y between 0 and 1, and when the data samples are generated identically and independently from the Gaussian distribution N(0,I), we show that the difference between the sample covariance matrix and a scaled version of Tyler's M-estimator tends to zero in spectral norm, and the empirical spectral densities of both estimators converge to the Marcenko-Pastur distribution. We also prove that when the data samples are generated from an elliptical distribution, the limiting spectral distribution of Tyler's M-estimator converges to a Marcenko-Pastur-type distribution. The second part is joint work with Xiuyuan Cheng and Amit Singer.
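Tyler's M-estimator is defined by a fixed-point equation and is usually computed by simple iteration. A minimal numpy sketch of that standard iteration (with the common trace-p normalisation; a textbook illustration, not code from the talk):

```python
import numpy as np

def tyler_m_estimator(X, n_iter=200, tol=1e-10):
    # Fixed-point iteration: Sigma <- (p/n) * sum_i x_i x_i' / (x_i' Sigma^{-1} x_i),
    # renormalised so that trace(Sigma) = p at every step.
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # quadratic forms x_i' Sigma^{-1} x_i, one per row of X
        q = np.einsum('ij,jk,ik->i', X, inv, X)
        new = (p / n) * (X * (1.0 / q)[:, None]).T @ X
        new *= p / np.trace(new)
        if np.linalg.norm(new - sigma) < tol:
            sigma = new
            break
        sigma = new
    return sigma

rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 3)) * np.array([1.0, 2.0, 3.0])  # diagonal scatter
S = tyler_m_estimator(X)
```

Because the estimator depends on the data only through the directions x_i/|x_i|, it is insensitive to radial outliers and heavy tails.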

Speaker: Teng Zhang is an Assistant Professor in the Department of Mathematics at the University of Central Florida.

November 18, 2016

McGill Statistics Seminar
Yoshua Bengio
Progress in theoretical understanding of deep learning

15:30-16:30

BURN 1205
Abstract:

Deep learning arose around 2006 as a renewal of neural networks research, allowing such models to have more layers. Theoretical investigations have shown that functions obtained as deep compositions of simpler functions (which includes both deep and recurrent nets) can express highly varying functions (with many ups and downs and different input regions that can be distinguished) much more efficiently (with fewer parameters) than otherwise, under a prior which seems to work well for artificial intelligence tasks. Empirical work in a variety of applications has demonstrated that, when well trained, such deep architectures can be highly successful, remarkably breaking through previous state-of-the-art in many areas, including speech recognition, object recognition, language models, machine translation and transfer learning. Although neural networks have long been considered lacking in theory and much remains to be done, theoretical advances have been made and will be discussed, to support distributed representations, depth of representation, the non-convexity of the training objective, and the probabilistic interpretation of learning algorithms (especially of the auto-encoder type, which previously lacked one). The talk will focus on the intuitions behind these theoretical results.

Speaker: Yoshua Bengio is a Professor in the Department of Computer Science and Operations Research at the University of Montreal, head of the Montreal Institute for Learning Algorithms (MILA), co-director of the CIFAR Neural Computation and Adaptive Perception program, and Canada Research Chair in Statistical Learning Algorithms.
November 25, 2016
McGill Statistics Seminar
Alexandra Schmidt

Spatio-temporal models for skewed processes

15:30-16:30

BURN 1205
Abstract:

In the analysis of most spatio-temporal processes in environmental studies, observations present skewed distributions. Usually, a single transformation of the data is used to approximate normality, and stationary Gaussian processes are assumed to model the transformed data. The choice of transformation is key for spatial interpolation and temporal prediction. We propose a spatio-temporal model for skewed data that does not require the use of data transformation. The process is decomposed as the sum of a purely temporal structure with two independent components that are considered to be partial realizations from independent spatial Gaussian processes, for each time t. The model has an asymmetry parameter that might vary with location and time, and if this is equal to zero, the usual Gaussian model results. The inference procedure is performed under the Bayesian paradigm, and uncertainty about parameter estimation is naturally accounted for. We fit our model to different synthetic data and to monthly average temperatures observed between 2001 and 2011 at monitoring locations in the south of Brazil. Different model comparison criteria, and analysis of the posterior distribution of some parameters, suggest that the proposed model outperforms standard ones used in the literature. This is joint work with Kelly Gonçalves (UFRJ, Brazil) and Patricia L. Velozo (UFF, Brazil).

Speaker: Alexandra M. Schmidt is an Associate Professor of Biostatistics in the Department of Epidemiology, Biostatistics, and Occupational Health at McGill University.

December 1, 2016

CRM Colloque de statistique
Richard Samworth
High-dimensional changepoint estimation via sparse projection

15:30-16:30

BURN 708
Abstract:

Changepoints are a very common feature of Big Data that arrive in the form of a data stream. We study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called 'inspect' for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint detection algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms.
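The second stage described above reduces to classical univariate changepoint detection. A minimal numpy sketch of the CUSUM transformation and the single-changepoint argmax estimate applied to one (projected) series (the sparse projection step itself is not reproduced here):

```python
import numpy as np

def cusum_changepoint(x):
    # CUSUM transformation: for each split point t, a scaled difference
    # between the mean of x[:t] and the mean of x[t:]. The argmax over t
    # is the classical single-changepoint estimate.
    n = len(x)
    t = np.arange(1, n)
    cs = np.cumsum(x)
    mean_left = cs[:-1] / t
    mean_right = (cs[-1] - cs[:-1]) / (n - t)
    stat = np.sqrt(t * (n - t) / n) * np.abs(mean_left - mean_right)
    return int(t[np.argmax(stat)]), stat

rng = np.random.default_rng(3)
x = np.r_[rng.normal(0, 1, 100), rng.normal(2, 1, 100)]  # mean shift at t = 100
tau, stat = cusum_changepoint(x)
```

In the high-dimensional setting of the talk, projecting the data matrix onto a well-chosen direction before this step is what lets small coordinate-wise changes aggregate into a detectable one.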

Speaker: Richard Samworth is a Professor of Statistics in the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge. He is a Fellow of the American Statistical Association (2015) and of the Institute of Mathematical Statistics (2014), and a recipient of the Philip Leverhulme Prize, Leverhulme Trust (2014) and the Guy Medal in Bronze, Royal Statistical Society (2012).

December 2, 2016

McGill Statistics Seminar
Andrea Giussani
Modeling dependence in bivariate multi-state processes: A frailty approach

15:30-16:30

BURN 1205
Abstract:

The aim of this talk is to present a statistical framework for the analysis of dependent bivariate multi-state processes, allowing one to study the dependence both across subjects in a pair and among individual-specific events. As for the latter, copula-based models are employed, whereas dependence between multi-state models can be accomplished by means of frailties. The well-known Marshall-Olkin Bivariate Exponential Distribution (MOBVE) is considered for the joint distribution of frailties. The reason is twofold: on the one hand, it allows one to model shocks that affect the two individual-specific frailties; on the other hand, the MOBVE is the only bivariate exponential distribution with exponential marginals, which allows for the modeling of each multi-state process as a shared frailty model. We first discuss a frailty bivariate survival model with some new results, and then move to the construction of the frailty bivariate multi-state model, with the corresponding observed-data likelihood maximization estimating procedure in the presence of right censoring. The last part of the talk will be dedicated to some open problems related to the modeling of multiple multi-state processes in the presence of Marshall-Olkin-type copulas.

Speaker: Andrea Giussani is a PhD candidate in Statistics at Bocconi University, Milan (Italy). His current scientific research is focused on event-history and survival data analysis.

 

Winter Term 2017

Date Event Speaker(s) Title Time Location
January 13, 2017
McGill Statistics Seminar
Victor Veitch

(Sparse) exchangeable graphs

15:30-16:30

BURN 1205
Abstract:

Many popular statistical models for network-valued datasets fall under the remit of the graphon framework, which (implicitly) assumes the networks are densely connected. However, this assumption rarely holds for the real-world networks of practical interest. We introduce a new class of models for random graphs that generalises the dense graphon models to the sparse graph regime, and we argue that this meets many of the desiderata one would demand of a model to serve as the foundation for a statistical analysis of real-world networks. The key insight is to define the models by way of a novel notion of exchangeability; this is analogous to the specification of conditionally i.i.d. models by way of de Finetti's representation theorem. We further develop this model class by explaining the foundations of sampling and estimation of network models in this setting. The latter result can be understood as the (sparse) graph analogue of estimation via the empirical distribution in the i.i.d. sequence setting.

Speaker: Victor Veitch is a PhD candidate in the Department of Statistical Sciences at the University of Toronto working in the group of Daniel Roy. He is interested in the theory and application of machine learning and statistical inference, with a particular focus on Bayesian non-parametrics and random networks.

January 20, 2017

McGill Statistics Seminar
Tudor Manole
Order selection in multidimensional finite mixture models

15:30-16:30

BURN 1205
Abstract:

Finite mixture models provide a natural framework for analyzing data from heterogeneous populations. In practice, however, the number of hidden subpopulations in the data may be unknown. The problem of estimating the order of a mixture model, namely the number of subpopulations, is thus crucial for many applications. In this talk, we present a new penalized likelihood solution to this problem, which is applicable to models with a multidimensional parameter space. The order of the model is estimated by starting with a large number of mixture components, which are clustered and then merged via two penalty functions. Doing so estimates the unknown parameters of the mixture at the same time as the order. We will present extensive simulation studies, showing our approach outperforms many of the most common methods for this problem, such as the Bayesian Information Criterion. Real data examples involving normal and multinomial mixtures further illustrate its performance.
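For context, the Bayesian Information Criterion baseline mentioned above amounts to fitting mixtures of increasing order and keeping the BIC minimiser. A short sketch using scikit-learn (this is the standard baseline only, not the penalized merging method presented in the talk):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Two well-separated Gaussian subpopulations, so the true order is 2.
X = np.r_[rng.normal(0, 1, (200, 1)), rng.normal(8, 1, (200, 1))]

def select_order_bic(X, max_k=5):
    # Fit a k-component Gaussian mixture for each candidate order and
    # return the order with the smallest BIC.
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in range(1, max_k + 1)]
    return int(np.argmin(bics)) + 1, bics

k_hat, bics = select_order_bic(X)
```

The method in the talk instead starts from a deliberately overfitted mixture and merges components via penalty functions, estimating order and parameters jointly.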

Speaker: Tudor Manole is currently in our honours undergraduate Mathematics program, working with Abbas Khalili.

January 27, 2017

CRM-SSC Prize 2012 Colloque
Radu Craiu
Bayesian inference for conditional copula models

15:30-16:30

ROOM 6254

Pavillon Andre-Aisenstadt 2920, UdeM

Abstract:

Conditional copula models describe dynamic changes in dependence and are useful in establishing high dimensional dependence structures or in joint modelling of response vectors in regression settings. We describe some of the methods developed for estimating the calibration function when multiple predictors are needed and for resolving some of the model choice questions concerning the selection of copula families and the shape of the calibration function. This is joint work with Evgeny Levi, Avideh Sabeti and Mian Wei.

Speaker: Radu Craiu is Professor of Statistics in the Department of Statistical Sciences at University of Toronto.

February 3, 2017

McGill Statistics Seminar
Hua Zhou
MM algorithms for variance component models

15:30-16:30

BURN 1205
Abstract:

Variance components estimation and mixed model analysis are central themes in statistics with applications in numerous scientific disciplines. Despite the best efforts of generations of statisticians and numerical analysts, maximum likelihood estimation and restricted maximum likelihood estimation of variance component models remain numerically challenging. In this talk, we present a novel iterative algorithm for variance components estimation based on the minorization-maximization (MM) principle. The MM algorithm is trivial to implement and competitive on large data problems. The algorithm readily extends to more complicated problems such as linear mixed models, multivariate response models possibly with missing data, maximum a posteriori estimation, and penalized estimation. We demonstrate, both numerically and theoretically, that it converges faster than the classical EM algorithm when the number of variance components is greater than two.
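The MM principle invoked above can be illustrated on a much simpler problem than variance components: minimising a sum of absolute deviations by repeatedly minimising a quadratic surrogate that majorises it (a toy sketch of the principle only, not the variance-component updates from the talk):

```python
import numpy as np

def mm_median(a, n_iter=200, eps=1e-8):
    # MM for minimising f(x) = sum_i |x - a_i|: each |x - a_i| is
    # majorised by a quadratic touching it at the current iterate, and
    # the surrogate is minimised in closed form (an IRLS-style update).
    # The minimiser of f is the sample median.
    x = a.mean()
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(x - a), eps)  # surrogate weights
        x = np.sum(w * a) / np.sum(w)             # minimiser of the surrogate
    return x

a = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
x_star = mm_median(a)  # should approach the sample median, 3.0
```

Each MM step is cheap and monotonically improves the objective, which is the property the talk exploits for large variance-component problems.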

Speaker: Hua Zhou is an Associate Professor in the Department of Biostatistics, UCLA School of Public Health.
February 10, 2017
McGill Statistics Seminar
Zhihua Su

Sparse envelope model: Efficient estimation and response variable selection in multivariate linear regression

15:30-16:30

BURN 1205
Abstract:

The envelope model is a method for efficient estimation in multivariate linear regression. In this article, we propose the sparse envelope model, which is motivated by applications where some response variables are invariant to changes of the predictors and have zero regression coefficients. The envelope estimator is consistent but not sparse, and in many situations it is important to identify the response variables for which the regression coefficients are zero. The sparse envelope model performs variable selection on the responses and preserves the efficiency gains offered by the envelope model. Response variable selection arises naturally in many applications, but has not been studied as thoroughly as predictor variable selection. In this article, we discuss response variable selection in both the standard multivariate linear regression and the envelope contexts. In response variable selection, even if a response has zero coefficients, it still should be retained to improve the estimation efficiency of the nonzero coefficients. This is different from the practice in predictor variable selection. We establish consistency and the oracle property, and obtain the asymptotic distribution of the sparse envelope estimator.

Speaker: Zhihua Su is an Assistant Professor of Statistics in the Department of Statistics at the University of Florida.

February 17, 2017

McGill Statistics Seminar
Joelle Pineau
Building end-to-end dialogue systems using deep neural architectures

15:30-16:30

BURN 1205
Abstract:

The ability for a computer to converse in a natural and coherent manner with a human has long been held as one of the important steps towards solving artificial intelligence. In this talk I will present recent results on building dialogue systems from large corpora using deep neural architectures. I will highlight several challenges related to data acquisition, algorithmic development, and performance evaluation.

Speaker: Joelle Pineau is an associate professor of Computer Science at McGill University, where she co-directs the Reasoning and Learning Lab.

February 24, 2017

McGill Statistics Seminar
James A. Hanley
The first pillar of statistical wisdom

15:30-16:30

BURN 1205
Abstract:

This talk will provide an introduction to the first of the pillars in Stephen Stigler's 2016 book The Seven Pillars of Statistical Wisdom, namely “Aggregation.” It will focus on early instances of the sample mean in scientific work, on the early error distributions, and on how their “centres” were fitted.

Speaker: James A. Hanley is a Professor in the Department of Epidemiology, Biostatistics and Occupational Health at McGill University.

March 10, 2017

McGill Statistics Seminar
Nima Aghaeepour
High-throughput single-cell biology: The challenges and opportunities for machine learning scientists

15:30-16:30

BURN 1205
Abstract:

The immune system does a lot more than kill “foreign” invaders. It is a powerful sensory system that can detect stress levels, infections, wounds, and even cancerous tumors. However, due to the complex interplay between different cell types and signaling pathways, the amount of data produced to characterize all the different aspects of the immune system (tens of thousands of genes measured and hundreds of millions of cells, just from a single patient) completely overwhelms existing bioinformatics tools. My laboratory specializes in the development of machine learning techniques that address the unique challenges of high-throughput single-cell immunology. Sharing our lab space with a clinical and an immunological research laboratory, my students and fellows are directly exposed to the real-world challenges and opportunities of bringing machine learning and immunology to the (literal) bedside.

Speaker: Nima Aghaeepour is a CIHR Fellow, an ISAC Scholar, and an OCRF Ann Schreiber Investigator with Garry Nolan at Stanford University.

March 17, 2017
CRM Colloque de statistique
Sayan Mukherjee
Inference in dynamical systems

15:30-16:30

BURN 1205
Abstract:

We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures. We also relate Bayesian inference to the thermodynamic formalism in tracking dynamical systems.

Speaker: Sayan Mukherjee is a Professor in the Department of Statistical Science at Duke University. His research interests include geometry and topology in probabilistic modeling, statistical and computational biology, and the modeling of massive data.

March 24, 2017

McGill Statistics Seminar
Hamid Pezeshk
Bayesian sample size determination for clinical trials

15:30-16:30

BURN 1205
Abstract:

The sample size determination problem is an important task in the planning of clinical trials, and it can be formulated formally in statistical terms. The most frequently used methods are based on the required size and power of the trial for a specified treatment effect. Unlike the Bayesian decision-theoretic approach, these methods involve no explicit balancing of the cost of a possible increase in the size of the trial against the benefit of the more accurate information it would give. In this talk a fully Bayesian approach to the sample size determination problem is discussed. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which decides whether or not to grant a licence to a new treatment, uses a frequentist approach. The optimal sample size for the trial is then found by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.
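The expected-net-benefit criterion in the last sentence can be sketched numerically. The toy Monte Carlo version below uses assumed ingredients throughout (a normal prior on the treatment effect, a one-sided z-test standing in for the regulator's frequentist licensing decision, and arbitrary benefit and cost values); it illustrates the shape of the calculation, not the utility model from the talk:

```python
import numpy as np

def expected_net_benefit(n, prior_mean=0.5, prior_sd=1.0, sigma=2.0,
                         benefit=1000.0, cost_per_patient=1.0,
                         n_sim=20000, seed=0):
    # Monte Carlo estimate of E[net benefit] for a trial of size n:
    # draw the effect theta from its prior, simulate the trial mean, and
    # assume the regulator licenses the treatment iff a one-sided z-test
    # rejects at the 5% level.
    rng = np.random.default_rng(seed)
    theta = rng.normal(prior_mean, prior_sd, n_sim)
    xbar = rng.normal(theta, sigma / np.sqrt(n))
    z_crit = 1.645  # one-sided 5% critical value
    licensed = xbar / (sigma / np.sqrt(n)) > z_crit
    # Benefit accrues in proportion to the true effect when licensed;
    # the trial cost is linear in n.
    return np.mean(licensed * benefit * theta) - cost_per_patient * n

# Search a grid of candidate sample sizes for the maximiser.
sizes = np.arange(10, 500, 10)
enb = [expected_net_benefit(n) for n in sizes]
n_opt = sizes[int(np.argmax(enb))]
```

The optimum balances the two terms: beyond some n, extra patients raise the cost faster than they sharpen the licensing decision, so the expected net benefit turns downward.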

Speaker: Hamid Pezeshk is a Professor in the School of Mathematics, Statistics and Computer Science, University of Tehran, and the School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran.

March 31, 2017
McGill Statistics Seminar
Chen Xu
Distributed kernel regression for large-scale data

15:30-16:30

BURN 1205
Abstract:

In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then aggregated from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown whether such a distributed strategy provides valid theoretical inferences for the original data and, if so, how efficiently it works. In this talk, I address these fundamental issues for nonparametric distributed kernel regression, where accurate prediction is the main learning task. I will begin with the naive simple-averaging algorithm and then discuss an improved approach via ADMM. The promising performance of these methods is supported by both simulation and real-data examples.
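The naive simple-averaging scheme mentioned above can be sketched as follows: split the data into segments, fit a kernel ridge regression on each, and average the segment predictions. This is a minimal illustration only; the Gaussian kernel, bandwidth, and ridge penalty are arbitrary choices for the sketch, not settings from the talk:

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=0.3):
    # Pairwise Gaussian kernel matrix between the rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def krr_fit(X, y, lam=1e-3):
    # Kernel ridge regression on one segment:
    # alpha = (K + lam * n * I)^{-1} y.
    K = gaussian_kernel(X, X)
    alpha = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)
    return X, alpha

def distributed_krr_predict(X, y, X_test, m=4, lam=1e-3):
    # Divide-and-conquer: split the data into m segments, fit KRR on
    # each segment, and average the segment predictions.
    preds = []
    for Xs, ys in zip(np.array_split(X, m), np.array_split(y, m)):
        X_fit, alpha = krr_fit(Xs, ys, lam)
        preds.append(gaussian_kernel(X_test, X_fit) @ alpha)
    return np.mean(preds, axis=0)

# Toy regression problem: y = sin(3x) + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(400)
X_test = np.linspace(-1, 1, 50)[:, None]
y_hat = distributed_krr_predict(X, y, X_test)
```

Each segment solve costs O((n/m)^3) rather than O(n^3), which is the computational appeal of the scheme; the statistical question posed in the talk is what this splitting costs in estimation efficiency.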

Speaker: Chen Xu is an Assistant Professor in the Department of Mathematics and Statistics, University of Ottawa.

April 6, 2017
CRM Colloque de statistique
Jason Fine
Instrumental variable regression with survival outcomes

15:30-16:30

Université Laval, Pavillon Vachon, Salle 3840
Abstract:

Instrumental variable (IV) methods are popular in non-experimental studies to estimate the causal effects of medical interventions or exposures. These approaches allow for the consistent estimation of such effects even if important confounding factors are unobserved. Despite the increasing use of these methods, there have been few extensions of IV methods to censored data regression problems. We discuss challenges in applying IV structural equation modelling techniques to the proportional hazards model and suggest alternative modelling frameworks. We demonstrate the utility of the accelerated lifetime and additive hazards models for IV analyses with censored data. Assuming linear structural equation models for either the event time or the hazard function, we propose closed-form, two-stage estimators for the causal effect in the structural models for the failure time outcomes. The asymptotic properties of the estimators are derived and the resulting inferences are shown to perform well in simulation studies and in an application to a data set on the effectiveness of a novel chemotherapeutic agent for colon cancer.
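A minimal sketch of a closed-form two-stage estimator in the spirit described above, applied to an uncensored log event time (censoring, which is central to the talk, is ignored here, and the simulated data and all coefficients are illustrative):

```python
import numpy as np

def two_stage_iv(logT, X, Z):
    # Stage 1: regress the exposure X on the instrument Z (with an
    # intercept) and take the fitted values X_hat.
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    X_hat = Z1 @ np.linalg.lstsq(Z1, X, rcond=None)[0]
    # Stage 2: regress the log event time on X_hat; the slope estimates
    # the causal effect in a linear structural (AFT-type) model.
    W = np.column_stack([np.ones(len(X_hat)), X_hat])
    beta = np.linalg.lstsq(W, logT, rcond=None)[0]
    return beta[1]

# Simulated data with an unmeasured confounder U affecting both the
# exposure and the outcome; the true causal effect is 0.5.
rng = np.random.default_rng(1)
n = 5000
Z = rng.standard_normal(n)                 # instrument
U = rng.standard_normal(n)                 # unmeasured confounder
X = 0.8 * Z + U + rng.standard_normal(n)   # confounded exposure
logT = 1.0 + 0.5 * X + U + 0.3 * rng.standard_normal(n)

beta_iv = two_stage_iv(logT, X, Z)
```

Because U enters both X and logT, a naive regression of logT on X is biased upward, while the two-stage estimator recovers the structural coefficient since Z is independent of U.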

Speaker: Jason Fine is a professor with tenure jointly appointed in the Department of Biostatistics and the Department of Statistics and Operations Research at UNC-Chapel Hill.

 

Website design: Prof. Johanna G. Nešlehová

 

 

Last edited on Thu, 04/06/2017 - 10:33