|
| August 31, 2011 |
|
McGill Statistics Seminar
|
Johanna Ziegel
|
Precision estimation for stereological volumes |
15:30-16:30
|
BURN 1205 |
| Abstract: |
Volume estimators based on Cavalieri's principle are widely used in the biosciences. For example, in neuroscience, where volumetric measurements of brain structures are of interest, systematic samples of serial sections are obtained by magnetic resonance imaging or by a physical cutting procedure. The volume v is then estimated by v̂, the sum of the areas of the structure of interest in the section planes multiplied by the section width t > 0.
Assessing the precision of such volume estimates is a question of great practical
importance, but statistically a challenging task due to the strong spatial dependence
of the data and typically small sample sizes. In this talk, an overview of classical
and new approaches to this problem will be presented. A special focus will be given
to some recent advances on distribution estimators and confidence intervals for v̂;
see Hall and Ziegel (2011).
References
Hall, P. and Ziegel, J. (2011). Distribution estimators and confidence intervals for
stereological volumes. Biometrika 98, 417–431.
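The estimator itself is a one-liner; here is a minimal sketch with hypothetical section areas and a hypothetical section width (illustrative values, not data from the talk):

```python
# Hypothetical data: areas (in cm^2) of a brain structure measured on
# systematically sampled serial sections of width t (in cm).
areas = [2.1, 3.4, 4.0, 3.2, 1.8]
t = 0.5

# Cavalieri estimate: section width times the sum of sectional areas.
v_hat = t * sum(areas)
```

Assessing the variance of v_hat is the hard part: the section areas are strongly spatially dependent, which is precisely the problem the talk addresses.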
|
| Speaker: |
Johanna Ziegel is a Postdoctoral Fellow in the Institute of Applied Mathematics at Heidelberg University. She holds a Ph.D. in Statistics from ETH Zürich and spent a year as a Postdoctoral Fellow with Peter Hall at the University of Melbourne. |
|
| September 9, 2011 |
CRM-ISM-GERAD Colloque de statistique
|
Ed Susko and Aurélie Labbe
|
Susko: Properties of Bayesian posteriors and bootstrap support in phylogenetic inference
Labbe: An integrated hierarchical Bayesian model for multivariate eQTL genetic mapping
|
14:00-16:30
|
UdeM, Pav. André-Aisenstadt,
SALLE 1360 |
| Abstract: |
Susko: The data generated by large-scale sequencing projects are complex, high-dimensional, multivariate discrete data. In studies of evolutionary biology, the parameter space of evolutionary trees is an unusual additional complication from a statistical perspective. In this talk I will briefly introduce the general approaches to utilizing sequence data in phylogenetic inference. A particular issue of interest in phylogenetic inference is the assessment of uncertainty about the true tree or structures that might be present in it. The primary way in which uncertainty is assessed in practice is through bootstrap support (BP) for splits, large values indicating strong support for the split. A difficulty with this measure, however, has been deciding how large is large enough. We discuss the interpretation of BP and ways of adjusting it so that it has an interpretation similar to a p-value. A related issue, having to do with the behaviour of methods when data are generated from a star tree, gives rise to an interesting example in which, due to the unusual statistical nature of the problem, Bayesian and maximum likelihood methods give strikingly different results, even asymptotically.
Labbe: Recently, expression quantitative loci (eQTL) mapping studies, where expression levels of thousands of genes are viewed as quantitative traits, have been used to provide greater insight into the biology of gene regulation. Current data analysis and interpretation of eQTL studies involve the use of multiple methods and applications, the output of which is often fragmented. In this talk, we present an integrated hierarchical Bayesian model that jointly models all genes and SNPs to detect eQTLs.
We propose a model (named iBMQ) that is specifically designed to handle a large number G of gene expressions, a large number S of regressors (genetic markers) and a small number n of individuals in what we call a "large G, large S, small n" paradigm. This method incorporates genotypic and gene expression data into a single model while 1) specifically coping with the high dimensionality of eQTL data (large number of genes), 2) borrowing strength from all gene expression data for the mapping procedures, and 3) controlling the number of false positives to a desirable level.
|
| Speakers: |
Ed Susko, Dalhousie University
Aurélie Labbe, McGill University
|
| Schedule: |
Talk 1: Aurélie Labbe 14:00-15:00
Coffee Break 15:00-15:30
Talk 2: Ed Susko 15:30-16:30
|
|
| September 16, 2011 |
McGill Statistics Seminar
|
Elif F. Acar
|
Inference and model selection for pair-copula constructions |
15:30-16:30
|
BURN 1205 |
| Abstract: |
Pair-copula constructions (PCCs) provide an elegant way to construct highly flexible multivariate distributions. However, for convenience of inference, pair-copulas are often assumed to depend on the conditioning variables only indirectly. In this talk, I will show how nonparametric smoothing techniques can be used to avoid this assumption. Model selection for PCCs will also be addressed within the proposed method. |
| Speaker: |
Elif F. Acar is a Postdoctoral Fellow in the Department of Mathematics and Statistics at McGill University. She holds a Ph.D. in Statistics from the University of Toronto. |
|
| September 23, 2011 |
McGill Statistics Seminar
|
Shaowei Lin
|
What is singular learning theory? |
15:30-16:30
|
BURN 1205 |
| Abstract: |
In this talk, we give a basic introduction to Sumio Watanabe's
Singular Learning Theory, as outlined in his book "Algebraic Geometry
and Statistical Learning Theory". Watanabe's key insight for studying
singular models was to use a deep result in algebraic geometry known
as Hironaka's Resolution of Singularities. This result allows him to
reparametrize the model in a normal form so that central limit
theorems can be applied. In the second half of the talk, we discuss
new algebraic methods where we define fiber ideals for discrete/Gaussian models. We show that the key to understanding the singular model lies in monomializing its fiber ideal. |
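For orientation, the headline result of the theory can be stated compactly. This is a sketch of Watanabe's expansion in my own notation, not the speaker's; λ denotes the real log canonical threshold and m its multiplicity:

```latex
% Watanabe's asymptotic expansion of the stochastic complexity
% (Bayesian free energy) F_n for n observations:
F_n = n S_n + \lambda \log n - (m - 1) \log\log n + O_p(1)
% S_n: empirical entropy of the true distribution;
% \lambda: real log canonical threshold, computed via Hironaka's
%          resolution of singularities;
% m: multiplicity of \lambda.
% For a regular model, \lambda = d/2 (d = parameter dimension) and
% m = 1, recovering the familiar BIC penalty (d/2) log n.
```

Singular models have λ strictly smaller than d/2, so BIC systematically over-penalizes them; computing λ is where the algebraic geometry enters.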
| Speaker: |
Shaowei Lin is a Postdoctoral Fellow at UC Berkeley. He received a B.Sc. from Stanford and a Ph.D. from UC Berkeley under the supervision of Bernd Sturmfels. |
|
| September 30, 2011 |
McGill Statistics Seminar
|
Ioana A. Cosma
|
Data sketching for cardinality and entropy estimation
|
15:30-16:30
|
BURN 1205 |
| Abstract: |
Streaming data are ubiquitous in a wide range of areas, from engineering, information technology, finance, and commerce to atmospheric physics and the earth sciences. The online approximation of properties of data streams is of great interest, but this approximation process is hindered by the sheer size of the data and the speed at which they are generated. Data stream algorithms typically allow only one pass over the data, and maintain sub-linear representations of the data from which target properties can be inferred with high efficiency.
In this talk we consider the online approximation of two important characterizations of data streams: cardinality and empirical Shannon entropy. We assume that the number of distinct elements observed in the stream is prohibitively large, so that the vector of cumulative
quantities cannot be stored on main computer memory for fast and efficient access. We focus on two techniques that use pseudo-random variates to form low-dimensional data sketches (using hashing and random projections), and derive estimators of the cardinality and empirical entropy. We discuss various properties of our estimators such as relative asymptotic efficiency, recursive computability, and error and complexity bounds. Finally, we present results on simulated data and seismic measurements from a volcano.
References:
Clifford, P. and Cosma, I. A. (2011). A statistical analysis of probabilistic counting algorithms. To appear in the Scandinavian Journal of Statistics; preprint at arXiv:0801.3552.
Clifford, P. and Cosma, I. A. (2009). A simple sketching algorithm for entropy estimation. In preparation; preprint at arXiv:0908.3961.
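The flavor of hashing-based cardinality sketches can be conveyed by a toy, Flajolet-Martin-style probabilistic counter. This is a deliberately minimal single-hash sketch, not the estimators of the papers above:

```python
import hashlib

def trailing_zeros(x, width=32):
    """Number of trailing zero bits in a width-bit hash value."""
    if x == 0:
        return width
    count = 0
    while x & 1 == 0:
        x >>= 1
        count += 1
    return count

def fm_cardinality(stream):
    """One-pass, Flajolet-Martin-style estimate of the number of distinct
    elements: track the maximum number of trailing zero bits seen among
    hashed elements, then invert the expected pattern. Memory use is a
    single integer, regardless of the stream length."""
    max_tz = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) & 0xFFFFFFFF
        max_tz = max(max_tz, trailing_zeros(h))
    return 2 ** max_tz / 0.77351  # 0.77351: standard FM bias correction

estimate = fm_cardinality(range(1000))
```

A single hash gives an estimate with very high variance; practical sketches average over many hash functions or register groups, and the statistical analysis of such estimators is the subject of the first reference.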
|
| Speaker: |
Ioana A. Cosma is a Postdoctoral Fellow in the Statistical Laboratory at the University of Cambridge, England. She holds a Ph.D. in Statistics from the University of Oxford. |
|
| October 7, 2011 |
McGill Statistics Seminar
|
Nikolai Kolev
|
Nonexchangeability and radial asymmetry identification via bivariate quantiles, with financial applications |
15:30-16:30
|
BURN 1205 |
| Abstract: |
In this talk, the following topics will be discussed: A class of bivariate probability integral transforms and Kendall distribution; bivariate quantile curves, central and lateral regions; non-exchangeability and radial asymmetry identification; new measures of nonexchangeability and radial asymmetry; financial applications and a few open problems (joint work with Flavio Ferreira). |
| Speaker: |
Nikolai Kolev is a Professor of Statistics at the University of São Paulo, Brazil. |
|
|
| October 14, 2011 |
CRM-ISM-GERAD Colloque de statistique
|
Debbie Dupuis and Richard A. Davis
|
Dupuis: Modeling non-stationary extremes: The case of heat waves
Davis: Estimating extremal dependence in time series via the extremogram
|
14:00-16:30
|
McGill
TROTTIER 1080
|
| Abstract: |
Dupuis: Environmental processes are often non-stationary since climate patterns cause systematic seasonal effects and long-term climate changes cause trends. The usual limit models are not applicable for non-stationary processes, but models from standard extreme value theory can be used along with statistical modeling to provide useful inference. Traditional approaches include letting model parameters be a function of covariates or using time-varying thresholds. These approaches are inadequate for the study of heat waves, however, and we show how a recent pre-processing approach by Eastoe and Tawn (2009) can be used in conjunction with an innovative change-point analysis to model daily maximum temperature. The model is then fitted to data from four U.S. cities and used to estimate the recurrence probabilities of runs over seasonally high temperatures. We show that the probability of long and intense heat waves has increased considerably over the past 50 years.
Davis: The extremogram is a flexible quantitative tool that measures various types of extremal dependence in a stationary time series. In many respects, the extremogram can be viewed as an extreme-value analogue of the autocorrelation function (ACF) for a time series. Under mixing conditions, the asymptotic normality of the empirical extremogram was derived in Davis and Mikosch (2009). Unfortunately, the limiting variance is a difficult quantity to estimate. Instead, we apply the stationary bootstrap to the empirical extremogram and establish that this resampling procedure provides an asymptotically correct approximation to the central limit theorem. This in turn can be used for constructing credible confidence bounds for the sample extremogram. The use of the stationary bootstrap for the extremogram is illustrated in a variety of real and simulated data sets.
The cross-extremogram measures cross-sectional extremal dependence in multivariate time series. A measure of this dependence, especially left tail dependence, is of great importance in the calculation of portfolio risk. We find that after devolatilizing the marginal series, extremal dependence still remains, which suggests that the extremal dependence is not due solely to the heteroskedasticity in the stock returns process. However, for the univariate series, the filtering removes all extremal dependence.
Following Geman and Chang (2010), a return time extremogram, which measures the waiting time between rare or extreme events in univariate and bivariate stationary time series, is calculated. The return time extremogram suggests the existence of extremal clustering in the return times of extreme events for financial assets. The stationary bootstrap can again provide an asymptotically correct approximation to the central limit theorem and can be used for constructing credible confidence bounds for this return time extremogram. (This is joint work with Thomas Mikosch and Ivor Cribben.)
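To fix ideas, a bare-bones sample extremogram (upper-tail version, with the threshold set at the empirical q-quantile) can be computed as follows. This is an illustrative sketch, not the estimator or bootstrap scheme of the talk:

```python
def extremogram(x, lag, q=0.95):
    """Empirical extremogram at a given lag: the conditional probability
    that x[t + lag] exceeds the empirical q-quantile given that x[t] does,
    an extreme-value analogue of the ACF at that lag."""
    u = sorted(x)[int(q * len(x))]  # threshold: empirical q-quantile
    joint = sum(1 for t in range(len(x) - lag) if x[t] > u and x[t + lag] > u)
    base = sum(1 for t in range(len(x) - lag) if x[t] > u)
    return joint / base if base else float("nan")
```

For iid data the value at any positive lag is close to 1 - q; values well above that at small lags indicate clustering of extremes, which is what the bootstrap bands discussed above are designed to assess.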
|
| Speakers: |
Debbie Dupuis is a Professor of Statistics at HEC Montréal. She works in extreme-value theory, robust estimation and computational statistics.
Richard A. Davis is a Professor of Statistics at Columbia University. He works in applied probability, time series, stochastic processes and extreme-value theory. Together with P. J. Brockwell, he is the author of the well-known textbook Introduction to Time Series and Forecasting.
|
| Schedule: |
Talk 1: Debbie Dupuis 14:00 -- 15:00
Coffee Break 15:00 -- 15:30
Talk 2: Richard A. Davis 15:30 -- 16:30
|
|
| October 21, 2011 |
McGill Statistics Seminar
|
William Astle
|
Bayesian modelling of GWAS data using linear mixed models |
15:30-16:30
|
BURN 1205 |
| Abstract: |
Genome-wide association studies (GWAS) are used to identify physical positions (loci) on the genome where genetic variation is causally associated with a phenotype of interest at the population level. Typical studies are based on the measurement of several hundred thousand single nucleotide polymorphism (SNP) variants spread across the genome, in a few thousand individuals. The resulting datasets are large and require computationally efficient methods of statistical analysis.
Linear mixed models with two variance components have recently been proposed as a method of analysis for GWAS data that can control for the confounding effects of population stratification by modelling the correlation between study subjects induced by relatedness. Unfortunately, standard methods for fitting linear mixed models are computationally intensive because computation of the likelihood depends on the inversion of a large matrix which is a function of the model parameters. I will describe a fast method for calculating the likelihood of a two-variance-component linear model, which allows analysis of a large GWAS dataset using mixed models by Bayesian inference. A Bayesian analysis of GWAS provides a natural way of overcoming the so-called "multiple-testing" problem which arises from the large dimension of the predictor variable space. In the Bayesian framework we should have low prior belief that any particular genetic variant explains a large proportion of the phenotypic variation. The normal-exponential-gamma prior has been proposed as a good representation of such belief, and I will describe an efficient MCMC algorithm which allows this prior to be incorporated into the modelling. |
| Speaker: |
William Astle is a Postdoctoral Fellow at McGill University, working with Aurélie Labbe and David A. Stephens. He holds a Ph.D. from Imperial College London.
|
|
| October 28, 2011 |
McGill Statistics Seminar
|
Andrew Patton
|
Simulated method of moments estimation for copula-based multivariate models |
15:00-16:00
|
BURN 1205 |
| Abstract: |
This paper considers the estimation of the parameters of a copula via a simulated method of moments type approach. This approach is attractive when the likelihood of the copula model is not known in closed form, or when the researcher has a set of dependence measures or other functionals of the copula, such as pricing errors, that are of particular interest. The proposed approach naturally also nests method of moments and generalized method of moments estimators. Combining existing results on simulation based estimation with recent results from empirical copula process theory, we show the consistency and asymptotic normality of the proposed estimator, and obtain a simple test of over-identifying restrictions as a goodness-of-fit test. The results apply to both iid and time series data. We analyze the finite-sample behavior of these estimators in an extensive simulation study. We apply the model to a group of seven financial stock returns and find evidence of statistically significant tail dependence, and that the dependence between these assets is stronger in crashes than booms. |
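A toy version of such an estimator can be sketched with a Clayton copula and Kendall's tau as the matched dependence measure. The grid search, seeding, and function names below are illustrative, not the paper's implementation:

```python
import random
from itertools import combinations

def sample_clayton(theta, n, seed=0):
    """Draw n pairs (U, V) from a Clayton copula by conditional inversion.
    A fixed seed gives common random numbers across candidate parameters,
    the usual device in simulated method of moments."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        v = ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)
        pts.append((u, v))
    return pts

def kendall_tau(pts):
    """Empirical Kendall's tau: the dependence 'moment' being matched."""
    pairs = list(combinations(pts, 2))
    conc = sum(1 if (u1 - u2) * (v1 - v2) > 0 else -1
               for (u1, v1), (u2, v2) in pairs)
    return conc / len(pairs)

def smm_clayton(tau_obs, grid, n_sim=1000):
    """Grid-search SMM: pick the theta whose simulated tau is closest to
    the observed tau. (For Clayton, the true tau is theta / (theta + 2),
    so the simulation is redundant here; SMM matters when no such
    closed form is available.)"""
    return min(grid, key=lambda th:
               (kendall_tau(sample_clayton(th, n_sim)) - tau_obs) ** 2)
```

In practice one would minimize over a continuum of parameter values and use several moments at once; the paper's asymptotic theory covers exactly how the simulation noise propagates into the estimator.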
| Speaker: |
Andrew Patton is an Associate Professor of Economics at Duke University, Durham, North Carolina. |
|
| November 3, 2011 |
McGill Statistics Seminar
|
Alessandro Rinaldo
|
Maximum likelihood estimation in network models
|
16:00-17:00
|
BURN 1205 |
| Abstract: |
This talk is concerned with maximum likelihood estimation (MLE) in exponential statistical models for networks (random graphs) and, in particular, with the beta model, a simple model for undirected graphs in which the degree sequence is the minimal sufficient statistic. The speaker will present necessary and sufficient conditions for the existence of the MLE of the beta model parameters that are based on a geometric object known as the polytope of degree sequences. Using this result, it is possible to characterize in a combinatorial fashion sample points leading to a non-existent MLE and non-estimability of the probability parameters under a non-existent MLE. The speaker will further indicate some conditions guaranteeing that the MLE exists with probability tending to 1 as the number of nodes increases. Much of this analysis applies also to other well-known models for networks, such as the Rasch model, the Bradley-Terry model and the more general p1 model of Holland and Leinhardt. These results are in fact instantiations of rather general geometric properties of exponential families with polyhedral support that will be illustrated with a simple exponential random graph model. |
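The flavor of MLE computation in the beta model can be sketched with a standard fixed-point scheme (illustrative, not necessarily the speaker's approach). When the degree sequence falls outside the interior of the polytope of degree sequences, no fixed point exists and iterations of this kind diverge, which is the phenomenon the talk characterizes:

```python
import math

def fit_beta_model(degrees, n_iter=500):
    """Fixed-point iteration for the beta-model MLE: each node parameter
    beta_i is updated from the moment identity
    d_i = exp(beta_i) * sum_{j != i} exp(beta_j) / (1 + exp(beta_i + beta_j)),
    i.e. the observed degree equals its expectation at the MLE."""
    n = len(degrees)
    beta = [0.0] * n
    for _ in range(n_iter):
        beta = [math.log(degrees[i])
                - math.log(sum(math.exp(beta[j]) / (1 + math.exp(beta[i] + beta[j]))
                               for j in range(n) if j != i))
                for i in range(n)]
    return beta

def expected_degrees(beta):
    """Expected degree of each node under edge probabilities
    p_ij = 1 / (1 + exp(-(beta_i + beta_j)))."""
    n = len(beta)
    return [sum(1 / (1 + math.exp(-(beta[i] + beta[j])))
                for j in range(n) if j != i)
            for i in range(n)]
```

A quick self-consistency check: generate expected degrees from a known interior parameter vector, refit, and the iteration should recover that vector.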
| Speaker: |
Alessandro Rinaldo is an Assistant Professor of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania. |
|
| November 4, 2011 |
McGill Statistics Seminar
|
Martin Lysy
|
A Bayesian method of parametric inference for diffusion processes |
15:30-16:30
|
BURN 1205 |
| Abstract: |
Diffusion processes have been used to model a multitude of continuous-time phenomena in engineering and the natural sciences, such as, in this case, the volatility of financial assets. However, parametric inference has long been complicated by an intractable likelihood function. For many models the most effective solution involves a large amount of missing data for which the typical Gibbs sampler can be arbitrarily slow. On the other hand, joint parameter and missing-data proposals can lead to a radical improvement, but their acceptance rate tends to decay exponentially with the number of observations.
We consider here a novel method of dividing the inference process into separate data batches, each small enough to benefit from joint proposals, to be processed consecutively. A filter combines batch contributions to produce likelihood inference based on the whole dataset. Although the result is not always unbiased, it has very low variability, often achieving considerable accuracy in a short amount of time. We present an example using Heston's popular model for option pricing, but much of the methodology can be extended beyond diffusions to Hidden Markov and other State-Space models. |
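The role of the imputed missing data can be illustrated with a bare-bones Euler-Maruyama discretization (a generic sketch, not the method of the talk): between sparse observations the transition density is intractable, but on a fine imputed grid the Gaussian Euler increments give a tractable likelihood approximation.

```python
import math
import random

def euler_maruyama(mu, sigma, x0, T, n_steps, seed=0):
    """Simulate dX_t = mu(X_t) dt + sigma(X_t) dW_t on a grid of n_steps
    equal time steps over [0, T]. Each increment is conditionally Gaussian,
    which is what makes the fine-grid (imputed) likelihood tractable."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        path.append(path[-1] + mu(path[-1]) * dt + sigma(path[-1]) * dw)
    return path

# Ornstein-Uhlenbeck-type drift toward 0 with unit noise (illustrative).
path = euler_maruyama(lambda x: -x, lambda x: 1.0, 1.0, 1.0, 100)
```

The Gibbs-versus-joint-proposal tension described above arises because the imputed grid points and the parameters are strongly dependent a posteriori.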
| Speaker: |
Martin Lysy is finishing his Ph.D. in the Department of Statistics, Harvard University.
|
|
| November 11, 2011 |
CRM-ISM-GERAD Colloque de statistique
|
Hélène Guérin and Ana-Maria Staicu
|
Guérin: An ergodic variant of the telegraph process for a toy model of bacterial chemotaxis
Staicu: Skewed functional processes and their applications
|
14:00-16:30
|
UdeM |
| Abstract: |
|
Guérin: I will study the long time behavior of a variant of the classic telegraph process, with non-constant jump rates that induce a drift towards the origin. This process can be seen as a toy model for velocity-jump processes recently proposed as mathematical models of bacterial chemotaxis. I will give its invariant law and construct an explicit coupling for velocity and position, providing exponential ergodicity with moreover a quantitative control of the total variation distance to equilibrium at each time instant. It is a joint work with Joaquin Fontbona (Universidad de Santiago, Chile) and Florent Malrieu (Université Rennes 1, France).
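A toy simulation of such a velocity-jump process is easy to write down. The flip rates below are hypothetical, chosen so the particle reverses direction more often when moving away from the origin, which produces the drift toward it:

```python
import random

def telegraph_with_drift(T=100.0, lam_out=3.0, lam_in=1.0, seed=1):
    """Simulate a telegraph-type velocity-jump process on [0, T]: the
    velocity v in {-1, +1} flips at rate lam_out when the particle moves
    away from the origin and at rate lam_in when it moves toward it
    (illustrative rates, not the talk's model). Returns the position at T."""
    rng = random.Random(seed)
    t, x, v = 0.0, 0.0, 1.0
    while True:
        rate = lam_out if x * v > 0 else lam_in
        dt = rng.expovariate(rate)  # waiting time to the next velocity jump
        if t + dt >= T:
            x += v * (T - t)  # advance to the time horizon and stop
            break
        x += v * dt
        t += dt
        v = -v  # jump: reverse the velocity
    return x

position = telegraph_with_drift()
```

Coupling two such trajectories in both velocity and position, as in the talk, is what yields quantitative exponential convergence to the invariant law.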
Staicu: We introduce a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial location. Such data are not envisaged by the current approaches to model functional data, due to the lack of Gaussian-like features. Our methodology allows modeling the pointwise quantiles, has interpretability advantages and is computationally feasible. The methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls.
|
|
| Speakers: |
Ana-Maria Staicu obtained her PhD from the University of Toronto and is currently an Assistant Professor at North Carolina State University. She works in functional data analysis and likelihood methods.
Hélène Guérin is an Associate Professor at Université Rennes 1. She obtained her PhD at the Université Paris X Nanterre. Her main research interests are in the probabilistic interpretation of nonlinear partial differential equations.
|
| Schedule: |
Hélène Guérin: 14:00 -- 15:00
Coffee Break 15:00 -- 15:30
Ana-Maria Staicu: 15:30 -- 16:30
|
|
| November 18, 2011 |
McGill Statistics Seminar
|
Amparo Casanova
|
Construction of bivariate distributions via principal components |
15:30-16:30
|
BURN 1205 |
| Abstract: |
The diagonal expansion of a bivariate distribution (Lancaster, 1958) has been used as a tool to construct bivariate distributions; this method has been generalized using principal dimensions of random variables (Cuadras, 2002). Necessary and sufficient conditions are given for uniform, exponential, logistic and Pareto marginals in the one- and two-dimensional cases. The corresponding copulas are obtained. |
| Speaker: |
Amparo Casanova is an Assistant Professor at the Dalla Lana School of Public Health, Division of Biostatistics, University of Toronto.
|
|
| November 25, 2011 |
McGill Statistics Seminar
|
François Bellavance
|
Estimation of the risk of a collision when using a cell phone while driving
|
15:30-16:30 |
BURN 1205 |
| Abstract: |
The use of a cell phone while driving raises the question of whether it is associated with an increased collision risk and, if so, what its magnitude is. For policy decision making, it is important to rely on an accurate estimate of the real crash risk of cell phone use while driving. Three important epidemiological studies were published on the subject, two using the case-crossover approach and one using a more conventional longitudinal cohort design. The methodology and results of these studies will be presented and discussed. |
| Speaker: |
François Bellavance is a Professor of Statistics at HEC Montréal and the Director of the Transportation Safety Laboratory.
|
|
| December 2, 2011 |
McGill Statistics Seminar
|
Alberto Carabarin
|
Path-dependent estimation of a distribution under generalized censoring
|
15:30-16:30 |
BURN 1205 |
| Abstract: |
This talk focuses on the problem of the estimation of a distribution on an arbitrary complete separable metric space when the data points are subject to censoring by a general class of random sets. A path-dependent estimator for the distribution is proposed; among other properties, the estimator is sequential in the sense that it only uses data preceding any fixed point at which it is evaluated. If the censoring mechanism is totally ordered, the paths may be chosen in such a way that the estimate of the distribution defines a measure. In this case, we can prove a functional central limit theorem for the estimator when the underlying space is Euclidean. This is joint work with Gail Ivanoff (University of Ottawa). |
| Speaker: |
Alberto Carabarin is a Postdoctoral Fellow at McGill University. He works with Christian Genest and Johanna Nešlehová. He holds a PhD from the University of Ottawa.
|
|
| December 9, 2011 |
CRM-ISM-GERAD Colloque de statistique
|
Giles Hooker
|
Detecting evolution in experimental ecology: Diagnostics for missing state variables
|
15:30-16:30 |
UQAM
Salle 5115
|
| Abstract: |
This talk considers goodness of fit diagnostics for time-series data from processes approximately modeled by systems of nonlinear ordinary differential equations. In particular, we seek to determine three nested causes of lack of fit: (i) unmodeled stochastic forcing, (ii) mis-specified functional forms and (iii) mis-specified state variables. Testing lack of fit in differential equations is challenging since the model is expressed in terms of rates of change of the measured variables. Here, lack of fit is represented on the model scale via time-varying parameters. We develop tests for each of the three cases above through bootstrap and permutation methods.
A motivating example is presented from laboratory-based ecology in which algae are grown on nitrogen-rich medium and rotifers are introduced as a predator. The resulting data exhibit dynamics that do not correspond to those generated by classical ecological models. A hypothesized explanation is that more than one algal species is present in the chemostat. We assess the statistical evidence for this claim and show that while models incorporating multiple algal species provide better agreement with the data, their existence cannot be demonstrated without strong model assumptions. We conclude with an examination of the use of control theory to design inputs into dynamic systems to improve parameter estimation and power to detect missing components.
|
| Speaker: |
Giles Hooker is an Assistant Professor in the Department of Statistical Science and the Department of Biological Statistics and Computational Biology at Cornell University. His main research interests include functional data analysis, machine learning and data analysis for dynamical systems.
|