September 21, 2012
CRM-ISM-GERAD Colloque de statistique
Fang Yao
Regularized semiparametric functional linear regression
14:30-15:30
McGill, Burnside Hall 1214

Abstract:
In many scientific experiments, one must analyze functional data, where the observations are sampled from a random process, together with a potentially large number of non-functional covariates. The complex nature of functional data makes it difficult to apply existing methods of model selection and estimation directly. We propose and study a new class of penalized semiparametric functional linear regression models to characterize the regression relation between a scalar response and multiple covariates, including both functional and scalar covariates. The resulting method provides a unified and flexible framework to jointly model functional and non-functional predictors, identify important covariates, and improve the efficiency and interpretability of the estimates. Featuring two types of regularization, shrinkage on the effects of the scalar covariates and truncation on the principal components of the functional predictor, the new approach is flexible and effective for dimension reduction. A key contribution of this work is the study of the theoretical properties of the regularized semiparametric functional linear model. We establish oracle and consistency properties under mild conditions, allowing the number of scalar covariates to diverge while simultaneously accounting for the infinite-dimensional functional predictor. We illustrate the new estimator with extensive simulation studies and then apply it to an image data analysis.
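
The abstract does not display the model, but a generic semiparametric functional linear model with the two types of regularization described above might be sketched as follows (notation ours, not taken from the paper):

% Scalar response Y, functional predictor X(t), scalar covariates Z_1, ..., Z_p:
\[
  Y = \int_{\mathcal{T}} X(t)\,\beta(t)\,dt + \sum_{j=1}^{p} Z_j \gamma_j + \varepsilon .
\]
% Truncation: expand X and beta on the first s functional principal
% components, so the integral term reduces to s scores xi_{ik};
% shrinkage: penalize the scalar effects with a sparsity-inducing
% penalty p_lambda (e.g., lasso or SCAD):
\[
  \min_{\eta,\,\gamma}\; \sum_{i=1}^{n} \Bigl( Y_i - \sum_{k=1}^{s} \xi_{ik}\,\eta_k - \mathbf{Z}_i^{\top}\gamma \Bigr)^{2} + \sum_{j=1}^{p} p_{\lambda}(|\gamma_j|).
\]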

Speaker:
Fang Yao (http://www.utstat.utoronto.ca/fyao/) is an Associate Professor in the Department of Statistics at the University of Toronto. His research interests include functional and longitudinal data analysis, nonparametric regression and smoothing methods, and the statistical modeling of high-dimensional and complex data, with applications involving functional objects (evolutionary biology, human genetics, finance and e-commerce, chemical engineering).

September 28, 2012
McGill Statistics Seminar
Erica Moodie
The current state of Q-learning for personalized medicine
14:30-15:30
BURN 1205

Abstract:
In this talk, I will provide an introduction to dynamic treatment regimes (DTRs) and an overview of the state of the art (and science) of Q-learning, a popular tool in reinforcement learning. The use of Q-learning and its variants in randomized and non-randomized studies will be discussed, as well as issues concerning inference, since the resulting estimators are not always regular. Current and future directions of interest will also be considered.
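
The abstract includes no code, but a minimal sketch of two-stage Q-learning for a DTR, using ordinary least squares for the stage-specific Q-functions (the simulated data and all variable names are ours, purely for illustration), might look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000

# Simulated two-stage trial: covariate X_t, binary treatment A_t in {0, 1}.
X1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
X2 = X1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
Y = X1 * A1 + 0.5 * X2 * A2 + rng.normal(size=n)  # final outcome

def features(X, A):
    # Q-function model: main effect of X plus treatment and interaction.
    return np.column_stack([X, A, X * A])

# Stage 2: regress Y on stage-2 history; the pseudo-outcome for stage 1
# is the predicted outcome under the best stage-2 treatment.
q2 = LinearRegression().fit(features(X2, A2), Y)
V2 = np.maximum(q2.predict(features(X2, np.zeros(n))),
                q2.predict(features(X2, np.ones(n))))

# Stage 1: regress the pseudo-outcome on stage-1 history.
q1 = LinearRegression().fit(features(X1, A1), V2)

# Estimated optimal rule at each stage: treat whenever treatment gives
# the higher predicted Q-value.
d2 = (q2.predict(features(X2, np.ones(n))) >
      q2.predict(features(X2, np.zeros(n)))).astype(int)
d1 = (q1.predict(features(X1, np.ones(n))) >
      q1.predict(features(X1, np.zeros(n)))).astype(int)
print("share treated under estimated rule:", d1.mean(), d2.mean())

The non-regularity mentioned above enters through the max in the pseudo-outcome V2, which is non-smooth in the stage-2 coefficients.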

Speaker:
Erica Moodie is an Associate Professor in the Department of Epidemiology, Biostatistics and Occupational Health at McGill.

October 5, 2012
McGill Statistics Seminar
Jacob Stöber
Markov switching regular vine copulas
14:30-15:30
BURN 1205

Abstract:
Using only bivariate copulas as building blocks, regular vines (R-vines) constitute a flexible class of high-dimensional dependence models. In this talk, we introduce a Markov switching R-vine copula model, combining the flexibility of general R-vine copulas with the possibility for the dependence structure to change over time. Frequentist as well as Bayesian parameter estimation is discussed. Further, we apply the newly proposed model to examine the dependence of exchange rates as well as of stock and stock index returns. We show that changes in dependence are usually closely interrelated with periods of market stress. In such times, the Value at Risk of an asset portfolio is significantly underestimated when changes in the dependence structure are ignored.
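
The abstract does not display the model; a generic Markov switching copula (our notation, simplified to convey the idea) lets a hidden state S_t, evolving as a first-order Markov chain, select which copula governs the uniform-transformed returns at time t:

% Hidden state S_t in {1, ..., K} with transition matrix P; given
% S_t = k, the returns (transformed to uniform margins) follow the
% R-vine copula density c^{(k)}:
\[
  \Pr(S_t = k \mid S_{t-1} = j) = P_{jk}, \qquad
  (U_{t,1}, \ldots, U_{t,d}) \mid S_t = k \;\sim\; c^{(k)}(u_1, \ldots, u_d),
\]
% where each c^{(k)} factorizes into bivariate pair-copula densities
% along the trees of a regular vine, as in the usual R-vine construction.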

Speaker:
Jacob Stöber is a PhD candidate at the Technische Universität München. He is currently visiting Duke University.

October 12, 2012
McGill Statistics Seminar
Elena Rivera Mancia
Modeling operational risk using a Bayesian approach to EVT
14:30-15:30
BURN 1205

Abstract:
Extreme value theory has been widely used for assessing risk for highly unusual events, either by using block maxima or the peaks-over-threshold (POT) method. However, one of the main drawbacks of the POT method is the choice of a threshold, which plays an important role in the estimation since the parameter estimates strongly depend on this value. Bayesian inference is an alternative way to handle these difficulties; the threshold can be treated as another parameter in the estimation, avoiding the classical empirical approach. In addition, it is possible to incorporate internal and external observations in combination with expert opinion, providing a natural, probabilistic framework in which to evaluate risk models. In this talk, we analyze operational risk data using a mixture model which combines a parametric form for the center and a generalized Pareto distribution (GPD) for the tail, using all observations for inference about the unknown parameters of both distributions, the threshold included. A Bayesian analysis is performed and inference is carried out through Markov chain Monte Carlo (MCMC) methods in order to determine the minimum capital requirement for operational risk.
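
The abstract does not write out the mixture; a common body-plus-GPD splice (our notation, one of several possible parameterizations) takes the form:

% Spliced density (notation ours): parametric body f_B below the
% threshold u, generalized Pareto tail above it, weighted by the
% body's remaining mass above u.
\[
  f(x \mid \theta, u, \xi, \sigma) =
  \begin{cases}
    f_B(x \mid \theta), & x \le u, \\[4pt]
    \bigl(1 - F_B(u \mid \theta)\bigr)\,
      \dfrac{1}{\sigma}\Bigl(1 + \xi\,\dfrac{x - u}{\sigma}\Bigr)^{-1/\xi - 1},
      & x > u.
  \end{cases}
\]
% In the Bayesian treatment described above, u is given a prior and is
% sampled along with (theta, xi, sigma) by MCMC.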

Speaker:
Elena Rivera Mancia is a PhD candidate in our department. Her main supervisor is David A. Stephens, and her co-supervisor is Johanna Nešlehová.

October 19, 2012
CRM-ISM-GERAD Colloque de statistique
David Madigan
Observational studies in healthcare: are they any good?
14:30-15:30
Université de Montréal

Abstract:
Observational healthcare data, such as administrative claims and electronic health records, play an increasingly prominent role in healthcare. Pharmacoepidemiologic studies in particular routinely estimate temporal associations between medical product exposure and subsequent health outcomes of interest, and such studies influence prescribing patterns and healthcare policy more generally. Some authors have questioned the reliability and accuracy of such studies, but few previous efforts have attempted to measure their performance.
The Observational Medical Outcomes Partnership (OMOP, http://omop.fnih.org) has conducted a series of experiments to empirically measure the performance of various observational study designs with regard to predictive accuracy for discriminating between true drug effects and negative controls. In this talk, I describe the past work of the Partnership, explore opportunities to expand the use of observational data to further our understanding of medical products, and highlight areas for future research and development.
(on behalf of the OMOP investigators)

Speaker:
David Madigan (http://www.stat.columbia.edu/~madigan/) is Professor and Chair, Department of Statistics, Columbia University, New York. An ASA (1999) and IMS (2006) Fellow, he is a recognized authority in data mining; he has just been appointed Editor of the ASA journal "Statistical Analysis and Data Mining". He recently served as Editor-in-Chief of "Statistical Science".

October 26, 2012
McGill Statistics Seminar
Derek Bingham
Simulation model calibration and prediction using outputs from multi-fidelity simulators
14:30-15:30
BURN 1205

Abstract:
Computer simulators are widely used to describe physical processes in lieu of physical observations. In some cases, more than one computer code can be used to explore the same physical system, each with a different degree of fidelity. In this work, we combine field observations and model runs from deterministic multi-fidelity computer simulators to build a predictive model for the real process. The resulting model can be used to perform sensitivity analysis for the system and to make predictions with associated measures of uncertainty. Our approach is Bayesian and will be illustrated through a simple example, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan.
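
The abstract does not give the model; a common two-fidelity formulation in this literature, in the spirit of the Kennedy and O'Hagan calibration framework (notation ours, not necessarily the speaker's), is:

% eta_1 is the low-fidelity simulator, eta_2 the high-fidelity one,
% theta the true but unknown calibration parameter, delta a model
% discrepancy, and epsilon observation error; the unknown functions
% are given Gaussian process priors.
\[
  \eta_2(x, t) = \rho\,\eta_1(x, t) + \delta_{12}(x, t), \qquad
  y(x) = \eta_2(x, \theta) + \delta(x) + \varepsilon,
  \quad \varepsilon \sim \mathcal{N}(0, \sigma^2).
\]
% Posterior inference on (theta, rho, delta, ...) then yields calibrated
% predictions of the field process with uncertainty.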

Speaker:
Derek Bingham is an Associate Professor in the Department of Statistics and Actuarial Science at Simon Fraser University. He holds a Canada Research Chair in Industrial Statistics.

November 2, 2012
McGill Statistics Seminar
Anne-Laure Fougères
Multivariate extremal dependence: Estimation with bias correction
14:30-15:30
BURN 1205

Abstract:
Estimating extreme risks in a multivariate framework is closely connected with the estimation of the extremal dependence structure. This structure can be described via the stable tail dependence function L, for which several estimators have been introduced. Asymptotic normality is available for empirical estimates of L, with rate of convergence √k, where k denotes the number of upper order statistics used in the estimation. Choosing a larger k improves the accuracy of the estimation but may also increase the asymptotic bias. We provide a bias correction procedure for the estimation of L: estimators of L are combined in such a way that the asymptotic bias term disappears. The new estimator of L is shown to allow more flexibility in the choice of k. Its asymptotic behavior is examined, and a simulation study is provided to assess its small-sample behavior. This is joint work with Cécile Mercadier (Université Lyon 1) and Laurens de Haan (Erasmus University Rotterdam).
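
For reference (standard definitions, not spelled out in the abstract): for a bivariate vector (X, Y) with marginal distributions F_1 and F_2, the stable tail dependence function is

\[
  L(x, y) = \lim_{t \to 0} \frac{1}{t}\,
  \Pr\bigl( 1 - F_1(X) \le t x \;\text{ or }\; 1 - F_2(Y) \le t y \bigr),
\]
% and the empirical estimator replaces the marginals by ranks, using
% only the k most extreme observations; one standard variant
% (conventions differ slightly across papers) is
\[
  \widehat{L}(x, y) = \frac{1}{k} \sum_{i=1}^{n}
  \mathbf{1}\bigl\{ R^X_i > n + 1/2 - kx \;\text{ or }\; R^Y_i > n + 1/2 - ky \bigr\},
\]
% where R^X_i and R^Y_i denote the ranks of X_i and Y_i in their samples.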

Speaker:
Anne-Laure Fougères is Professor of Statistics at Université Claude-Bernard, in Lyon, France.

November 9, 2012
McGill Statistics Seminar
Sidney Resnick
The multidimensional edge: Seeking hidden risks
14:30-15:30
BURN 1205

Abstract:
Assessing tail risks using the asymptotic models provided by multivariate extreme value theory carries the danger that when asymptotic independence is present (as with the Gaussian copula model), the asymptotic model yields estimates of the probabilities of joint tail regions that are zero. In diverse applications such as finance, telecommunications, insurance and environmental science, it may be difficult to believe in the absence of risk contagion. This problem can be partly ameliorated by using hidden regular variation, which assumes a lower-order asymptotic behavior on a subcone of the state space. This theory can be made more flexible by extensions in the following directions: (i) higher dimensions than two; (ii) the case where the lower-order variation on a subcone is of an extreme value type different from regular variation; and (iii) an extension of the concept to searching for lower-order behavior on the complement of the support of the limit measure of regular variation. We discuss some challenges and potential applications of this ongoing effort.
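
For reference (standard definitions, not in the abstract): a random vector X is multivariate regularly varying if there is a scaling function b(t) → ∞ and a limit measure ν such that

\[
  t\,\Pr\bigl( X / b(t) \in \cdot \bigr) \;\xrightarrow{v}\; \nu(\cdot)
  \quad \text{on } E = [0, \infty]^d \setminus \{0\}.
\]
% Hidden regular variation posits a second, lower-order regular
% variation on the smaller cone E_0 = (0, infinity]^d, with a scaling
% function b_0(t) satisfying b(t)/b_0(t) -> infinity:
\[
  t\,\Pr\bigl( X / b_0(t) \in \cdot \bigr) \;\xrightarrow{v}\; \nu_0(\cdot)
  \quad \text{on } E_0 = (0, \infty]^d .
\]
% When nu concentrates on the axes (asymptotic independence), nu_0
% supplies the non-trivial joint-tail estimates that nu sets to zero.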

Speaker:
Sidney Resnick is the Lee Teng Hui Professor in Engineering at the School of Operations Research and Information Engineering, Cornell University. He is the author of several well-known textbooks in probability and extreme-value theory.

November 16, 2012
McGill Statistics Seminar
Taoufik Bouezmarni
Copula-based regression estimation and inference
14:30-15:30
BURN 1205

Abstract:
In this paper, we investigate a new approach to estimating a regression function based on copulas. The main idea is to write the regression function in terms of a copula and marginal distributions. Once the copula and the marginal distributions are estimated, we use the plug-in method to construct the new estimator. Because various methods are available in the literature for estimating both a copula and a distribution, this idea provides a rich and flexible alternative to many existing regression estimators. We provide some asymptotic results related to this copula-based regression modeling when the copula is estimated via profile likelihood and the marginals are estimated nonparametrically. We also study the finite-sample performance of the estimator and illustrate its usefulness by analyzing data from air pollution studies.
Joint work with H. Noh and A. El Ghouch from Université catholique de Louvain.
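
The key identity behind this approach (standard in this literature; notation ours) writes the conditional density through the copula density c of (X, Y):

% If (X, Y) has copula density c and marginals F_X, F_Y with density
% f_Y, then the conditional density and the regression function are
\[
  f_{Y \mid X}(y \mid x) = c\bigl(F_X(x), F_Y(y)\bigr)\, f_Y(y),
  \qquad
  m(x) = \mathbb{E}[Y \mid X = x]
       = \int y\, c\bigl(F_X(x), F_Y(y)\bigr)\, f_Y(y)\, dy.
\]
% Plug-in estimation replaces c, F_X and F_Y by estimates; with the
% empirical marginals this suggests the estimator
\[
  \widehat{m}(x) = \frac{1}{n} \sum_{i=1}^{n}
      Y_i\, \widehat{c}\bigl(\widehat{F}_X(x), \widehat{F}_Y(Y_i)\bigr).
\]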

Speaker:
Taoufik Bouezmarni is an Assistant Professor of Statistics at the Université de Sherbrooke.

November 23, 2012
CRM-ISM-GERAD Colloque de statistique
Peter Mueller
A nonparametric Bayesian model for local clustering
14:30-15:30
McGill, Burnside Hall 107

Abstract:
We propose a nonparametric Bayesian local clustering (NoB-LoC) approach for heterogeneous data. Using genomics data as an example, NoB-LoC clusters genes into gene sets and simultaneously creates multiple partitions of samples, one for each gene set. In other words, the sample partitions are nested within the gene sets. Inference is guided by a joint probability model on all random elements. Biologically, the model formalizes the notion that biological samples cluster differently with respect to different genetic processes, and that each process is related to only a small subset of genes. These local features differ in important ways from global clustering approaches such as hierarchical clustering, which create a single partition of samples that applies to all genes in the data set. Furthermore, NoB-LoC includes a special cluster of genes that do not give rise to any meaningful partition of samples. These genes could be irrelevant to the disease conditions under investigation. Similarly, for a given gene set, NoB-LoC includes a subset of samples that do not co-cluster with the other samples. The samples in this special cluster could, for example, be those whose disease subtype is not characterized by the particular gene set.
This is joint work with Juhee Lee and Yuan Ji.

Speaker:
Peter Mueller (http://www.math.utexas.edu/users/pmueller/) is a Professor in the Department of Mathematics at the University of Texas at Austin. His research interests include the theory and applications of Bayesian nonparametric inference, with applications in genomics, medicine and the health sciences.

November 30, 2012
McGill Statistics Seminar
Anne-Sophie Charest
Sharing confidential datasets using differential privacy
14:30-15:30
BURN 1205

Abstract:
While statistical agencies would like to share their data with researchers, they must also protect the confidentiality of the data provided by their respondents. To satisfy these two conflicting objectives, agencies use various techniques to restrict and modify the data before publication. Most of these techniques, however, share a common flaw: their confidentiality protection cannot be rigorously measured. In this talk, I will present the criterion of differential privacy, a rigorous measure of the protection offered by such methods. Designed to guarantee confidentiality even in a worst-case scenario, differential privacy protects the information of any individual in the database against an adversary with complete knowledge of the rest of the dataset. I will first give a brief overview of recent and current research on the topic of differential privacy. I will then focus on the publication of differentially private synthetic contingency tables and present some of my results on methods for the generation and proper analysis of such datasets.
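
For reference, the formal criterion (the standard definition, not spelled out in the abstract): a randomized mechanism M is ε-differentially private if, for all pairs of datasets D and D' differing in a single record and all measurable sets S of outputs,

\[
  \Pr\bigl( M(D) \in S \bigr) \;\le\; e^{\varepsilon}\, \Pr\bigl( M(D') \in S \bigr).
\]
% Smaller epsilon means stronger protection: the output distribution is
% nearly unchanged by any one individual's data.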

Speaker:
Anne-Sophie Charest is a newly hired Assistant Professor of Statistics at Université Laval, Québec. A McGill graduate, she recently completed her PhD at Carnegie Mellon University, Pittsburgh.

December 7, 2012
McGill Statistics Seminar
Pierre Lafaye de Micheaux
Sample size and power determination for multiple comparison procedures aiming at rejecting at least r among m false hypotheses
14:30-15:30
BURN 1205

Abstract:
Multiple testing problems arise in a variety of situations, notably in clinical trials with multiple endpoints. In such cases, it is often of interest to reject either all hypotheses or at least one of them. More generally, the question arises as to whether one can reject at least r out of m hypotheses. Statistical tools addressing this issue are rare in the literature. In this talk, I will recall well-known hypothesis testing concepts, both in a single- and in a multiple-hypothesis context. I will then present general power formulas for three important multiple comparison procedures: the Bonferroni and Hochberg procedures, as well as Holm's sequential procedure. Next, I will describe an R package that we developed for sample size calculations in multiple-endpoint trials where it is desired to reject at least r out of m hypotheses. This package covers the case where all the variables are continuous, for four common variance-covariance patterns. I will show how to use this package to compute the sample size needed in a real-life application.
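
As a toy illustration of the quantity being computed (this is our own Monte Carlo sketch, not the package's algorithm), one can estimate the power to reject at least r of m one-sided hypotheses under a Bonferroni correction, for equicorrelated normal endpoints:

import numpy as np
from scipy import stats

def power_at_least_r(n, delta, rho, r, alpha=0.05, n_sim=10_000, seed=1):
    """Monte Carlo power to reject at least r of m one-sided hypotheses
    with a Bonferroni correction; the m z-statistics are equicorrelated
    normals with common correlation rho and per-endpoint effects delta
    (known unit variances assumed, for simplicity)."""
    rng = np.random.default_rng(seed)
    m = len(delta)
    sigma = np.full((m, m), rho) + (1 - rho) * np.eye(m)
    means = np.sqrt(n) * np.asarray(delta)   # z-statistic means at size n
    z = rng.multivariate_normal(means, sigma, size=n_sim)
    crit = stats.norm.ppf(1 - alpha / m)     # Bonferroni critical value
    return np.mean((z > crit).sum(axis=1) >= r)

# Example: m = 3 endpoints, modest effects, reject at least r = 2.
print(power_at_least_r(n=100, delta=[0.3, 0.3, 0.3], rho=0.5, r=2))

Sample size determination then amounts to increasing n until this power reaches the desired level; the package's closed-form power formulas avoid the simulation.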

Speaker:
Pierre Lafaye de Micheaux is an Associate Professor of Statistics at the Université de Montréal.

December 14, 2012
CRM-ISM-GERAD Colloque de statistique
Raymond J. Carroll
What percentage of children in the U.S. are eating a healthy diet? A statistical approach
14:30-15:30
Concordia, Room LB 921-04

Abstract:
In the United States, the preferred method of obtaining dietary intake data is the 24-hour dietary recall, yet the measure of most interest is usual or long-term average daily intake, which is impossible to measure directly. Thus, usual dietary intake is assessed with considerable measurement error. Also, diet comprises numerous foods, nutrients and other components, each of which has distinctive attributes. Sometimes, it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture overall dietary patterns and their effect on various diseases. Consumption of these components varies widely: some are consumed daily by almost everyone, while others are consumed episodically, so that 24-hour recall data are zero-inflated. In addition, the components are often correlated with each other. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in the diet, because dietary recommendations often vary with energy level.
We propose the first model appropriate for this type of data, and give the first workable solution to fit such a model. After describing the model, we use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication. The methodology is illustrated through an application to estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index involving ratios of interrelated dietary components to energy, among children aged 2-8 in the United States. We pose a number of interesting questions about the HEI-2005, and show that it is a powerful predictor of the risk of developing colorectal cancer.
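
The model itself is not written out in the abstract; as background, the classical measurement-error starting point for estimating usual intake from repeated 24-hour recalls (notation ours; the model described above is a substantially richer multivariate, zero-inflated, survey-weighted version) is:

% Person i's usual intake T_i is latent; the j-th 24-hour recall W_ij
% is an unbiased but noisy measurement of it:
\[
  W_{ij} = T_i + \epsilon_{ij}, \qquad
  \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2_{\epsilon}), \qquad
  T_i \sim \mathcal{N}(\mu, \sigma^2_T).
\]
% Episodically consumed components make W_ij zero-inflated, which leads
% to two-part (consumption probability plus amount) models with
% correlated person-level random effects across components.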

Speaker:
Raymond J. Carroll is a Professor of Statistics at Texas A&M University.