Instructor: Yi Yang, Burnside Hall 1241
Lectures:
Monday, 10:05 AM – 11:25 AM, Burnside Hall 1205
Wednesday, 3:05 PM – 4:25 PM, Burnside Hall 1234
Office hours:
Wednesday, 4:00 PM – 6:00 PM, Burnside Hall 1241
SB: Convex Optimization: Algorithms and Complexity, by Sébastien Bubeck.
PP: Non-convex Optimization for Machine Learning, by Prateek Jain and Purushottam Kar.
SLS: Statistical Learning with Sparsity, by Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Chapters 7 and 10.
Topics:
Applications: sparse recovery, low-rank matrix recovery (see the setup sketch after the reading list)
Applications: robust linear regression, phase retrieval
Reading:
SB Chapters 1 and 2
PP Chapter 1
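To make the sparse recovery application concrete, below is a minimal problem-setup sketch in Python/NumPy. The dimensions, sparsity level, and noise scale are hypothetical, chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, s = 100, 500, 10                      # samples, ambient dimension, sparsity (hypothetical)
    A = rng.normal(size=(n, p)) / np.sqrt(n)    # random Gaussian design
    x_star = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    x_star[support] = rng.normal(size=s)        # s-sparse ground truth
    y = A @ x_star + 0.01 * rng.normal(size=n)  # noisy linear measurements
    # Sparse recovery: estimate x_star from (A, y) knowing it has at most s nonzeros.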
Topics:
Convex analysis
Convex projections
Projected gradient descent (see the sketch after the reading list)
Convergence guarantees for PGD
Reading:
PP Chapter 2
SB Chapter 3
L1 Projection: Duchi, John, et al. Efficient projections onto the L1-ball for learning in high dimensions, ICML 2008.
Multitask L1 Penalization: Fontaine, S., et al. A unified approach to sparse Tweedie modeling of multi-source insurance claim data, Technometrics (in revision). Supplemental Materials.
A more detailed proof of the convergence of projected gradient descent.
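Below is a minimal sketch of projected gradient descent with the sorting-based L1-ball projection from the Duchi et al. reading. The step size and iteration count are illustrative assumptions, not tuned values.

    import numpy as np

    def project_l1(v, radius=1.0):
        # Euclidean projection onto {x : ||x||_1 <= radius} (Duchi et al., ICML 2008).
        if np.abs(v).sum() <= radius:
            return v.copy()
        u = np.sort(np.abs(v))[::-1]            # magnitudes, sorted in decreasing order
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
        theta = (css[rho] - radius) / (rho + 1.0)
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

    def pgd(grad, project, x0, step=0.1, iters=200):
        # Projected gradient descent: a gradient step followed by projection
        # back onto the constraint set.
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = project(x - step * grad(x))
        return x

For the least-squares objective f(x) = ||Ax - y||^2 / 2, pass grad = lambda x: A.T @ (A @ x - y) and project = project_l1.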
Topics:
Non-convex projections (see the sketch after the reading list)
Restricted strong convexity and smoothness
Generalized projected gradient descent
Reading:
PP Chapter 3
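Below is a minimal sketch of the canonical non-convex projection: the Euclidean projection onto the set of s-sparse vectors, which has a closed form (keep the s largest-magnitude entries) even though the set is non-convex. Plugging it into the PGD loop above gives a generalized projected gradient method.

    import numpy as np

    def project_sparse(v, s):
        # Euclidean projection onto {x : ||x||_0 <= s}: zero out all but the
        # s largest-magnitude entries (hard thresholding).
        v = np.asarray(v, dtype=float)
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -s)[-s:]
        out[idx] = v[idx]
        return out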
Topics:
Problem formulation
Sparse regression: two perspectives
Sparse recovery via projected gradient descent
Restricted isometry and other design properties
Ensuring RIP and other properties
A sparse recovery guarantee for IHT (see the sketch after the reading list)
Reading:
PP Chapter 7
Nullspace Property: Cohen, Albert, et al. Compressed sensing and best k-term approximation, JAMS 2009.
Restricted Eigenvalue Property: Raskutti, Garvesh, et al. Restricted eigenvalue properties for correlated Gaussian designs, JMLR 2010.
Restricted Isometry Property: Emmanuel Candes and Terence Tao. Decoding by linear programming, IEEE Transactions on Information Theory 2005.
Restricted Strong Convexity/Smoothness Property: Jain et al. On iterative hard thresholding methods for high-dimensional M-estimation, NIPS 2014.
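Below is a minimal sketch of iterative hard thresholding (IHT) for the sparse linear model y = Ax + noise. The step size (1 over the squared spectral norm of A) and iteration count are simple assumptions rather than the choices analyzed in PP Chapter 7.

    import numpy as np

    def hard_threshold(v, s):
        # Keep the s largest-magnitude entries, zero out the rest.
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -s)[-s:]
        out[idx] = v[idx]
        return out

    def iht(A, y, s, iters=300):
        # Gradient step on ||Ax - y||^2 / 2, then projection onto the s-sparse set.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            x = hard_threshold(x + step * A.T @ (y - A @ x), s)
        return x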
Topics:
Signals and Sparse Representations
Orthogonal Bases
Approximation in Orthogonal Bases
Reconstruction in Overcomplete Bases
Random Projection and Approximation
Johnson-Lindenstrauss Approximation (see the sketch after the reading list)
Compressed Sensing
Reading:
SLS Chapter 10
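Below is a minimal sketch of random projection in the Johnson-Lindenstrauss style: a scaled Gaussian matrix approximately preserves pairwise Euclidean distances. The dimensions are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 50, 10000, 500                    # points, ambient dim, projected dim (hypothetical)
    X = rng.normal(size=(n, d))
    P = rng.normal(size=(d, k)) / np.sqrt(k)    # scaled so that E ||x P||^2 = ||x||^2
    Y = X @ P
    # Pairwise distances are approximately preserved after projection:
    print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))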
Topics:
Marginal convexity and other properties
Generalized alternating minimization (see the sketch after the reading list)
A convergence guarantee for gAM for convex problems
A convergence guarantee for gAM under MSC/MSS
Reading:
Coordinate descent (coxnet): Simon, N., Friedman, J., Hastie, T., Tibshirani, R. Regularization paths for Cox's proportional hazards model via coordinate descent. JSS 2011.
Coordinate descent (glmnet): Friedman, J., Hastie, T., & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. JSS 2010.
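Below is a minimal sketch of alternating minimization on a canonical instance, the low-rank factorization objective ||M - U V^T||_F^2 with small ridge penalties; the objective is marginally convex in each block, so each update is a closed-form least-squares solve. The rank, regularization, and iteration count are illustrative assumptions.

    import numpy as np

    def alt_min(M, r, iters=50, lam=1e-3):
        # Alternate exact minimization over the two blocks U and V.
        m, n = M.shape
        rng = np.random.default_rng(0)
        U = rng.normal(size=(m, r))
        V = rng.normal(size=(n, r))
        I = lam * np.eye(r)
        for _ in range(iters):
            V = np.linalg.solve(U.T @ U + I, U.T @ M).T    # best V for fixed U
            U = np.linalg.solve(V.T @ V + I, V.T @ M.T).T  # best U for fixed V
        return U, V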
Topics:
The EM algorithm (see the sketch after the reading list)
Implementing the E/M steps
A monotonicity guarantee for EM
Local strong concavity and local strong smoothness
A local convergence guarantee for EM
Reading:
EM Convergence: Balakrishnan, S., Wainwright, M. J., Yu, B. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 2017.
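Below is a minimal sketch of the E and M steps for a two-component, unit-variance Gaussian mixture in one dimension; the crude initialization is an assumption for illustration only.

    import numpy as np

    def em_gmm_1d(x, iters=100):
        mu = np.array([x.min(), x.max()], dtype=float)  # crude initialization
        pi = np.array([0.5, 0.5])
        for _ in range(iters):
            # E-step: posterior responsibilities under the current parameters.
            dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
            resp = dens / dens.sum(axis=1, keepdims=True)
            # M-step: re-estimate mixing weights and means in closed form.
            nk = resp.sum(axis=0)
            pi = nk / len(x)
            mu = (resp * x[:, None]).sum(axis=0) / nk
        return pi, mu

Each iteration cannot decrease the observed-data log-likelihood, which is the monotonicity guarantee listed above.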
Topics:
Motivating applications
Saddles and why they proliferate
The strict saddle property
The noisy gradient descent algorithm (see the sketch after the reading list)
A local convergence guarantee for NGD
Reading:
Tensor Decompositions: Anandkumar, A., Ge, R., Hsu, D., Kakade, S. M., and Telgarsky, M. Tensor decompositions for learning latent variable models. JMLR 2014.
Stochastic Gradient: Ge, R., Huang, F., Jin, C., and Yuan, Y. Escaping from saddle points: online stochastic gradient for tensor decomposition. COLT 2015.
Online Alternating Minimization: Choromanska, A., Cowen, B., Kumaravel, S., Luss, R., Rigotti, M., Rish, I. and Bouneffouf, D. Beyond backprop: online alternating minimization with auxiliary variables.
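Below is a minimal sketch of noisy gradient descent: isotropic noise injected into every gradient step lets the iterates escape strict saddle points where plain gradient descent can stall (the analyzed variants, e.g. in the Ge et al. reading, add noise more selectively). The step size and noise scale are illustrative assumptions.

    import numpy as np

    def ngd(grad, x0, step=0.01, noise=0.1, iters=2000, seed=0):
        # Gradient descent with isotropic Gaussian noise added at every step.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = x - step * (grad(x) + noise * rng.normal(size=x.shape))
        return x

    # f(x) = x1^2 - x2^2 + x2^4 has a strict saddle at the origin and minima
    # at (0, +/- 1/sqrt(2)); started exactly at the saddle, the noise pushes
    # the iterates off it.
    grad = lambda w: np.array([2 * w[0], -2 * w[1] + 4 * w[1] ** 3])
    print(ngd(grad, [0.0, 0.0]))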