EE 525X: Spring 2018

Data Analytics in Electrical and Computer Engineering


Course Summary

Introduction to a variety of data analysis techniques, particularly those relevant to electrical and computer engineers. Topics include techniques for classification, visualization, and parameter estimation, with applications to signals, images, matrices, and graphs.

This is a graduate-level course focusing on the foundations of data analysis. Particular emphasis will be placed on the principles behind state-of-the-art data processing techniques; on methods for analyzing the correctness and efficiency of such techniques; and on evaluating these techniques on real-world test datasets.

Course Information

Lectures: MW 4:10pm-5:30pm, Science 0277.

Grading: Six (6) homework problem sets (60%); final course project (40%).

Problem sets will be posted on Blackboard roughly every two weeks and will involve a mix of theory and practical implementation.

The final course project can be carried out either individually or in teams of two (2). The project will involve either conducting research on a specific topic, or implementing and evaluating modern data analysis techniques on a real-world dataset.

Pre-reqs: Knowledge of undergraduate probability (EE 322 or equivalent); familiarity with linear algebra and optimization.

Syllabus

  • Review of mathematical basics.
  • Classification: Nearest neighbors, kernel methods, SVMs.
  • Matrix analysis: Singular value decomposition, matrix factorization.
  • Graph analysis: Connectivity, trees, random walks.
  • Data visualization: Clustering, nonlinear dimensionality reduction.
  • Streaming algorithms: Distinct elements, frequency moments, heavy hitters.
  • Sampling: Sparse recovery, compressive sensing, matrix completion.

Book

A. Blum, J. Hopcroft, and R. Kannan, Foundations of Data Science.

Notes

Here is a single monograph containing all lecture notes from the Spring '17 edition. Notes were made using (pandoc-flavored) Markdown, which is a real lifesaver if one wants to quickly typeset math-heavy lecture resources. (Thanks, Boaz Barak, for the idea.)

Lecture Schedule (tentative)

  • Introduction, course overview, logistics.
  • Basics of probability.
  • Modeling data in high dimensions.
  • Introduction to supervised learning.
  • Nearest neighbors.
  • The perceptron algorithm.
  • Support vector machines, the kernel trick.
  • Multi-layer networks.
  • Linear regression.
  • Singular value decomposition (SVD).
  • Applications of the SVD.
  • Principal component analysis.
  • Non-negative matrix factorization.
  • Introduction to graphs.
  • Random walks on graphs.
  • Graphs, electrical networks, and resistances.
  • Applications of random walks.
  • Spectral clustering.
  • k-means, hierarchical clustering.
  • Graphical models.
  • Dimensionality reduction.
  • Introduction to streaming algorithms.
  • Frequent elements, heavy hitters.
  • Compressive sensing and matrix completion.