EE 525X: Spring 2017
Data Analytics in Electrical and Computer Engineering
Course Summary
Introduction to a variety of data analysis techniques -- particularly those relevant to electrical and computer engineers. Topics include techniques for classification, visualization, and parameter estimation, with applications to signals, images, matrices, and graphs.
This is a graduate-level course focusing on data analysis foundations. Particular emphasis will be placed on the principles of state-of-the-art data processing techniques; on methods to analyze correctness and efficiency of such techniques; and on the evaluation of these techniques on real-world test datasets.
Course Information
Lectures: MW 4:10pm-5:30pm, Pearson 3131.
Grading: Six (6) homework problem sets (60%); final course project (40%).
Problem sets will be posted roughly once every fortnight on Blackboard. Problem sets will involve a mix of theory and practical implementation.
The final course project can be carried out either individually or in teams of two (2). The project will involve either conducting research on a specific topic, or implementing and evaluating modern data analysis techniques on a real-world dataset.
Pre-reqs: Knowledge of undergraduate probability (EE 322 or equivalent); familiarity with linear algebra and optimization.
- Review of mathematical basics.
- Classification: Nearest neighbors, kernel methods, SVMs.
- Matrix analysis: Singular value decomposition, matrix factorization.
- Graph analysis: Connectivity, trees, random walks.
- Data visualization: Clustering, nonlinear dimensionality reduction.
- Streaming algorithms: Distinct elements, frequency moments, heavy hitters.
- Sampling: Sparse recovery, compressive sensing, matrix completion.
Book
A. Blum, J. Hopcroft, and R. Kannan, Foundations of Data Science.
Notes
Here is a single monograph containing all lecture notes from the Spring '17 edition. Notes were made using (pandoc-flavored) Markdown, which is a real lifesaver if one wants to quickly typeset math-heavy lecture resources. (Thanks, Boaz Barak, for the idea.)
Lecture Schedule (tentative)
- Introduction, course overview, logistics.
- Basics of probability.
- Modeling data in high dimensions.
- Introduction to supervised learning.
- Nearest neighbors.
- The perceptron algorithm.
- Support vector machines, the kernel trick.
- Multi-layer networks.
- Linear regression.
- Singular value decomposition (SVD).
- Applications of the SVD.
- Principal components analysis.
- Non-negative matrix factorization.
- Introduction to graphs.
- Random walks on graphs.
- Graphs, electrical networks, and resistances.
- Applications of random walks.
- Spectral clustering.
- k-means, hierarchical clustering.
- Graphical models.
- Dimensionality reduction.
- Introduction to streaming algorithms.
- Frequent elements, heavy hitters.
- Compressive sensing and matrix completion.