# Department of Statistics and Data Science

Christopher R. Genovese, Department Head

Rebecca Nugent, Director of Undergraduate Studies

Samantha Nielsen, Lead Academic Advisor

Glenn Clune, Academic Advisor

Email: statadvising@stat.cmu.edu

Location: Baker Hall 132

www.stat.cmu.edu/

## Overview

Uncertainty is inescapable: randomness, measurement error, deception, and incomplete or missing information complicate all our lives. Statistics is the science and art of making predictions and decisions in the face of uncertainty. Statistical issues are central to big questions in public policy, law, medicine, industry, computing, technology, finance, and science. Indeed, the tools of Statistics apply to problems in almost every area of human activity where data are collected.

Statisticians must master diverse skills in computing, mathematics, decision making, forecasting, interpretation of complicated data, and design of meaningful comparisons. Moreover, statisticians must learn to collaborate effectively with people in other fields and, in the process, to understand the substance of these other fields. For all these reasons, Statistics students are highly sought-after in the marketplace.

Recent Statistics majors at Carnegie Mellon have taken jobs at leading companies in many fields, including National Economic Research Associates, Boeing, Morgan Stanley, Deloitte, Rosetta Marketing Group, Nielsen, Procter & Gamble, Accenture, and Goldman Sachs. Other students have taken research positions at the National Security Agency, the U.S. Census Bureau, and the Science and Technology Policy Institute or worked for Teach for America. Many of our students have also gone on to graduate study at some of the top programs in the country, including Carnegie Mellon, the Wharton School at the University of Pennsylvania, Johns Hopkins, University of Michigan, Stanford University, Harvard University, Duke University, Emory University, Yale University, Columbia University, and Georgia Tech.

### The Department and Faculty

The Department of Statistics and Data Science at Carnegie Mellon University is world-renowned for its contributions to statistical theory and practice. Research in the department runs the gamut from pure mathematics to the hottest frontiers of science. Current research projects are helping make fundamental advances in neuroscience, cosmology, public policy, finance, and genetics.

The faculty members are recognized around the world for their expertise and have garnered many prestigious awards and honors. (For example, three members of the faculty have been awarded the COPSS medal, the highest honor given by professional statistical societies.) At the same time, the faculty is firmly dedicated to undergraduate education. The entire faculty, junior and senior, teach courses at all levels. The faculty are accessible and are committed to involving undergraduates in research.

The Department augments all these strengths with a friendly, energetic working environment and exceptional computing resources. Talented graduate students join the department from around the world, and add a unique dimension to the department's intellectual life. Faculty, graduate students, and undergraduates interact regularly.

### How to Take Part

There are many ways to get involved in Statistics at Carnegie Mellon:

- The Bachelor of Science in Statistics in the Dietrich College of Humanities and Social Sciences (DC) is a broad-based, flexible program that helps you master both the theory and practice of Statistics. The program can be tailored to prepare you for later graduate study in Statistics or to complement your interests in almost any field, including Psychology, Physics, Biology, History, Business, Information Systems, and Computer Science.
- The Minor (or Additional Major) in Statistics is a useful complement to a (primary) major in another Department or College. Almost every field of inquiry must grapple with statistical problems, and the tools of statistical theory and data analysis you will develop in the Statistics minor (or Additional Major) will give you a critical edge.
- The Bachelor of Science in Economics and Statistics provides an interdisciplinary course of study aimed at students with a strong interest in the empirical analysis of economic data. Jointly administered by the Department of Statistics and Data Science and the Undergraduate Economics Program, the major's curriculum provides students with a solid foundation in the theories and methods of both fields. (See also Dietrich College Interdepartmental Majors later in this section.)
- The Bachelor of Science in Statistics and Machine Learning is a program housed in the Department of Statistics and Data Science and is jointly administered with the Department of Machine Learning. In this major students take courses focused on skills in computing, mathematics, statistical theory, and the interpretation and display of complex data. The program is geared toward students interested in statistical computation, data science, and "big data" problems.
- The Statistics Concentration and the Operations Research and Statistics Concentration in the Mathematical Sciences Major (see Department of Mathematical Sciences) are administered by the Department of Mathematical Sciences with input from the Department of Statistics and Data Science.
- There are several ongoing exciting research projects in the Department of Statistics and Data Science, and the department enthusiastically seeks to involve undergraduates in this work. Both majors and non-majors are welcome.
- Non-majors are eligible to take most of our courses, and indeed, they are required to do so by many programs on campus. Such courses offer one way to learn more about the Department of Statistics and Data Science and the field in general.

## Curriculum

Statistics consists of two intertwined threads of inquiry: Statistical Theory and Data Analysis. The former uses probability theory to build and analyze mathematical models of data in order to devise methods for making effective predictions and decisions in the face of uncertainty. The latter involves techniques for extracting insights from complicated data, designs for accurate measurement and comparison, and methods for checking the validity of theoretical assumptions. Statistical Theory informs Data Analysis and vice versa. The Department of Statistics and Data Science curriculum follows both of these threads and helps the student develop the complementary skills required.

Throughout the sections of this catalog, we describe the requirements for the Major in Statistics and the different categories within our basic curriculum, followed by the requirements for the Major in Economics and Statistics, the Major in Statistics and Machine Learning, and the Minor in Statistics.

**Note:** We recommend that you use the information provided below as a general guideline, and then schedule a meeting with a Statistics Undergraduate Advisor (statadvising@stat.cmu.edu) to discuss the requirements in more detail, and build a program that is tailored to your strengths and interests.

## B.S. in Statistics

Glenn Clune, Academic Advisor

Peter Freeman, Faculty Advisor

Location: Baker Hall 132

statadvising@stat.cmu.edu

Students in the Bachelor of Science program develop and master a wide array of skills in computing, mathematics, statistical theory, and the interpretation and display of complex data. In addition, Statistics majors gain experience in applying statistical tools to real problems in other fields and learn the nuances of interdisciplinary collaboration. The requirements for the Major in Statistics are detailed below and are organized by categories #1-#7.

### Curriculum

#### 1. Mathematical Foundations (Prerequisites): 29–39 units

Mathematics is the language in which statistical models are described and analyzed, so some experience with basic calculus and linear algebra is an important component for anyone pursuing a program of study in Statistics.

##### Calculus*:

Complete one of the following sequences of mathematics courses at Carnegie Mellon, each of which provides sufficient preparation in calculus:

###### Sequence 1

21-111 | Differential Calculus | 10 |

21-112 | Integral Calculus | 10 |

and one of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

###### Sequence 2

21-120 | Differential and Integral Calculus | 10 |

and one of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

##### Linear Algebra**:

Complete *one* of the following three courses:

21-240 | Matrix Algebra with Applications | 10 |

21-241 | Matrices and Linear Transformations | 10 |

21-242 | Matrix Theory | 10 |

**Notes:**

* It is recommended that students complete the calculus requirement during their freshman year.

** The linear algebra requirement needs to be completed before taking 36-401 Modern Regression.

21-241 and 21-242 are intended only for students with a very strong mathematical background.

#### 2. Data Analysis: 36–45 units

Data analysis is the art and science of extracting insight from data. The art lies in knowing which displays or techniques will reveal the most interesting features of a complicated data set. The science lies in understanding the various techniques and the assumptions on which they rely. Both aspects require practice to master.

The Beginning Data Analysis courses give a hands-on introduction to the art and science of data analysis. The courses cover similar topics but differ slightly in the examples they emphasize. 36-200 draws examples from many fields and satisfies the DC College Core Requirement in Statistical Reasoning. This course is therefore recommended for students in the College. (Note: A score of 4 or 5 on the Advanced Placement (AP) Exam in Statistics may be used to waive this requirement.) Other courses emphasize examples in business (36-207), engineering and architecture (36-220), and the laboratory sciences (36-247).

The Intermediate Data Analysis courses build on the principles and methods covered in the introductory course and explore specific types of data analysis methods in more depth.

The Advanced Data Analysis courses draw on students' previous experience with data analysis and understanding of statistical theory to develop more sophisticated methods. These core courses involve extensive analysis of real data with emphasis on developing the oral and writing skills needed for communicating results.

##### Sequence 1 (For students beginning their freshman or sophomore year)

###### Beginning*

Choose *one* of the following courses:

36-200 | Reasoning with Data | 9 |

36/70-207 | Probability and Statistics for Business Applications | 9 |

36-220 | Engineering Statistics and Quality Control | 9 |

36-247 | Statistics for Lab Sciences | 9 |

Note: Students who enter the program with 36-225 or 36-226 should discuss options with an advisor. Any 36-300 or 36-400 level course in Data Analysis that does not satisfy any other requirement for a Statistics Major and Minor may be counted as a Statistical Elective.

###### Intermediate*

Choose *one* of the following courses:

36-202 | Statistics & Data Science Methods ^{**} | 9 |

36/70-208 | Regression Analysis | 9 |

36-309 | Experimental Design for Behavioral & Social Sciences | 9 |

36-290 | Introduction to Statistical Research Methodology | 9 |

* Or an extra data analysis course in Statistics

** Must be taken prior to 36-401

###### Advanced

Choose *one* of the following courses:

36-303 | Sampling, Survey and Society | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

__and__ take the following *two* courses:

36-401 | Modern Regression | 9 |

36-402 | Advanced Methods for Data Analysis | 9 |

Students can also take a second 36-46x (see section #5).

##### Sequence 2 (For students beginning later in their college career)

###### Advanced

Choose *two* of the following courses:

36-303 | Sampling, Survey and Society | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

**Not all Special Topics are offered every semester, and new Special Topics are regularly added. See section 5 for details.

__and__ take the following *two* courses:

36-401 | Modern Regression | 9 |

36-402 | Advanced Methods for Data Analysis | 9 |

#### 3. Probability Theory and Statistical Theory: 18 units

The theory of probability gives a mathematical description of the randomness inherent in our observations. It is the language in which statistical models are stated, so an understanding of probability is essential for the study of statistical theory. Statistical theory provides a mathematical framework for making inferences about unknown quantities from data. The theory reduces statistical problems to their essential ingredients to help devise and evaluate inferential procedures. It provides a powerful and wide-ranging set of tools for dealing with uncertainty.

To satisfy the theory requirement, take the following course:

36-225 | Introduction to Probability Theory ^{**} | 9 |

and one of the following two courses:

36-226 | Introduction to Statistical Inference | 9 |

36-326 | Mathematical Statistics (Honors) | 9 |

**It is possible to substitute 36-217, 36-218, or 21-325 for 36-225. (36-225 is the standard introduction to probability; 36-217 is tailored for engineers and computer scientists; 36-218 is a more mathematically rigorous class for Computer Science students and more mathematically advanced Statistics students, who need advisor approval to enroll; and 21-325 is a rigorous probability theory course offered by the Department of Mathematical Sciences.)

__Comment:__

(i) In order to meet the prerequisite requirements for the major, a grade of C or better is required in 36-225, 36-226, and 36-401. In particular, a grade of C or higher is required in order to continue in the major.

#### 4. Statistical Computing: 9 units

36-350 | Statistical Computing ^{*} | 9 |

*In rare circumstances, a higher level *Statistical Computing* course, approved by your Statistics advisor, may be used as a substitute.

#### 5. Special Topics: 9 units

The Department of Statistics and Data Science offers advanced courses that focus on specific statistical applications or advanced statistical methods. These courses are numbered 36-46x (36-461, 36-462, etc.). Two of these courses will be offered every year, one per semester. Past topics included Statistical Learning, Data Mining, Statistics and the Law, Bayesian Statistics, Nonparametric Statistics, Statistical Genetics, Multilevel and Hierarchical Models, and Statistical Methods in Epidemiology. The objective of these courses is to expose students to important topics in statistics and/or interesting applications that are not part of the standard undergraduate curriculum.

To satisfy the Special Topics requirement choose *one* of the **36-46x** courses (which are 9 units).

Note: All 36-46x courses require 36-401 as a prerequisite or corequisite.

#### 6. Statistical Elective: 9–10 units

Students are required to take one elective, which can be within or outside the Department of Statistics and Data Science. **Courses within Statistics** can be any 300 or 400 level course (that is not used to satisfy any other requirement for the statistics major).

The following is a __partial__ list of **courses outside Statistics** that qualify as electives as they provide intellectual infrastructure that will advance the student's understanding of statistics and its applications. Other courses may qualify as well; consult with the Statistics Undergraduate Advisor.

15-110 | Principles of Computing | 10 |

15-112 | Fundamentals of Programming and Computer Science | 12 |

15-121 | Introduction to Data Structures | 10 |

15-122 | Principles of Imperative Computation | 10 |

10-301 | Introduction to Machine Learning | 12 |

10-315 | Introduction to Machine Learning (Undergrad) | 12 |

15-388 | Practical Data Science | 9 |

21-127 | Concepts of Mathematics | 10 |

21-260 | Differential Equations | 9 |

21-292 | Operations Research I | 9 |

21-301 | Combinatorics | 9 |

21-355 | Principles of Real Analysis I | 9 |

80-220 | Philosophy of Science | 9 |

80-221 | Philosophy of Social Science | 9 |

80-310 | Formal Logic | 9 |

85-310 | Research Methods in Cognitive Psychology | 9 |

85-320 | Research Methods in Developmental Psychology | 9 |

85-340 | Research Methods in Social Psychology | 9 |

88-223 | Decision Analysis | 12 |

88-302 | Behavioral Decision Making | 9 |

Note: Additional prerequisites are required for some of these courses. Students should carefully check the course descriptions to determine if additional prerequisites are necessary.

#### 7. Tracks*:

##### Self-Defined Concentration Area (with advisor's approval): 36 units

The power of Statistics, and much of the fun, is that it can be applied to answer such a wide variety of questions in so many different fields. A critical part of statistical practice is understanding the questions being asked so that appropriate methods of analysis can be used. Hence, a critical part of statistical training is to gain experience applying the abstract tools to real problems.

The Concentration Area is a set of four related courses outside of Statistics that prepares the student to deal with statistical aspects of problems that arise in another field. These courses are usually drawn from a *single* discipline of interest to the student and must be approved by the Statistics Undergraduate Advisor. While these courses are not in Statistics, the concentration area must complement the overall Statistics degree.

For example, students intending to pursue careers in the health or biomedical sciences could take further courses in Biology or Chemistry, or students intending to pursue graduate work in Statistics could take further courses in advanced Mathematics.

The concentration area can be fulfilled with a minor or additional major, but __not all minors and additional majors fulfill this requirement__. Please make sure to consult the Undergraduate Statistics Advisor *prior* to pursuing courses for the concentration area. Once the concentration area is approved, any changes to the previously agreed-upon coursework require re-approval by the Undergraduate Advisor.

__Concentration Approval Process__

Submit the following materials to the Undergraduate Statistics Advisor:

- List of possible coursework to fulfill the concentration*
- 150-200 word essay describing how the proposed courses complement the Statistics degree.

* These courses can be amended later, but must be re-approved by the Statistics Undergraduate Advisor if amended.

* Note: The concentration/track requirement is only for students whose *primary* major is statistics and who have no other additional major or minor. The requirement does not apply to students who pursue an *additional* major in statistics.

##### Mathematical Statistics Track: 46–52 units

21-127 | Concepts of Mathematics | 10 |

21-355 | Principles of Real Analysis I | 9 |

36-410 | Introduction to Probability Modeling | 9 |

And *two* of the following:

36-700 | Probability and Mathematical Statistics | 12 |

or 36-705 | Intermediate Statistics | |

21-228 | Discrete Mathematics | 9 |

21-257 | Models and Methods for Optimization | 9 |

21-292 | Operations Research I | 9 |

21-301 | Combinatorics | 9 |

21-356 | Principles of Real Analysis II | 9 |

##### Statistics and Neuroscience Track: 45–54 units

85-211 | Cognitive Psychology | 9 |

85-219 | Biological Foundations of Behavior | 9 |

And three electives (at least one from Methodology and Analysis and at least one from Neuroscience Background):

###### Methodology and Analysis

36-700 | Probability and Mathematical Statistics | 12 |

or 36-705 | Intermediate Statistics | |

10-301 | Introduction to Machine Learning | 12 |

18-290 | Signals and Systems | 12 |

85-314 | Cognitive Neuroscience Research Methods | 9 |

42/86-631 | Neural Data Analysis | 9 |

###### Neuroscience Background

03-362 | Cellular Neuroscience | 9 |

03-363 | Systems Neuroscience | 9 |

15-386 | Neural Computation | 9 |

85-414 | Cognitive Neuropsychology | 9 |

85-419 | Introduction to Parallel Distributed Processing | 9 |

Total Number of Units for the Major: | 146-185* |

Total Number of Units for the Degree: | 360 |

* Note: This number can vary depending on the calculus sequence and on the concentration area a student takes. In addition, this number includes the 36 units of the “Concentration Area” category, which may not be required (see category 7 above for details).

### Recommendations

Students in the College of Humanities and Social Sciences who wish to major or minor in Statistics are advised to complete both the calculus requirement (one Mathematical Foundations calculus sequence) and the Beginning Data Analysis course 36-200 by the end of their Freshman year.

The linear algebra requirement is a prerequisite for the course 36-401. It is therefore essential to complete this requirement during your junior year at the latest.

### Recommendations for Prospective PhD Students

Students interested in pursuing a PhD in Statistics or Biostatistics (or related programs) after completing their undergraduate degree are strongly encouraged to pursue the **Mathematical Statistics Track**.

### Additional Major in Statistics

Students who elect Statistics as a second or third major must fulfill all Statistics degree requirements except for the Concentration Area requirement. Majors in many other programs would naturally complement a Statistics Major, including Tepper's undergraduate business program, Social and Decision Sciences, Policy and Management, and Psychology.

With respect to double-counting courses, it is departmental policy that students must have at least five statistics courses that do not count for their primary major. If students do not have at least five, they typically take additional advanced data analysis electives.

Students are advised to begin planning their curriculum (with appropriate advisors) as soon as possible. This is particularly true if the other major has a complex set of requirements and prerequisites or when many of the other major's requirements overlap with the requirements for a Major in Statistics.

Many departments require Statistics courses as part of their Major or Minor programs. Students seeking transfer credit for those requirements from substitute courses (at Carnegie Mellon or elsewhere) should seek permission from their advisor in the department setting the requirement. The final authority in such decisions rests there. The Department of Statistics and Data Science does not provide approval or permission for substitution or waiver of another department's requirements.

If a waiver or substitution is made in the home department, it is not automatically approved in the Department of Statistics and Data Science. In many of these cases, the student will need to take additional courses to satisfy the Statistics major requirements. Students should discuss this with a Statistics advisor when deciding whether to add an additional major in Statistics.

### Research

One goal of the Statistics program is to give students experience with statistical research. The department gives students research experience through various courses focused on real-world experiences and applications. There are also a variety of research projects in the department, and students who would like to work on a project with a faculty member should contact that faculty member directly to discuss the possibility.

Before graduation, students are encouraged to participate in a research project under faculty supervision. Students mostly do this through projects in specific courses, such as 36-290, 36-303, 36-490, and/or 36-497. Students can also pursue an independent study, or a summer research position.

Qualified students are also encouraged to participate in an advanced research project through 36-490 Undergraduate Research or 36-497 Corporate Capstone Project. Note that both of these courses require an application. Students who maintain a quality point average of 3.25 overall may also apply to participate in the Dietrich College Senior Honors Program.

### Sample Programs

The following sample programs illustrate three (of many) ways to satisfy the requirements of the Statistics Major. However, keep in mind that the program is flexible enough to support *many* other possible schedules and to emphasize a wide variety of interests.

The first schedule uses calculus sequence 1.

The second schedule is an example of the case when a student enters the program through 36-225 and 36-226 (and therefore skips the beginning data analysis sequence). This schedule has more emphasis on statistical theory and probability.

The third schedule is an example of the Mathematical Statistics track.

In these schedules, C.A. refers to Concentration Area courses.

#### Schedule 1

Freshman Fall | Freshman Spring | Sophomore Fall | Sophomore Spring |
---|---|---|---|
36-200 Reasoning with Data | 36-202 Statistics & Data Science Methods | 21-256 Multivariate Analysis | 21-240 Matrix Algebra with Applications |
21-111 Differential Calculus | 21-112 Integral Calculus | C.A. |

Junior Fall | Junior Spring | Senior Fall | Senior Spring |
---|---|---|---|
36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference | 36-401 Modern Regression | 36-402 Advanced Methods for Data Analysis |
Stat Elective | 36-350 Statistical Computing | C.A. | 36-46x Special Topics |
C.A. | C.A. |

#### Schedule 2

Freshman Fall | Freshman Spring | Sophomore Fall | Sophomore Spring |
---|---|---|---|
21-120 Differential and Integral Calculus | 21-256 Multivariate Analysis | 36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference |
36-200 Reasoning with Data | 21-240 Matrix Algebra with Applications |

Junior Fall | Junior Spring | Senior Fall | Senior Spring |
---|---|---|---|
36-350 Statistical Computing | 36-402 Advanced Methods for Data Analysis | 36-46x Special Topics | Stat Elective |
36-401 Modern Regression | Stat Elective | C.A. | C.A. |
C.A. | C.A. |

#### Schedule 3 - Mathematical Statistics Track Only

Freshman Fall | Freshman Spring | Sophomore Fall | Sophomore Spring |
---|---|---|---|
21-120 Differential and Integral Calculus | 21-256 Multivariate Analysis | 36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference |
21-260 Differential Equations | 21-127 Concepts of Mathematics | 21-241 Matrices and Linear Transformations |

Junior Fall | Junior Spring | Senior Fall | Senior Spring |
---|---|---|---|
36-350 Statistical Computing | 36-402 Advanced Methods for Data Analysis | 36-46x Special Topics | 36-410 Introduction to Probability Modeling |
36-401 Modern Regression | Stat Elective | 21-355 Principles of Real Analysis I | Stat Elective |
21-228 Discrete Mathematics | 21-341 Linear Algebra |

## B.S. in Economics and Statistics

Samantha Nielsen, *Statistics & Data Science Lead Academic Advisor*

Kathleen Conway, *Economics Senior Academic Advisor*

Rebecca Nugent and Edward Kennedy, *Faculty Advisors*

Carol Goldburg, *Executive Director, Undergraduate Economics Program*

Statistics & Data Science Location: Baker Hall 132

statadvising@stat.cmu.edu

Economics Location: Tepper 2400

econprog@andrew.cmu.edu

*The B.S. in Economics and Statistics is jointly advised by the Department of Statistics and Data Science and the Undergraduate Economics Program.*

The Major in Economics and Statistics provides an interdisciplinary course of study aimed at students with a strong interest in the empirical analysis of economic data. With a joint curriculum from the Department of Statistics and Data Science and the Undergraduate Economics Program, the major provides students with a solid foundation in the theories and methods of both fields. Students in this major are trained to advance the understanding of economic issues through the analysis, synthesis, and reporting of data using the advanced empirical research methods of statistics and econometrics. Graduates are well positioned for admission to competitive graduate programs, including those in statistics, economics, and management, as well as for employment in positions requiring strong analytic and conceptual skills, especially those in economics, finance, education, and public policy.

All economics courses counting towards an economics degree must be completed with a grade of "C" or higher.

The requirements for the B.S. in Economics and Statistics are the following:

#### I. Prerequisites: 38-39 units

##### 1. Mathematical Foundations: 38-39 units

**Calculus**

21-120 | Differential and Integral Calculus | 10 |

and *one* of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

__Note__: Passing the MSC 21-120 assessment test is an acceptable alternative to completing 21-120.

__Note__: Taking/having credit for both 21-111 and 21-112 is equivalent to 21-120. The Mathematical Foundations total is then 48-49 units. The Economics and Statistics major would then total 201-211 units.

**Linear Algebra**

*One* of the following three courses:

21-240 | Matrix Algebra with Applications | 10 |

21-241 | Matrices and Linear Transformations | 10 |

21-242 | Matrix Theory | 10 |

__Note__: 21-241 and 21-242 are intended only for students with a very strong mathematical background.

#### II. Foundations: 18-36 units

##### 2. Economics Foundations: 18 units

73-102 | Principles of Microeconomics | 9 |

73-103 | Principles of Macroeconomics | 9 |

##### 3. Statistical Foundations: 9-18 units

__Sequence 1 (For students beginning their freshman or sophomore year)__

###### Beginning*

Choose *one* of the following courses:

36-200 | Reasoning with Data | 9 |

36/70-207 | Probability and Statistics for Business Applications | 9 |

36-220 | Engineering Statistics and Quality Control | 9 |

36-247 | Statistics for Lab Sciences | 9 |

Note: Students who enter the program with 36-225 or 36-226 should discuss options with an advisor. Any 36-300 or 36-400 level course in Data Analysis that does not satisfy any other requirement for the Economics and Statistics Major may be counted as a Statistical Elective.

###### Intermediate*

Choose *one* of the following courses:

36-202 | Statistics & Data Science Methods ^{**} | 9 |

36-208 | Regression Analysis | 9 |

36-290 | Introduction to Statistical Research Methodology | 9 |

36-309 | Experimental Design for Behavioral & Social Sciences | 9 |

* Or an extra data analysis course in Statistics

** Must be taken prior to 36-401 Modern Regression.

**Advanced**

Choose *two* of the following courses:

36-303 | Sampling, Survey and Society | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

__Sequence 2 (For students beginning later in their college career)__

**Advanced**

Choose *three* of the following courses:

36-303 | Sampling, Survey and Society | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

Note: Not all Special Topics are offered every semester, and new Special Topics are regularly added. See section 5 for details.

#### III. Disciplinary Core (126 units)

##### 1. Economics Core (45 units)

73-230 | Intermediate Microeconomics | 9 |

73-240 | Intermediate Macroeconomics | 9 |

73-270 | Professional Communication for Economists | 9 |

73-265 | Economics and Data Science | 9 |

73-274 | Econometrics I | 9 |

73-374 | Econometrics II | 9 |

##### 2. Statistics Core (36 units)

36-225 | Introduction to Probability Theory ^{*#} | 9 |

and *one* of the following *two* courses:

36-226 | Introduction to Statistical Inference ^{*} | 9 |

36-326 | Mathematical Statistics (Honors) ^{*} | 9 |

and *both* of the following two courses:

36-401 | Modern Regression ^{*} | 9 |

36-402 | Advanced Methods for Data Analysis | 9 |

*In order to meet the prerequisite requirements for the major, a grade of C or better is required in 36-225 (or equivalents), 36-226 or 36-326, and 36-401.

#It is possible to substitute 36-217, 36-218, or 21-325 for 36-225. (36-225 is the standard introduction to probability, 36-217 is tailored for engineers and computer scientists, 36-218 is a more mathematically rigorous class for Computer Science students and more mathematically advanced Statistics students (Statistics students need advisor approval to enroll), and 21-325 is a rigorous Probability Theory course offered by the Department of Mathematics.)

##### 3. Computing (9 units)

36-350 | Statistical Computing ^{*} | 9 |

*In rare circumstances, a higher-level __Statistical__ Computing course, approved by your Statistics advisor, may be used as a substitute.

##### 4. Advanced Electives (36 units)

Students must take two advanced Economics elective courses (numbered 73-300 through 73-495, excluding 73-374) and two advanced Statistics elective courses (or three, depending on previous coursework; see Section 3), chosen from 36-303, 36-311, 36-315, 36-46x, 36-490, or 36-497.

Students pursuing a degree in Economics and Statistics also have the option of earning a concentration area by completing a set of interconnected electives. While a concentration area is not required for this degree, it is an additional option that allows students to explore a group of aligned topics and/or develop a specialized and advanced skill set appropriate for a desired career path. The electives required for this degree may count towards your concentration area. To fulfill a concentration, students must take four courses from the designated set of electives. Please make sure to consult an advisor when choosing these courses.

Total number of units for the major | 191-201 units |

Total number of units for the degree | 360 units |

#### Professional Development

Students are strongly encouraged to take advantage of professional development opportunities and/or coursework. One option is 73-210 Economics Colloquium I, a fall-only course that provides information about careers in Economics, job search strategies, and research opportunities. The Department of Statistics and Data Science also offers a series of workshops on resume preparation, graduate school applications, and careers in the field, among other topics. Students should also take advantage of the Career and Professional Development Center.

### Additional Major in Economics and Statistics

Students who elect Economics and Statistics as a second or third major must fulfill all Economics and Statistics degree requirements. Majors in many other programs would naturally complement an Economics and Statistics Major, including Tepper's undergraduate business program, Social and Decision Sciences, Policy and Management, and Psychology.

With respect to double-counting courses, it is departmental policy that students must have at least six courses (three Economics and three Statistics) that do not count for their primary major. If students do not have at least six, they typically take additional advanced data analysis or economics electives, depending on where the double counting issue is.

Students are advised to begin planning their curriculum (with appropriate advisors) as soon as possible. This is particularly true if the other major has a complex set of requirements and prerequisites or when many of the other major's requirements overlap with the requirements for a Major in Economics and Statistics.

Many departments require Statistics courses as part of their Major or Minor programs. Students seeking transfer credit for those requirements from substitute courses (at Carnegie Mellon or elsewhere) should seek permission from their advisor in the department setting the requirement. The final authority in such decisions rests there. The Department of Statistics and Data Science does not provide approval or permission for substitution or waiver of another department's requirements.

If a waiver or substitution is made in the home department, it is not automatically approved in the Department of Statistics and Data Science. In many of these cases, the student will need to take additional courses to satisfy the Economics and Statistics major requirements. Students should discuss this with a Statistics advisor when deciding whether to add an additional major in Economics and Statistics.

#### Sample Program

The following sample program illustrates one way to satisfy the requirements of the Economics and Statistics Major. Keep in mind that the program is flexible and can support other possible schedules (see footnotes below the schedule).

Freshman Fall | Freshman Spring | Sophomore Fall | Sophomore Spring |
---|---|---|---|
21-120 Differential and Integral Calculus | 36-202 Statistics & Data Science Methods | 36-225 Introduction to Probability Theory | 21-240 Matrix Algebra with Applications |
36-200 Reasoning with Data | 21-256 Multivariate Analysis | 73-230 Intermediate Microeconomics | 36-226 Introduction to Statistical Inference |
73-102 Principles of Microeconomics | 73-103 Principles of Macroeconomics | 73-210 Economics Colloquium I (not required) | 73-240 Intermediate Macroeconomics |
73-060 Economics: BaseCamp (not required) | ----- | 73-274 Econometrics I | |
----- | ----- | 73-265 Economics and Data Science | ----- |
----- | | | |

Junior Fall | Junior Spring | Senior Fall | Senior Spring |
---|---|---|---|
36-350 Statistical Computing | 36-402 Advanced Methods for Data Analysis | Statistics Elective | Economics Elective |
36-401 Modern Regression | 73-270 Professional Communication for Economists | Economics Elective | Statistics Elective |
73-374 Econometrics II | ----- | ----- | ----- |
----- | ----- | ----- | ----- |
----- | ----- | ----- | |

*In each semester, ----- represents other courses (not related to the major) which are needed in order to complete the 360 units that the degree requires.

Prospective PhD students might add 21-127 in the fall of sophomore year, replace 21-240 with 21-241, and add 21-260 in the spring of junior year and 21-355 in the fall of senior year.

## B.S. in Statistics and Machine Learning

Samantha Nielsen, *Academic Advisor*

Ryan Tibshirani and Ann Lee, *Faculty Advisors*

Location: Baker Hall 132

statadvising@stat.cmu.edu

Students in the Statistics and Machine Learning program develop and master a wide array of skills in computing, mathematics, statistical theory, and the interpretation and display of complex data. In addition, Statistics and Machine Learning majors gain experience in applying statistical tools to real problems in other fields and learn the nuances of interdisciplinary collaboration. This program is geared towards students interested in statistical computation, data science, or “Big Data” problems. The requirements for the Major in Statistics and Machine Learning are detailed below and are organized by categories.

### Curriculum

#### 1. Mathematical Foundations (Prerequisites) (49–59 units)

Mathematics is the language in which statistical models are described and analyzed, so some experience with basic calculus and linear algebra is an important component for anyone pursuing a program of study in Statistics and Machine Learning.

##### Calculus*:

Complete one of the following sequences of mathematics courses at Carnegie Mellon, each of which provides sufficient preparation in calculus:

###### Sequence 1

21-111 | Differential Calculus | 10 |

21-112 | Integral Calculus | 10 |

and *one* of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

###### Sequence 2

21-120 | Differential and Integral Calculus | 10 |

and *one* of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

**Notes:**

- Passing the Mathematical Sciences 21-120 assessment test is an acceptable alternative to completing 21-120.

##### Integration and Approximation

21-122 | Integration and Approximation | 10 |

##### Linear Algebra**:

Complete *one* of the following three courses:

21-240 | Matrix Algebra with Applications | 10 |

21-241 | Matrices and Linear Transformations | 10 |

21-242 | Matrix Theory | 10 |

* It is recommended that students complete the calculus requirement during their freshman year.

**The linear algebra requirement needs to be completed before taking 36-401 Modern Regression.

21-241 and 21-242 are intended only for students with a very strong mathematical background.

##### Mathematical Theory:

21-127 | Concepts of Mathematics | 10 |

#### 2. Data Analysis (45–54 units)

Data analysis is the art and science of extracting insight from data. The art lies in knowing which displays or techniques will reveal the most interesting features of a complicated data set. The science lies in understanding the various techniques and the assumptions on which they rely. Both aspects require practice to master.

The Beginning Data Analysis courses give a hands-on introduction to the art and science of data analysis. The courses cover similar topics but differ slightly in the examples they emphasize. 36-200 draws examples from many fields and satisfies the Dietrich College Core Requirement in Statistical Reasoning. This course is therefore recommended for students in the College. (Note: A score of 4 or 5 on the Advanced Placement (AP) Exam in Statistics may be used to waive this requirement.) Other courses emphasize examples in business (36-207), engineering and architecture (36-220), and the laboratory sciences (36-247).

The Intermediate Data Analysis courses build on the principles and methods covered in the introductory course and explore specific types of data analysis methods in more depth.

The Advanced Data Analysis courses draw on students' previous experience with data analysis and understanding of statistical theory to develop advanced, more sophisticated methods. These core courses involve extensive analysis of real data with emphasis on developing the oral and writing skills needed for communicating results.

##### Sequence 1

###### Beginning*

Choose one of the following courses:

36-200 | Reasoning with Data | 9 |

36/70-207 | Probability and Statistics for Business Applications | 9 |

36-220 | Engineering Statistics and Quality Control | 9 |

36-247 | Statistics for Lab Sciences | 9 |

Note: Students who enter the program with 36-225 or 36-226 should discuss options with an advisor. Any 36-300 or 36-400 level course in Data Analysis that does not satisfy any other requirement for a Statistics Major and Minor may be counted as a Statistical Elective.

###### Intermediate*

Choose *one* of the following courses:

36-202 | Statistics & Data Science Methods ^{**} | 9 |

36/70-208 | Regression Analysis | 9 |

36-309 | Experimental Design for Behavioral & Social Sciences | 9 |

36-290 | Introduction to Statistical Research Methodology | 9 |

*Or extra data analysis course in Statistics

**Must take prior to 36-401

###### Advanced

Choose *two* of the following courses:

36-303 | Sampling, Survey and Society | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

Special Topics rotate and new ones are regularly added.

__and__ take the following *two* courses:

36-401 | Modern Regression | 9 |

36-402 | Advanced Methods for Data Analysis | 9 |

##### Sequence 2

###### Advanced

Choose *three* of the following courses:

36-303 | Sampling, Survey and Society | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

Not all Special Topics are offered every semester, and new Special Topics are regularly added.

__and__ take the following *two* courses:

36-401 | Modern Regression | 9 |

36-402 | Advanced Methods for Data Analysis | 9 |

#### 3. Probability Theory and Statistical Theory (18 units)

The theory of probability gives a mathematical description of the randomness inherent in our observations. It is the language in which statistical models are stated, so an understanding of probability is essential for the study of statistical theory. Statistical theory provides a mathematical framework for making inferences about unknown quantities from data. The theory reduces statistical problems to their essential ingredients to help devise and evaluate inferential procedures. It provides a powerful and wide-ranging set of tools for dealing with uncertainty.

To satisfy the theory requirement take the following *two* courses**:

36-225 | Introduction to Probability Theory | 9 |

36-226 | Introduction to Statistical Inference | 9 |

or 36-326 | Mathematical Statistics (Honors) |

**It is possible to substitute 36-217, 36-218, or 21-325 for 36-225. (36-225 is the standard introduction to probability, 36-217 is tailored for engineers and computer scientists, 36-218 is a more mathematically rigorous class for Computer Science students and more mathematically advanced Statistics students (Statistics students need advisor approval to enroll), and 21-325 is a rigorous Probability Theory course offered by the Department of Mathematics.) 36-326 Mathematical Statistics (Honors) can be substituted for 36-226 Introduction to Statistical Inference and is considered an honors course.

__Comments:__

(i) In order to meet the prerequisite requirements, a grade of at least a C is required in 36-225, 36-226, and 36-401.

#### 4. Statistical Computing (9 units)

36-350 | Statistical Computing ^{*} | 9 |

*In rare circumstances, a higher-level *Statistical* Computing course, approved by your Statistics advisor, may be used as a substitute.

#### 5. Machine Learning/Computer Science (46-48 units)

Statistical modeling in practice nearly always requires computation in one way or another. Computational algorithms are sometimes treated as “black boxes” whose innards the statistician need not pay attention to. But this attitude is becoming less and less prevalent, and today there is much to be gained from a strong working knowledge of computational tools. Understanding the strengths and weaknesses of various methods allows the data analyst to select the right tool for the job; understanding how they can be adapted to work in new settings greatly extends the realm of problems that the analyst can solve. While all Majors in Statistics are given solid grounding in computation, extensive computational training is really what sets the Major in Statistics and Machine Learning apart.

15-112 | Fundamentals of Programming and Computer Science | 12 |

15-122 | Principles of Imperative Computation | 10 |

15-351 | Algorithms and Advanced Data Structures | 12 |

10-301 | Introduction to Machine Learning | 12 |

or 10-315 | Introduction to Machine Learning (Undergrad) |

__and__ take *one* of the following Machine Learning Advanced Electives:

10-405 | Machine Learning with Large Datasets (Undergraduate) | 12 |

10-605 | Machine Learning with Large Datasets | 12 |

10-703 | Deep Reinforcement Learning & Control | 12 |

10-707 | Topics in Deep Learning | 12 |

11-411 | Natural Language Processing | 12 |

11-441 | Machine Learning for Text Mining | 9 |

11-661 | Language and Statistics | 12 |

15-381 | Artificial Intelligence: Representation and Problem Solving | 9 |

15-386 | Neural Computation | 9 |

15-387 | Computational Perception | 9 |

16-311 | Introduction to Robotics | 12 |

16-385 | Computer Vision | 12 |

16-720 | Computer Vision | 12 |

11-761 | Language and Statistics | 12 |

*PhD-level ML course as approved by Statistics advisor

**Independent research with an ML faculty member

Total number of units for the major | 176–198 units |

Total number of units for the degree | 360 units |

### Recommendations

Students in the Dietrich College of Humanities and Social Sciences who wish to major or minor in Statistics are advised to complete both the calculus requirement (one Mathematical Foundations calculus sequence) and the Beginning Data Analysis course 36-200 Reasoning with Data by the end of their Freshman year.

The linear algebra requirement is a prerequisite for the course 36-401. It is therefore essential to complete this requirement during your junior year at the latest!

### Recommendations for Prospective PhD Students

Students interested in pursuing a PhD in Statistics or Machine Learning (or related programs) after completing their undergraduate degree are strongly recommended to take additional Mathematics courses. They should see a faculty advisor as soon as possible. Students should consider 36-326 Mathematical Statistics (Honors) as an alternative to 36-226. Although 21-240 Matrix Algebra with Applications is recommended for Statistics majors, students interested in PhD programs should consider taking 21-241 Matrices and Linear Transformations or 21-242 Matrix Theory instead. Additional courses to consider are 21-228 Discrete Mathematics, 21-260 Differential Equations, 21-341 Linear Algebra, 21-355 Principles of Real Analysis I, and 21-356 Principles of Real Analysis II.

Additional experience in programming and computational modeling is also recommended. Students should consider taking more than one course from the list of Machine Learning electives provided under the Computing section.

### Additional Major in Statistics and Machine Learning

Students who elect Statistics and Machine Learning as a second or third major must fulfill *all* degree requirements.

With respect to double-counting courses, it is departmental policy that students must have at least six courses (three Computer Science/Machine Learning and three Statistics) that do not count for their primary major. If students do not have at least six, they typically take additional advanced data analysis or ML electives, depending on where the double counting issue is.

Students are advised to begin planning their curriculum (with appropriate advisors) as soon as possible. This is particularly true if the other major has a complex set of requirements and prerequisites or when many of the other major's requirements overlap with the requirements for a Major in Statistics and Machine Learning.

Many departments require Statistics courses as part of their Major or Minor programs. Students seeking transfer credit for those requirements from substitute courses (at Carnegie Mellon or elsewhere) should seek permission from their advisor in the department setting the requirement. The final authority in such decisions rests there. The Department of Statistics and Data Science does not provide approval or permission for substitution or waiver of another department's requirements.

If a waiver or substitution is made in the home department, it is not automatically approved in the Department of Statistics and Data Science. In many of these cases, the student will need to take additional courses to satisfy the Statistics and Machine Learning major requirements. Students should discuss this with a Statistics advisor when deciding whether to add an additional major in Statistics and Machine Learning.

### Sample Programs

The following sample programs illustrate two ways to satisfy the requirements of the Statistics and Machine Learning program. Keep in mind that the program is flexible and can support other possible schedules (see footnotes below the schedules). Schedule 1 is for students who have not satisfied the basic calculus requirements. Schedule 2 is for students who have satisfied the basic calculus requirements and choose Sequence 2 for their data analysis courses (see Section 2).

#### Schedule 1

Freshman Fall | Freshman Spring | Sophomore Fall | Sophomore Spring |
---|---|---|---|
36-200 Reasoning with Data | 36-202 Statistics & Data Science Methods | 36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference |
21-120 Differential and Integral Calculus | 21-256 Multivariate Analysis | 21-122 Integration and Approximation | 21-241 Matrices and Linear Transformations |
15-112 Fundamentals of Programming and Computer Science | 15-112 Fundamentals of Programming and Computer Science | 21-127 Concepts of Mathematics | 15-122 Principles of Imperative Computation |
-----* | ----- | ----- | ----- |
----- | ----- | ----- | ----- |

Junior Fall | Junior Spring | Senior Fall | Senior Spring |
---|---|---|---|
36-401 Modern Regression | 36-402 Advanced Methods for Data Analysis | 10-301 Introduction to Machine Learning | ML Elective |
36-350 Statistical Computing | 15-351 Algorithms and Advanced Data Structures | Stat Elective | Stat Elective |
----- | ----- | ----- | ----- |
----- | ----- | ----- | ----- |
----- | ----- | ----- | ----- |

*In each semester, ----- represents other courses (not related to the major) which are needed in order to complete the 360 units that the degree requires.

#### Schedule 2

Freshman Fall | Freshman Spring | Sophomore Fall | Sophomore Spring |
---|---|---|---|
21-256 Multivariate Analysis | 21-127 Concepts of Mathematics | 36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference |
15-112 Fundamentals of Programming and Computer Science | ----- | 15-122 Principles of Imperative Computation | 21-241 Matrices and Linear Transformations |
-----* | ----- | ----- | Stat Elective |
----- | ----- | ----- | ----- |
----- | ----- | ----- | |

Junior Fall | Junior Spring | Senior Fall | Senior Spring |
---|---|---|---|
36-350 Statistical Computing | 36-402 Advanced Methods for Data Analysis | 10-301 Introduction to Machine Learning | ML Elective |
36-401 Modern Regression | 15-351 Algorithms and Advanced Data Structures | Stat Elective | Stat Elective |
----- | ----- | ----- | ----- |
----- | ----- | ----- | ----- |
----- | ----- | ----- | ----- |

*In each semester, "-----" represents other courses (not related to the major) which are needed in order to complete the 360 units that the degree requires.

## The Minor in Statistics

Glenn Clune, *Academic Advisor*

Peter Freeman, *Faculty Advisor*

Location: Baker Hall 132M

statadvising@stat.cmu.edu

The Minor in Statistics develops skills that complement major study in other disciplines. The program helps the student master the basics of statistical theory and advanced techniques in data analysis. This is a good choice for deepening understanding of statistical ideas and for strengthening research skills.

To earn a minor in Statistics, a student must satisfy all of the following requirements:

#### 1. Mathematical Foundations (Prerequisites) (29–39 units)

##### Calculus*:

Complete *one* of the following two sequences of mathematics courses at Carnegie Mellon, each of which provides sufficient preparation in calculus:

###### Sequence 1

21-111 | Differential Calculus | 10 |

21-112 | Integral Calculus | 10 |

and *one* of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

###### Sequence 2

21-120 | Differential and Integral Calculus | 10 |

and *one* of the following:

21-256 | Multivariate Analysis | 9 |

21-259 | Calculus in Three Dimensions | 9 |

Note: Other sequences are possible, and require approval from the undergraduate advisor.

Note: Passing the Mathematical Sciences 21-120 assessment test is an acceptable alternative to completing 21-120.

##### Linear Algebra:

Complete *one* of the following three courses:

21-240 | Matrix Algebra with Applications | 10 |

21-241 | Matrices and Linear Transformations | 10 |

21-242 | Matrix Theory | 10 |

*It is recommended that students complete the calculus requirement during their freshman year.

**The linear algebra requirement needs to be completed before taking 36-401 Modern Regression or 36-46X Special Topics.

21-241 and 21-242 are intended only for students with a very strong mathematical background.

#### 2. Data Analysis (36 units)

Data analysis is the art and science of extracting insight from data. The art lies in knowing which displays or techniques will reveal the most interesting features of a complicated data set. The science lies in understanding the various techniques and the assumptions on which they rely. Both aspects require practice to master.

The Beginning Data Analysis courses give a hands-on introduction to the art and science of data analysis. The courses cover similar topics but differ slightly in the examples they emphasize. 36-200 draws examples from many fields and satisfies the Dietrich College Core Requirement in Statistical Reasoning. This course is therefore recommended for students in the College. (Note: A score of 4 or 5 on the Advanced Placement (AP) Exam in Statistics may be used to waive this requirement.) Other courses emphasize examples in business (36-207), engineering and architecture (36-220), and the laboratory sciences (36-247).

The Intermediate Data Analysis courses build on the principles and methods covered in the introductory course and explore specific types of data analysis methods in more depth.

The Advanced Data Analysis courses draw on students' previous experience with data analysis and understanding of statistical theory to develop advanced, more sophisticated methods. These core courses involve extensive analysis of real data with emphasis on developing the oral and writing skills needed for communicating results.

##### Sequence 1 (For students beginning their freshman or sophomore year)

###### Beginning Data Analysis*

Choose *one* of the following courses:

36-200 | Reasoning with Data | 9 |

36/70-207 | Probability and Statistics for Business Applications | 9 |

36-220 | Engineering Statistics and Quality Control | 9 |

36-247 | Statistics for Lab Sciences | 9 |

*Or extra data analysis course in Statistics

Note: Students who enter the program with 36-225 or 36-226 should discuss options with an advisor. Any 36-300 or 36-400 level course in Data Analysis that does not satisfy any other requirement for a Statistics Major and Minor may be counted as a Statistical Elective.

###### Intermediate Data Analysis*

Choose *one* of the following courses:

36-202 | Statistics & Data Science Methods ^{**} | 9 |

36/70-208 | Regression Analysis | 9 |

36-290 | Introduction to Statistical Research Methodology | 9 |

36-309 | Experimental Design for Behavioral & Social Sciences | 9 |

*Or extra data analysis course in Statistics

**Must take prior to 36-401

###### Advanced Data Analysis and Methodology

Take the following course:

36-401 | Modern Regression | 9 |

and *one* of the following courses:

36-402 | Advanced Methods for Data Analysis | 9 |

36-410 | Introduction to Probability Modeling | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-465 | Special Topics: An Introduction to Bayesian Inference | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

Special Topics rotate and new ones are regularly added.

##### Sequence 2 (For students beginning later in their college career)

###### Advanced Data Analysis and Methodology

Take the following course:

36-401 | Modern Regression | 9 |

and take *two* of the following courses (one of which must be 400-level):

36-303 | Sampling, Survey and Society | 9 |

36-311 | Statistical Analysis of Networks | 9 |

36-315 | Statistical Graphics and Visualization | 9 |

36-402 | Advanced Methods for Data Analysis | 9 |

36-410 | Introduction to Probability Modeling | 9 |

36-461 | Special Topics: Statistical Methods in Epidemiology | 9 |

36-462 | Special Topics: Data Mining | 9 |

36-463 | Special Topics: Multilevel and Hierarchical Models | 9 |

36-464 | Special Topics: Applied Multivariate Methods | 9 |

36-465 | Special Topics: An Introduction to Bayesian Inference | 9 |

36-466 | Special Topics: Statistical Methods in Finance | 9 |

36-467 | Special Topics: Data over Space & Time | 9 |

36-468 | Special Topics: Text Analysis | 9 |

36-490 | Undergraduate Research | 9 |

36-497 | Corporate Capstone Project | 9 |

Special Topics rotate and new ones are regularly added.

#### 3. Probability Theory and Statistical Theory (18 units)

To satisfy the theory requirement take the following *two* courses:

36-225 | Introduction to Probability Theory | 9 |

36-226 | Introduction to Statistical Inference | 9 |

or 36-326 | Mathematical Statistics (Honors) |

**It is possible to substitute 36-217, 36-218, or 21-325 for 36-225. (36-225 is the standard introduction to probability, 36-217 is tailored for engineers and computer scientists, 36-218 is a more mathematically rigorous class for Computer Science students and more mathematically advanced Statistics students (Statistics students need advisor approval to enroll), and 21-325 is a rigorous Probability Theory course offered by the Department of Mathematics.) 36-326 Mathematical Statistics (Honors) can be substituted for 36-226 Introduction to Statistical Inference and is considered an honors course.

__Comments:__

(i) In order to be a Major or a Minor in good standing, a grade of at least a C is required in 36-225, 36-226 and 36-401. In particular, a grade of C or higher is required in order to be able to continue in the major.

Total number of units required for the minor | 83 Units |

### Double Counting

With respect to double-counting courses, it is departmental policy that students must have at least three statistics courses (36-xxx) that do not count for their primary major. If students do not have at least three, they need to take additional advanced electives.

### Sample Programs for the Minor

The following two sample programs illustrate two (of many) ways to satisfy the requirements of the Statistics Minor. Keep in mind that the program is flexible and can support many other possible schedules.

The first schedule uses calculus sequence 1 and 36-202 to satisfy the intermediate data analysis requirement. The second schedule is an example of the case when a student enters the Minor through 36-225 and 36-226 (and therefore skips the beginning data analysis course). That schedule uses calculus sequence 2 and an advanced data analysis elective (to replace the beginning data analysis course).

#### Schedule 1

Freshman | Sophomore | ||
---|---|---|---|

Fall | Spring | Fall | Spring |

21-111 Differential Calculus | 21-112 Integral Calculus | 36-202 Statistics & Data Science Methods | 21-240 Matrix Algebra with Applications |

36-200 Reasoning with Data | 21-256 Multivariate Analysis |

Junior | Senior | ||
---|---|---|---|

Fall | Spring | Fall | Spring |

36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference | 36-401 Modern Regression | 36-402 Advanced Methods for Data Analysis |

#### Schedule 2

Freshman | Sophomore | ||
---|---|---|---|

Fall | Spring | Fall | Spring |

21-120 Differential and Integral Calculus | 21-256 Multivariate Analysis | 36-225 Introduction to Probability Theory | 36-226 Introduction to Statistical Inference |

Junior | Senior | ||
---|---|---|---|

Fall | Spring | Fall | Spring |

21-240 Matrix Algebra with Applications | Advanced Data Analysis Elective | 36-401 Modern Regression | 36-462 Special Topics: Data Mining |

## Substitutions and Waivers

Many departments require Statistics courses as part of their Major or Minor programs. Students seeking transfer credit for those requirements from substitute courses (at Carnegie Mellon or elsewhere) should seek permission from their advisor in the department setting the requirement. The final authority in such decisions rests there. The Department of Statistics and Data Science does not provide approval or permission for substitution or waiver of another department's requirements.

However, the Statistics Director of Undergraduate Studies will provide advice and information to the student's advisor about the viability of a proposed substitution. Students should make available as much information as possible concerning proposed substitutions. Students seeking waivers may be asked to demonstrate mastery of the material.

If a waiver or substitution is made in the home department, it is not automatically approved in the Department of Statistics and Data Science. In many of these cases, the student will need to take additional courses to satisfy the Statistics major requirements. Students should discuss this with a Statistics advisor when deciding whether to add an additional major in Statistics.

Statistics Majors and Minors seeking substitutions or waivers should speak to the Academic Advisor in Statistics.

## Course Descriptions

##### About Course Numbers:

*Each Carnegie Mellon course number begins with a two-digit prefix that designates the department offering the course (e.g., 76-xxx courses are offered by the Department of English). Although each department maintains its own course numbering practices, typically, the first digit after the prefix indicates the class level: xx-1xx courses are freshman-level, xx-2xx courses are sophomore-level, etc. Depending on the department, xx-6xx courses may be either undergraduate senior-level or graduate-level, and xx-7xx courses and higher are graduate-level. Consult the Schedule of Classes each semester for course offerings and for any necessary pre-requisites or co-requisites.*

- 36-200 Reasoning with Data
- Fall and Spring: 9 units

This course is an introduction to learning how to make statistical decisions and "reason with data". The approach will emphasize thinking through an empirical problem from beginning to end and using statistical tools to look for evidence for/against an explicit argument/hypothesis. Types of data will include continuous and categorical variables, images, text, networks, and repeated measures over time. Applications will largely be drawn from interdisciplinary case studies spanning the humanities, social sciences, and related fields. Methodological topics will include basic exploratory data analysis, elementary probability, hypothesis tests, and empirical research methods. There is no calculus or programming requirement. There will be one weekly computer lab for additional hands-on practice using an interactive software platform that allows student-driven inquiry. Not open to students who have received credit for 36-201, 36-207/70-207, 36-220, 36-247, 36-225, or any upper level course in Statistics. This course is the credit-equivalent to 36-201 and will be honored appropriately as a pre-requisite for downstream Statistics courses. As such, this course is not currently open to students who have received credit for 36-201, 36/70-207, 36-220, 36-247, or any 300- or 400-level Statistics course.

- 36-201 Statistical Reasoning and Practice
- Intermittent: 9 units

This course will introduce students to the basic concepts, logic, and issues involved in statistical reasoning, as well as basic statistical methods used to analyze data and evaluate studies. The major topics to be covered include methods for exploratory data analysis, an introduction to research methods, elementary probability, and methods for statistical inference. The objectives of this course are to help students develop a critical approach to the evaluation of study designs, data and results, and to develop skills in the application of basic statistical methods in empirical research. An important feature of the course will be the use of the computer to facilitate the understanding of important statistical ideas and for the implementation of data analysis. In addition to three lectures a week, students will attend a computer lab once a week. Examples will be drawn from areas of applications of particular interest to H&SS students. Not open to students who have received credit for 36-207/70-207, 36-220, 36-225, 36-625, or 36-247.

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-202 Statistics & Data Science Methods
- Spring: 9 units

This course builds on the principles and methods of statistical reasoning developed in 36-200 (or its equivalents). The course covers simple and multiple regression, analysis of variance methods and logistic regression. Other topics may include non-parametric methods and probability models, as time permits. The objective of this course is to develop the skills of applying the basic principles and methods that underlie statistical practice and empirical research. In addition to three lectures a week, students attend a computer lab once a week for "hands-on" practice of the material covered in lecture. Not open to students who have received credit for: 36-208/70-208, 36-309. Students who have completed 36-401 prior to or concurrent with 36-202 will not receive credit for 36-202.

Prerequisites: 36-201 or 36-200 or 36-247 or 70-207 or 36-220 or 36-207

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-207 Probability and Statistics for Business Applications
- Spring: 9 units

This is the first half of a year-long sequence in basic statistical methods that are used in business and management. Topics include exploratory and descriptive techniques, probability theory, statistical inference in simple settings, basic categorical analysis, and statistical methods for quality control. Not open to students who have received credit for 36-201, 36-220, 36-625, or 36-247. Cross-listed as 70-207.

Prerequisites: 21-121 or 21-120 or 21-112

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-208 Regression Analysis
- Spring: 9 units

This is the second half of a year-long sequence in basic statistical methods that are used in business and management. Topics include time series, regression and forecasting. In addition to two lectures a week, students will attend a computer lab once a week. Not open to students who have received credit for 36-202, 36-626. Cross-listed as 70-208. Students who have completed 36-401 prior to 36-208 will not receive credit for 36-208.

Prerequisites: (21-112 or 21-120) and (36-201 or 70-207 or 36-247 or 36-207 or 36-220) and (73-102 or 73-100)

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-217 Probability Theory and Random Processes
- All Semesters: 9 units

This course provides an introduction to probability theory. It is designed for students in electrical and computer engineering. Topics include elementary probability theory, conditional probability and independence, random variables, distribution functions, joint and conditional distributions, limit theorems, and an introduction to random processes. Some elementary ideas in spectral analysis and information theory will be given. A grade of C or better is required in order to use this course as a pre-requisite for 36-226 and 36-410. Not open to students who have received credit for 36-225, or 36-625.

Prerequisites: 21-259 or 21-256 or 21-123 or 21-122 or 21-112

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-218 Probability Theory for Computer Scientists
- All Semesters: 9 units

Probability theory is the mathematical foundation for the study of both statistics and of random systems. This course is an intensive introduction to probability, from the foundations and mechanics to its application in statistical methods and modeling of random processes. Special topics and many examples are drawn from areas and problems that are of interest to computer scientists and that should prepare computer science students for the probabilistic and statistical ideas they encounter in downstream courses and research. A grade of C or better is required in order to use this course as a pre-requisite for 36-226, 36-326, and 36-410. Not open to students who have received credit for 36-225, 21-325, or 36-700. If you hold a Statistics primary/additional major or minor you will be required to complete 36-226. Those who do not have a major or minor in Statistics and receive at least a B in 36-218 will be eligible to move directly on to 36-401.

Prerequisites: 21-259 or 21-112 or 21-122 or 21-123 or 21-256

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-220 Engineering Statistics and Quality Control
- All Semesters: 9 units

This is a course in introductory statistics for engineers with emphasis on modern product improvement techniques. Besides exploratory data analysis, basic probability, distribution theory and statistical inference, special topics include experimental design, regression, control charts and acceptance sampling. Not open to students who have received credit for 36-201, 36-207/70-207, 36-226, 36-626, or 36-247, except when AP credit is awarded for 36-201.

Prerequisites: 21-112 or 21-120 or 21-121

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-225 Introduction to Probability Theory
- Fall: 9 units

This course is the first half of a year-long course which provides an introduction to probability and mathematical statistics for students in economics, mathematics and statistics. The use of probability theory is illustrated with examples drawn from engineering, the sciences, and management. Topics include elementary probability theory, conditional probability and independence, random variables, distribution functions, joint and conditional distributions, law of large numbers, and the central limit theorem. A grade of C or better is required in order to advance to 36-226, 36-326, and 36-410. Not open to students who have received credit for 36-217, 36-218, 21-325, or 36-700.

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-226 Introduction to Statistical Inference
- Spring: 9 units

This course is the second half of a year-long course in probability and mathematical statistics. Topics include maximum likelihood estimation, confidence intervals, and hypothesis testing. If time permits there will also be a discussion of linear regression and the analysis of variance. A grade of C or better is required in order to advance to 36-401, 36-402 or any 36-46x course. Not open to students who have received credit for 36-626.

Prerequisites: 15-359 Min. grade C or 36-225 Min. grade C or 36-217 Min. grade C or 21-325 Min. grade C or 36-218 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-247 Statistics for Lab Sciences
- Spring: 9 units

This course is a single-semester comprehensive introduction to statistical analysis of data for students in biology and chemistry. Topics include exploratory data analysis, elements of computer programming for statistics, basic concepts of probability, statistical inference, and curve fitting. In addition to two lectures, students attend a computer lab each week. Not open to students who have received credit for 36-201, 36-207/70-207, 36-220, or 36-226.

Prerequisites: 21-112 or 21-120 or 21-121

- 36-290 Introduction to Statistical Research Methodology
- Intermittent: 9 units

This course is designed to introduce statistical research methodology—the procedures by which statisticians go about approaching and analyzing data—to early undergraduates. Students will learn basic concepts of statistical learning—inference vs. prediction, supervised vs. unsupervised learning, regression vs. classification, etc.—and will reinforce this knowledge by applying, e.g., linear regression, random forest, principal components analysis, and/or hierarchical clustering and more to datasets provided by the instructor. Students will also practice disseminating the results of their analyses via oral presentations and posters. Analyses will primarily be carried out using the R programming language, but with attention paid to how one would perform similar analyses using Python. Previous knowledge of R is not required for this course. Space is very limited; there will be an application process. The course is currently open to sophomore statistics students only.

- 36-300 Statistics & Data Science Internship
- Summer: 3 units

The Department of Statistics & Data Science considers experiential learning an integral part of our program. One such option is through an internship. If a student has an internship, they do not have to register for this class unless they want it listed on their official transcripts. This process should be used by international students interested in Curricular Practical Training (CPT) and should also be authorized by the Office of International Education (OIE). More information regarding CPT is available on OIE's website. This course will be taken as Pass/Fail, and students will be charged tuition for 3 units. There is an approval process in order to register for this course. Please contact the Department of Statistics & Data Science for more details.

- 36-303 Sampling, Survey and Society
- Spring: 9 units

This course will revolve around the role of sampling and sample surveys in the context of U.S. society and its institutions. We will examine the evolution of survey taking in the United States in the context of its economic, social and political uses. This will eventually lead to discussions about the accuracy and relevance of survey responses, especially in light of various kinds of nonsampling error. Students will be required to design, implement and analyze a survey sample.

Prerequisites: 88-250 or 36-208 or 36-218 or 36-226 or 36-202 or 36-625 or 36-225 or 73-261 or 36-309 or 70-208

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-304 Biostatistics
- Fall: 9 units

TBD

- 36-309 Experimental Design for Behavioral & Social Sciences
- Fall: 9 units

Statistical aspects of the design and analysis of planned experiments are studied in this course. A clear statement of the experimental factors will be emphasized. The design aspect will concentrate on choice of models, sample size and order of experimentation. The analysis phase will cover data collection and computation, especially analysis of variance and will stress the interpretation of results. In addition to a weekly lecture, students will attend a computer lab once a week.

Prerequisites: 36-207 or 36-220 or 36-217 or 36-200 or 36-201 or 36-247

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-311 Statistical Analysis of Networks
- Intermittent: 9 units

Networks are omnipresent. In this course, students will get an introduction to network science, mainly focusing on social network analysis. The course will start with some empirical background, and an overview of concepts used when measuring and describing networks. We will also discuss network visualization. Most traditional models cannot be applied straightforwardly to social network data, because of their complex dependence structure. We will discuss random graph models and statistical network models that have been developed for the study of network structure and growth. We will also cover models of how networks impact individual behavior.

Prerequisite: 36-226

- 36-314 Biostatistics
- Fall: 9 units

This course is an introduction to methods used frequently in biostatistics and public health applications.

Prerequisites: 36-226 or 88-250 or 36-225 or 70-208 or 36-303 or 36-309 or 36-202 or 36-208 or 36-625

- 36-315 Statistical Graphics and Visualization
- Spring: 9 units

Graphical displays of quantitative information take on many forms as they help us understand both data and models. This course will serve to introduce the student to the most common forms of graphical displays and their uses and misuses. Students will learn both how to create these displays and how to understand them. As time permits the course will consider some more advanced graphical methods such as computer-generated animations. Each student will be required to engage in a project using graphical methods to understand data collected from a real scientific or engineering experiment. In addition to two weekly lectures there will be lab sessions where the students learn to use software to aid in the production of appropriate graphical displays.

Prerequisites: 21-325 or 36-625 or 36-225 or 36-309 or 36-303 or 70-208 or 36-218 or 36-217 or 88-250 or 36-226 or 36-208 or 36-202

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-326 Mathematical Statistics (Honors)
- Spring: 9 units

This course is a rigorous introduction to the mathematical theory of statistics. A good working knowledge of calculus and probability theory is required. Topics include maximum likelihood estimation, confidence intervals, hypothesis testing, Bayesian methods, and regression. A grade of C or better is required in order to advance to 36-401, 36-402 or any 36-46x course. Not open to students who have received credit for 36-625. Prerequisites: 15-359 or 21-325 or 36-217 or 36-225 with a grade of A AND advisor approval. Students interested in the course should add themselves to the waitlist pending review.

Prerequisites: 36-225 Min. grade A or 15-359 Min. grade A or 36-218 Min. grade A or 21-325 Min. grade A or 36-217 Min. grade A

- 36-350 Statistical Computing
- Fall and Spring: 9 units

Statistical Computing: An introduction to computing targeted at statistics majors with minimal programming knowledge. The main topics are core ideas of programming (functions, objects, data structures, flow control, input and output, debugging, logical design and abstraction), illustrated through key statistical topics (exploratory data analysis, basic optimization, linear models, graphics, and simulation). The class will be taught in the R language. No previous programming experience required. 36-225 is a pre-req.

Prerequisites: 21-325 Min. grade C or 36-217 Min. grade C or 36-225 Min. grade C or 15-259 Min. grade C or 36-218 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-375 Data Ethics & Responsible Conduct of Research
- Intermittent: 3 units

TBD

- 36-401 Modern Regression
- Fall: 9 units

This course is an introduction to the real world of statistics and data analysis. We will explore real data sets, examine various models for the data, assess the validity of their assumptions, and determine which conclusions we can make (if any). Data analysis is a bit of an art; there may be several valid approaches. We will strongly emphasize the importance of critical thinking about the data and the question of interest. Our overall goal is to use a basic set of modeling tools to explore and analyze data and to present the results in a scientific report. A grade of C is required to move on to 36-402 or any 36-46x course.

Prerequisites: (36-226 Min. grade C or 36-218 Min. grade B or 36-625 Min. grade C or 36-326 Min. grade C) and (21-240 or 21-241)

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-402 Advanced Methods for Data Analysis
- Spring: 9 units

This course introduces modern methods of data analysis, building on the theory and application of linear models from 36-401. Topics include nonlinear regression, nonparametric smoothing, density estimation, generalized linear and generalized additive models, simulation and predictive model-checking, cross-validation, bootstrap uncertainty estimation, multivariate methods including factor analysis and mixture models, and graphical models and causal inference. Students will analyze real-world data from a range of fields, coding small programs and writing reports. Prerequisites: 36-401

Prerequisite: 36-401 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-410 Introduction to Probability Modeling
- Spring: 9 units

An introductory-level course in stochastic processes. Topics typically include Poisson processes, Markov chains, birth and death processes, random walks, recurrent events, and renewal theory. Examples are drawn from reliability theory, queuing theory, inventory theory, and various applications in the social and physical sciences.

Prerequisites: 36-225 or 36-217 or 21-325 or 36-625

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-428 Time Series
- Spring: 6 units

The course is designed for graduate students and advanced undergraduate students. It will introduce the analysis and some of the theory of sequences of serially-dependent random variables (known as time series). Students should already have learned mathematical probability and statistics, including multivariate and conditional distributions, linear regression, calculus, matrix algebra, and the fundamentals of complex variables and functions. The focus will be on popular models for time series and the analysis of data that arise in applications.

Prerequisite: 36-401 Min. grade C

- 36-431 Foundations of Causal Inference
- Intermittent: 6 units

This course will provide an introduction to the fundamentals of causal inference. Causal inference is concerned with whether and how one can go beyond statistical associations to draw causal conclusions from observational data. Topics will include: counterfactuals (potential outcomes and graphs), identification and estimation of average treatment effects in experiments and observational studies, nonparametric bounds, sensitivity analysis, instrumental variables, effect modification, and longitudinal studies. Special permission is required for undergraduate students.

- 36-432 Modern Causal Inference
- Intermittent: 6 units

This course will provide an in-depth look at modern causal inference. Topics will include: optimal treatment regimes, mediation, principal stratification, stochastic interventions, accounting for complex confounding and exposures, and methods for efficient nonparametric estimation. Some background in mathematical statistics is advised. Special permission is required for undergraduate students.

- 36-459 Statistical Models of the Brain
- Spring: 12 units

This new course is intended for CNBC students, as an additional option for fulfilling the computational core course requirement, but it will also be open to Statistics and Machine Learning students. It should be of interest to anyone wishing to see the way statistical ideas play out within the brain sciences, and it will provide a series of case studies on the role of stochastic models in scientific investigation. Statistical ideas have been part of neurophysiology and the brain sciences since the first stochastic description of spike trains, and the quantal hypothesis of neurotransmitter release, more than 50 years ago. Many contemporary theories of neural system behavior are built with statistical models. For example, integrate-and-fire neurons are usually assumed to be driven in part by stochastic noise; the role of spike timing involves the distinction between Poisson and non-Poisson neurons; and oscillations are characterized by decomposing variation into frequency-based components. In the visual system, V1 simple cells are often described using linear-nonlinear Poisson models; in the motor system, neural response may involve direction tuning; and CA1 hippocampal receptive field plasticity has been characterized using dynamic place models. It has also been proposed that perceptions, decisions, and actions result from optimal (Bayesian) combination of sensory input with previously-learned regularities; and some investigators report new insights from viewing whole-brain pattern responses as analogous to statistical classifiers. Throughout the field of statistics, models incorporating random "noise" components are used as an effective vehicle for data analysis. In neuroscience, however, the models also help form a conceptual framework for understanding neural function. This course will examine some of the most important methods and claims that have come from applying statistical thinking

Prerequisite: 36-401 Min. grade C

- 36-461 Special Topics: Statistical Methods in Epidemiology
- Intermittent: 9 units

Epidemiology is concerned with understanding factors that cause, prevent, and reduce diseases by studying associations between disease outcomes and their suspected determinants in human populations. Epidemiologic research requires an understanding of statistical methods and design. Epidemiologic data is typically discrete, i.e., data that arise whenever counts are made instead of measurements. In this course, methods for the analysis of categorical data are discussed with the purpose of learning how to apply them to data. The central statistical themes are building models, assessing fit and interpreting results. There is a special emphasis on generating and evaluating evidence from observational studies. Case studies and examples will be primarily from the public health sciences.

Prerequisite: 36-401 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-462 Special Topics: Data Mining
- Intermittent: 9 units

Data mining is the science of discovering patterns and learning structure in large data sets. Covered topics include information retrieval, clustering, dimension reduction, regression, classification, and decision trees. Prerequisites: 36-401 (C or better).

Prerequisite: 36-401 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-463 Special Topics: Multilevel and Hierarchical Models
- Intermittent: 9 units

Multilevel and hierarchical models are among the most broadly applied "sophisticated" statistical models, especially in the social and biological sciences. They apply to situations in which the data "cluster" naturally into groups of units that are more related to each other than they are to the rest of the data. In the first part of the course we will review linear and generalized linear models. In the second part we will see how to generalize these to multilevel and hierarchical models and relate them to other areas of statistics, and in the third part of the course we will learn how Bayesian statistical methods can help us to build, estimate and diagnose problems with these models using a variety of data sets and examples.

Prerequisite: 36-401 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-464 Special Topics: Applied Multivariate Methods
- Intermittent: 9 units

This course is an introduction to applied multivariate methods. Topics include a discussion of the multivariate normal distribution, the multivariate linear model, repeated measures designs and analysis, principal component and factor analysis. Emphasis is on the application and interpretation of these methods in practice. Students will use at least one statistical package. Prerequisites: 36-401 (C or better).

Prerequisite: 36-401 Min. grade C

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-465 Special Topics: An Introduction to Bayesian Inference
- Intermittent: 9 units

The aim of this course is to introduce students to theory and application of Bayesian statistical modeling and inference. The course starts with epistemological differences between the Bayesian and Frequentist paradigms and the treatment of simple models, such as those based on well-known distributions. Concepts of conjugate and noninformative priors are illustrated for single- and multi-parameter models. Basic treatment of hierarchical models and linear regression models is also covered. Bayesian computational methods, such as the Gibbs sampler and Metropolis-Hastings algorithms, are briefly presented with an emphasis on their implementation and use on simple cases.

Prerequisite: 36-401 Min. grade C

- 36-466 Special Topics: Statistical Methods in Finance
- Intermittent: 9 units

Financial econometrics is the interdisciplinary area where we use statistical methods and economic theory to address a wide variety of quantitative problems in finance. These include building financial models, testing financial economics theory, simulating financial systems, volatility estimation, risk management, capital asset pricing, derivative pricing, portfolio allocation, proprietary trading, portfolio and derivative hedging, and so on. Financial econometrics is an active field of integration of finance, economics, probability, statistics, and applied mathematics. Financial activities generate many new problems and products, economics provides a useful theoretical foundation and guidance, and quantitative methods such as statistics, probability and applied mathematics are essential tools to solve quantitative problems in finance. Professionals in finance now routinely use sophisticated statistical techniques and modern computation power in portfolio management, proprietary trading, derivative pricing, financial consulting, securities regulation, and risk management.

- 36-467 Special Topics: Data over Space & Time
- Intermittent: 9 units

This course is an introduction to the opportunities and challenges of analyzing data from processes unfolding over space and time. It will cover basic descriptive statistics for spatial and temporal patterns; linear methods for interpolating, extrapolating, and smoothing spatio-temporal data; basic nonlinear modeling; and statistical inference with dependent observations. Class work will combine practical exercises in R, a little mathematics on the underlying theory, and case studies analyzing real problems from various fields (economics, history, meteorology, ecology, etc.). Depending on available time and class interest, additional topics may include: statistics of Markov and hidden-Markov (state-space) models; statistics of point processes; simulation and simulation-based inference; agent-based modeling; dynamical systems theory. Co-requisite: For undergraduates taking the course as 36-467, 36-401. For graduate students taking the course as 36-667, consent of the professor.

- 36-468 Special Topics: Text Analysis
- Intermittent: 9 units

The analysis of language is concerned with how variables relate to people (their gender, age, and location, for example), how variables relate to use (such as writing in different academic disciplines), and how variables change over time. While we are surrounded by data that might potentially shed light on many of these questions, working with real-world linguistic data can present some unique challenges in sampling, in the distribution of features, and in their high dimensionality. In this course, we work through some of these issues, paying particular attention to the aligning of the statistical questions we want to investigate with the choice of statistical models, as well as focusing on the interpretation of results. Analysis will be carried out in R and students will develop a suite of tools as they work through their course projects.

- 36-490 Undergraduate Research
- Intermittent: 9 units

This course is designed to give undergraduate students experience using statistics in real research problems. Small groups of students will be matched with clients and do supervised research for a semester. Students will gain skills in approaching a research problem, critical thinking, statistical analysis, scientific writing, and conveying and defending their results to an audience.

Prerequisite: 36-401

Course Website: http://www.stat.cmu.edu/academics/courselist

- 36-492 Topic Detection and Document Clustering
- Intermittent: 6 units

Imagine if someone read all your email. Everything you sent, everything you received. What would they find? Do you have repeating topics? How do the topics change over time? The Enron Corporation was an energy, commodities, and services company in Houston, Texas, that went spectacularly bankrupt in 2001 after it was revealed that it was engaging in systematic, planned accounting fraud. At its peak, it employed over 20,000 people with revenues over $100 billion. Its downfall was related to deregulation of California's energy commodity trading and a series of rolling power blackouts over months. For example, Enron traders encouraged the removal of power during the energy crisis by suggesting plant shutdowns. The resulting increase in the price for power made them a fortune. After Enron's collapse, journalists used the Freedom of Information Act to release the emails sent/received by the employees of Enron. Subsequently, the emails were analyzed to see who knew what and when. Every news article, email, letter, blog, tweet, etc. can be thought of as an observation. We characterize these documents by their length, what words they use and how often, and possibly extra information like the time, the recipient, etc. Topic detection and document clustering methods are statistical and machine learning tools that extract and identify related documents, possibly over time. These methods need to be flexible enough to handle both very small and very large clusters of documents, topics that change in importance, and topics that appear and disappear. This class will emphasize application of methods and real-world data analysis. Class time will be split into lecture and "lab". (Bring your laptop.) There will be occasional homework assignments and a final project, but mostly we'll focus on the downfall of Enron as our overarching case study.

Prerequisite: 36-401

- 36-494 Astrostatistics
- Intermittent: 6 units

Since a young age, many of us have pondered the vastness and beauty of the Universe as we gazed up at the night sky. Planets, moons, stars, galaxies, and beyond have fascinated humanity for centuries. It turns out that the Universe also provides a plethora of interesting and complex statistical problems. In this course, problems in astronomy, cosmology, and astrophysics will provide motivation for learning about some advanced statistical methodology. Possible topics include computational statistics, topological data analysis, nonparametric regression, spatial statistics, and statistical learning. While exploring newer statistical methodology, we will get to sample a variety of problems that appeal to astrostatisticians. Some possibilities are statistical problems related to exoplanets (planets orbiting stars outside our Solar System), the large-scale structure of the Universe (the "Cosmic Web"), dark matter (over 80% of the matter in the Universe is thought to be invisible), Type Ia supernovae (a dying star eats its companion star until it explodes), and the cosmic microwave background (a.k.a. "baby pictures of the Universe"). This course will be suitable for advanced undergraduate statistics majors through Ph.D. level statistics students, and astronomy Ph.D. students with some background in statistics.

Prerequisite: 36-401 Min. grade C

- 36-497 Corporate Capstone Project
- Fall and Spring: 9 units

This course is designed to give undergraduate students experience applying statistics & data science methodology to real industry projects. Small groups of students will be matched with industry clients and do supervised projects for a semester. Students will gain skills in approaching a real world problem, critical thinking, advanced statistical analysis, scientific writing, collaborating in an industry setting, communicating results, and meeting expectations with respect to deliverables and timelines. The industry clients will change and rotate each semester; available projects will be advertised prior to registration. The course size is limited, and students will submit an application including their project preferences. Students with skill sets matching project needs will be given priority. We will also take into consideration whether or not a student has had a recent prior corporate capstone experience with the goal of providing experiences to a broad group of qualified students.

- 36-601 Perspectives in Data Science I
- Fall: 6 units

This course covers the principles and practice of Data Science including data input and cleaning, exploratory data analysis, intermediate R programming, beginning SAS programming, beginning to intermediate Python programming, and SQL. For Master's in Statistical Practice students only.

- 36-602 Perspectives in Data Science II
- Spring: 9 units

This course is a continuation of 36-601 and covers interactive data visualization with Shiny, advanced R programming techniques, intermediate SAS (macros), web scraping, Hadoop, and Spark. For Master's in Statistical Practice students only.

Prerequisite: 36-601 Min. grade C

- 36-611 Professional Skills for Statisticians I
- Fall: 6 units

This course covers a variety of professional skills including resumes and cover letters, writing reports, oral presentations, teamwork, and project planning. Consulting skills are developed in the form of a whole-class consulting project. For Master's in Statistical Practice students only.

- 36-612 Professional Skills for Statisticians II
- Spring: 6 units

This course is a continuation of 36-611 and covers additional writing and presentation skills, as well as interview skills. For Master's in Statistical Practice students only.

Prerequisite: 36-611 Min. grade C

- 36-617 Applied Linear Models
- Fall: 12 units

This course covers the theory and practice of linear models in matrix form with emphasis on practical skills for working with real data and communicating results to technical and non-technical audiences. For Master's in Statistical Practice students only.

- 36-618 Experimental Design & Time Series
- Spring: 12 units

This course covers fundamentals of experimental design including various ANOVA models, Latin squares and factorial and fractional factorial designs. The time series component covers exponential smoothing models and ARIMA, including seasonal models and transfer function models. Special topics are intermittent. For Master's in Statistical Practice students only.

Prerequisites: 36-601 Min. grade C and 36-617 Min. grade C

- 36-625 Probability and Mathematical Statistics I
- Fall: 12 units

This course is a rigorous introduction to the mathematical theory of probability, and it provides the necessary background for the study of mathematical statistics and probability modeling. A good working knowledge of calculus is required. Topics include combinatorial analysis, conditional probability, generating functions, sampling distributions, law of large numbers, and the central limit theorem. Undergraduate students studying Computer Science, or considering graduate work in Statistics or Operations Research, must receive permission from their advisor and from the instructor. Prerequisite: 21-122 and 21-241 and (21-256 or 21-259).

Prerequisites: 21-123 or 21-256 or 21-118 or 21-122

- 36-626 Probability and Mathematical Statistics II
- Intermittent: 12 units

An introduction to the mathematical theory of statistical inference. Topics include likelihood functions, estimation, confidence intervals, hypothesis testing, Bayesian inference, regression, and the analysis of variance. Not open to students who have received credit for 36-226. Students studying Computer Science should carefully consider taking this course instead of 36-220 or 36-226 after consultation with their advisor. Prerequisite: 36-625.

Prerequisite: 36-625

- 36-635 Applied Survival Analysis
- Intermittent: 6 units

TBD

- 36-636 Methods for Clinical Trials
- Intermittent: 6 units

TBD

- 36-650 Statistical Computing
- Spring: 9 units

A detailed introduction to elements of computing relating to statistical modeling, targeted to advanced undergraduates, masters students, and doctoral students in Statistics. Topics include important data structures and algorithms; numerical methods; databases; parallelism and concurrency; and coding practices, program design, and testing. Multiple programming languages will be supported (e.g., C, R, Python, etc.). Those with no previous programming experience are welcome but will be required to learn the basics of at least one language via self-study. There are very limited spots for undergraduates; special permission from both advisor and instructor required.

- 36-651 Advanced Statistical Computing
- Intermittent: 6 units

A project-based course in statistical computing. Students will choose individual projects on computing topics related to statistical modeling and practice, including databases, parallel and cluster programming, big data frameworks (e.g., Spark or Hadoop), algorithms and data structures, numerical methods, and other topics based on student interest. The course will include introductions to each topic as well as student presentations on the results of their projects. Multiple programming languages will be supported. Recommended prerequisite: 36-650 or 36-750.

Prerequisite: 36-650 Min. grade B

- 36-661 Special Topics: Statistical Methods in Epidemiology
- Intermittent: 9 units

Epidemiology is concerned with understanding factors that cause, prevent, and reduce diseases by studying associations between disease outcomes and their suspected determinants in human populations. Epidemiologic research requires an understanding of statistical methods and design. Epidemiologic data is typically discrete, i.e., data that arise whenever counts are made instead of measurements. In this course, methods for the analysis of categorical data are discussed with the purpose of learning how to apply them to data. The central statistical themes are building models, assessing fit and interpreting results. There is a special emphasis on generating and evaluating evidence from observational studies. Case studies and examples will be primarily from the public health sciences.

- 36-663 Multilevel and Hierarchical Models
- Intermittent: 9 units

Multilevel and hierarchical models are among the most broadly applied "sophisticated" statistical models, especially in the social and biological sciences. They apply to situations in which the data "cluster" naturally into groups of units that are more related to each other than they are to the rest of the data. In the first part of the course we will see how to generalize linear models to multilevel and hierarchical models and relate them to other areas of statistics, and in the last part of the course we will learn how Bayesian statistical methods can help us to build, estimate and diagnose problems with these models using a variety of data sets and examples.

- 36-665 Special Topics: Bayesian Methods
- Intermittent: 9 units

TBD

- 36-666 Special Topics: Statistical Methods in Finance
- Intermittent: 9 units

Financial econometrics is the interdisciplinary area where we use statistical methods and economic theory to address a wide variety of quantitative problems in finance. These include building financial models, testing financial economics theory, simulating financial systems, volatility estimation, risk management, capital asset pricing, derivative pricing, portfolio allocation, proprietary trading, and portfolio and derivative hedging. Financial econometrics is an active field that integrates finance, economics, probability, statistics, and applied mathematics. Financial activities generate many new problems and products, economics provides a useful theoretical foundation and guidance, and quantitative methods such as statistics, probability and applied mathematics are essential tools for solving quantitative problems in finance. Professionals in finance now routinely use sophisticated statistical techniques and modern computing power in portfolio management, proprietary trading, derivative pricing, financial consulting, securities regulation, and risk management.

- 36-667 Special Topics: Data over Space & Time
- Intermittent: 9 units

This course is an introduction to the opportunities and challenges of analyzing data from processes unfolding over space and time. It will cover basic descriptive statistics for spatial and temporal patterns; linear methods for interpolating, extrapolating, and smoothing spatio-temporal data; basic nonlinear modeling; and statistical inference with dependent observations. Class work will combine practical exercises in R, a little mathematics on the underlying theory, and case studies analyzing real problems from various fields (economics, history, meteorology, ecology, etc.). Depending on available time and class interest, additional topics may include: statistics of Markov and hidden-Markov (state-space) models; statistics of point processes; simulation and simulation-based inference; agent-based modeling; dynamical systems theory.

- 36-668 Special Topics: Text Analysis
- Intermittent: 9 units

TBD

- 36-675 Data Ethics & Responsible Conduct of Research
- Intermittent: 3 units

TBD

- 36-692 Topic Detection and Document Clustering
- Intermittent: 6 units

Imagine if someone read all your email. Everything you sent, everything you received. What would they find? Do you have repeating topics? How do the topics change over time? The Enron Corporation was an energy, commodities, and services company in Houston, Texas, that went spectacularly bankrupt in 2001 after it was revealed that it was engaging in systematic, planned accounting fraud. At its peak, it employed over 20,000 people with revenues over $100 billion. Its downfall was related to deregulation of California's energy commodity trading and a series of rolling power blackouts over months. For example, Enron traders encouraged the removal of power during the energy crisis by suggesting plant shutdowns. The resulting increase in the price for power made them a fortune. After Enron's collapse, journalists used the Freedom of Information Act to release the emails sent/received by the employees of Enron. Subsequently, the emails were analyzed to see who knew what and when. Every news article, email, letter, blog, tweet, etc. can be thought of as an observation. We characterize these documents by their length, what words they use and how often, and possibly extra information like the time, the recipient, etc. Topic detection and document clustering methods are statistical and machine learning tools that extract and identify related documents, possibly over time. These methods need to be flexible enough to handle both very small and very large clusters of documents, topics that change in importance, and topics that appear and disappear. This class will emphasize application of methods and real-world data analysis. Class time will be split into lecture and "lab". (Bring your laptop.) There will be occasional homework assignments and a final project, but mostly we'll focus on the downfall of Enron as our overarching case study.

- 36-699 Statistical Immigration
- Fall: 3 units

Students are introduced to the faculty and their interests, the field of statistics, and the facilities at Carnegie Mellon. Each faculty member gives at least one elementary lecture on some topic of his or her choice. In the past, topics have included: the field of statistics and its history, large-scale sample surveys, survival analysis, subjective probability, time series, robustness, multivariate analysis, psychiatric statistics, experimental design, consulting, decision-making, probability models, statistics and the law, and comparative inference. Students are also given information about the libraries at Carnegie Mellon and current bibliographic tools. In addition, students are instructed in the use of the Departmental and University computational facilities and available statistical program packages. THIS COURSE IS FOR PHD STUDENTS IN THE DEPT OF STATISTICS ONLY.

- 36-700 Probability and Mathematical Statistics
- Fall: 12 units

This is a one-semester course covering the basics of statistics. We will first provide a quick introduction to probability theory, and then cover fundamental topics in mathematical statistics such as point estimation, hypothesis testing, asymptotic theory, and Bayesian inference. If time permits, we will also cover more advanced and useful topics including nonparametric inference, regression and classification. Prerequisites: one- and two-variable calculus and matrix algebra.

- 36-705 Intermediate Statistics
- Fall: 12 units

This course covers the fundamentals of theoretical statistics. Topics include: probability inequalities, point and interval estimation, minimax theory, hypothesis testing, data reduction, convergence concepts, Bayesian inference, nonparametric statistics, bootstrap resampling, VC dimension, prediction and model selection.

- 36-707 Regression Analysis
- All Semesters: 12 units

This is a course in data analysis. Topics covered include: Simple and multiple linear regression, causation, weighted least-squares, global and case diagnostics, robust regression, exponential families, logistic regression and generalized linear models; Model selection: prediction risk, bias-variance tradeoff, risk estimation, model search, ridge regression and lasso, stepwise regression, maybe boosting; Smoothing and nonparametric regression: linear smoothers, kernels, local regression, penalized regression, regularization and splines, wavelets, variance estimation, confidence bands, local likelihood, additive models; Classification: parametric and nonparametric regression, LDA, QDA, trees. Practice in data analysis is obtained through course projects. This course is primarily for first year PhD students in Statistics & Data Science; it requires an appropriate background for entering that program.

- 36-708 Statistical Methods in Machine Learning
- All Semesters: 12 units

TBD

Prerequisite: 36-705 Min. grade A

- 36-709 Advanced Statistical Theory I
- All Semesters: 12 units

This is a core Ph.D. course in theoretical statistics. The class will cover a selection of modern topics in mathematical statistics, focussing on high-dimensional parametric models and non-parametric models. The main goal of the course is to provide the students with adequate theoretical background and mathematical tools to read and understand the current statistical literature on high-dimensional models. Topics will include: concentration inequalities, covariance estimation, principal component analysis, penalized linear regression, maximal inequalities for empirical processes, Rademacher and Gaussian complexities, non-parametric regression and minimax theory. This will be the first part of a two semester sequence.

Prerequisite: 36-705 Min. grade A

- 36-710 Advanced Statistical Theory
- All Semesters: 12 units

This is a core Ph.D. course in theoretical statistics. The class will cover a selection of modern topics in mathematical statistics, focussing on high-dimensional parametric models and non-parametric models. The main goal of the course is to provide the students with adequate theoretical background and mathematical tools to read and understand the current statistical literature on high-dimensional models. Topics will include: concentration inequalities, covariance estimation, principal component analysis, penalized linear regression, maximal inequalities for empirical processes, Rademacher and Gaussian complexities, non-parametric regression and minimax theory.

- 36-721 Statistical Graphics and Visualization
- Intermittent: 6 units

An effective statistical graphic is a powerful tool for analyzing data and communicating insights. This course will introduce students to creating, understanding, and critiquing such graphical displays, choosing the right visual tool for the task at hand. Students will learn how to produce legible, self-contained, informative graphics using statistical software, as well as how to plan effective statistical graphics by following the principles of human visual perception. Beyond the most commonly used graphs for univariate and bivariate data, we will cover useful visualizations for statistical model diagnostics; cartographic maps; network- and tree-structured data; and interactive exploration of high-dimensional datasets. Through project assignments, students will practice applying the principles of graphic design and interaction design. Course materials will primarily use R (including ggplot2 and Shiny), but we will also introduce Illustrator/Inkscape and Tableau, and students may complete assignments using other software if they wish (Python, MATLAB, etc.).

- 36-725 Convex Optimization
- Intermittent: 12 units

Nearly every problem in machine learning can be formulated as the optimization of some function, possibly under some set of constraints. This universal reduction may seem to suggest that such optimization tasks are intractable. Fortunately, many real world problems have special structure, such as convexity, smoothness, separability, etc., which allow us to formulate optimization problems that can often be solved efficiently. This course is designed to give a graduate-level student a thorough grounding in the formulation of optimization problems that exploit such structure, and in efficient solution methods for these problems. The main focus is on the formulation and solution of convex optimization problems. These general concepts will also be illustrated through applications in machine learning and statistics. Students entering the class should have a pre-existing working knowledge of algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate. Though not required, having taken 10-701 or an equivalent machine learning or statistics class is strongly encouraged, since we will use applications in machine learning and statistics to demonstrate the concepts we cover in class. Students will work on an extensive optimization-based project throughout the semester; those wanting to take the class without the project can register under the 9 unit option.

Course Website: http://www.stat.cmu.edu/~ryantibs/convexopt/

- 36-726 Statistical Practice
- Spring: 12 units

Students are taught how to structure a consulting session, elicit and diagnose a problem, manage a project, and report an analysis. The class will participate in meetings with industrial and academic clients. For Master's in Statistical Practice students only.

- 36-727 Modern Experimental Design
- Intermittent: 6 units

Designed experiments are crucial to draw causal conclusions with minimum expense and maximum precision. This course introduces the basic principles and theory of experimental design, including randomized designs, blocking, analysis of covariance, factorial designs, and power analysis, along with a selection of more advanced topics, which may include sequential and adaptive designs, A/B testing, the design of observational studies, or other topics depending on time and class interest. Students will learn to design appropriate experiments for a variety of research scenarios, and practice these skills through a course project. Coursework will primarily use R for analysis of experimental data. Students will be expected to have taken a graduate course in regression or be taking a graduate course in regression concurrently.

- 36-730 Graphical Models and its Applications
- Intermittent: 6 units

Probabilistic graphical models (PGMs) lie at the intersection of probability and graph theory. Their application to real world problems has proved useful in the process of understanding, formulating and solving problems, and in particular as tools for making decisions and calculating the probability of a particular outcome based on (often incomplete) collections of prior knowledge. This course will introduce the fundamentals of graphical models and probability propagation algorithms, and demonstrate how to build and model PGMs using R, focusing on DAGs. The aim will be to learn and demonstrate the versatility of PGMs, through applications and methodology, including their use in decision support, causal and temporal problems. Applications will focus on areas of public policy including criminal justice/forensic science, health/medical, environment, etc.

- 36-731 Foundations of Causal Inference
- Intermittent: 6 units

This course will provide an introduction to the fundamentals of causal inference. Causal inference is concerned with whether and how one can go beyond statistical associations to draw causal conclusions from observational data. Topics will include: counterfactuals (potential outcomes and graphs), identification and estimation of average treatment effects in experiments and observational studies, nonparametric bounds, sensitivity analysis, instrumental variables, effect modification, and longitudinal studies.

- 36-732 Modern Causal Inference
- Intermittent: 6 units

This course will provide an in-depth look at modern causal inference. Topics will include: optimal treatment regimes, mediation, principal stratification, stochastic interventions, accounting for complex confounding and exposures, and methods for efficient nonparametric estimation. Some background in mathematical statistics is advised.

- 36-733 Probability Models and Stochastic Processes
- Intermittent: 6 units

By the end of this course you will be able to handle basic discrete and continuous time stochastic processes, including random walks, branching processes, Markov chains, Markov chain Monte Carlo (MCMC), Poisson processes, birth and death processes, renewal processes, and queuing processes. This class is not overly mathematical, but techniques such as generating functions, difference and differential equations, and linear systems of equations are needed at a basic level. Students will be expected to have taken a graduate course in regression or be taking a graduate course in regression concurrently. Knowledge of R or similar statistical packages is needed.

- 36-736 Methods for Clinical Trials
- Intermittent: 6 units

TBD

- 36-741 Statistics meets Optimization: Randomized Sketching Methods
- All Semesters: 6 units

In this mini, we will discuss some aspects of the interface between statistics and optimization. The goal of these lectures is to touch on various evolving areas at this interface. The objectives of optimization can be influenced by underlying statistical objectives in many ways: for example, the statistical error caused by an insufficient sample size is often of higher order than the machine precision; a worst-case instance can be too conservative compared to random ensembles; and polynomial-time complexity may still be too large to be tractable. To further discuss these issues, we will start with a dimension reduction technique based on random projections and analyze how this technique helps us achieve faster optimization convergence without hurting statistical precision.

- 36-742 Statistics meets Optimization: Approximate Message Passing Algorithm
- All Semesters: 6 units

In this mini, we focus our attention on the recent development of the approximate message passing (AMP) algorithm. We follow a rigorous approach that builds upon ideas from statistical physics, information theory and graphical models, and is based on the analysis of a highly efficient reconstruction algorithm. We start with some basics of probabilistic graphical models, introduce the message passing algorithm and motivate the AMP algorithm along the way. Then we will discuss the exact asymptotic characterization in terms of the so-called state evolution and talk about the applications in LASSO and, more generally, high-dimensional robust M-estimation.

- 36-743 Statistical Methods for Reproducibility and Replicability: Static Settings
- Intermittent: 6 units

See http://www.stat.cmu.edu/~aramdas/reproducibility19/

- 36-744 Statistical Methods for Reproducibility and Replicability: Dynamic Settings
- All Semesters: 6 units

See http://www.stat.cmu.edu/~aramdas/reproducibility19/

- 36-746 Statistical Methods for Neuroscience and Psychology
- Intermittent: 12 units

This course provides a survey of basic statistical methods, emphasizing motivation from underlying principles and interpretation in the context of neuroscience and psychology. Though 36-746 assumes only passing familiarity with school-level statistics, it moves faster than typical university-level first courses. Vectors and matrices will be used frequently, as will basic calculus. Topics include Probability, Random Variables, and Important Distributions (binomial, Poisson, and normal distributions; the Law of Large Numbers and the Central Limit Theorem); Estimation and Uncertainty (standard errors and confidence intervals; the bootstrap); Principles of Estimation (mean squared error; maximum likelihood); Models, Hypotheses, and Statistical Significance (goodness-of-fit, p-values; power); General methods for testing hypotheses (permutation, bootstrap, and likelihood ratio tests); Linear Regression (simple linear regression and multiple linear regression); Analysis of Variance (one-way and two-way designs; multiple comparisons); Generalized Linear and Nonlinear Regression (logistic and Poisson regression; generalized linear models); and Nonparametric regression (smoothing scatterplots; smoothing histograms).

- 36-750 Statistical Computing
- Fall: 9 units

A detailed introduction to elements of computing relating to statistical modeling, targeted to advanced undergraduates, masters students, and doctoral students in Statistics. Topics include important data structures and algorithms; numerical methods; databases; parallelism and concurrency; and coding practices, program design, and testing. Multiple programming languages will be supported (e.g., C, R, Python, etc.). Those with no previous programming experience are welcome but will be required to learn the basics of at least one language via self-study.

- 36-751 Advanced Statistical Computing
- Intermittent: 6 units

A project-based course in statistical computing. Students will choose individual projects on computing topics related to statistical modeling and practice, including databases, parallel and cluster programming, big data frameworks (e.g., Spark or Hadoop), algorithms and data structures, numerical methods, and other topics based on student interest. The course will include introductions to each topic as well as student presentations on the results of their projects. Multiple programming languages will be supported. Recommended prerequisite: 36-650 or 36-750.

Prerequisite: 36-750 Min. grade B

- 36-759 Statistical Models of the Brain
- Intermittent: 12 units

This new course is intended for CNBC students, as an additional option for fulfilling the computational core course requirement, but it will also be open to Statistics and Machine Learning students. It should be of interest to anyone wishing to see the way statistical ideas play out within the brain sciences, and it will provide a series of case studies on the role of stochastic models in scientific investigation. Statistical ideas have been part of neurophysiology and the brain sciences since the first stochastic description of spike trains, and the quantal hypothesis of neurotransmitter release, more than 50 years ago. Many contemporary theories of neural system behavior are built with statistical models. For example, integrate-and-fire neurons are usually assumed to be driven in part by stochastic noise; the role of spike timing involves the distinction between Poisson and non-Poisson neurons; and oscillations are characterized by decomposing variation into frequency-based components. In the visual system, V1 simple cells are often described using linear-nonlinear Poisson models; in the motor system, neural response may involve direction tuning; and CA1 hippocampal receptive field plasticity has been characterized using dynamic place models. It has also been proposed that perceptions, decisions, and actions result from optimal (Bayesian) combination of sensory input with previously learned regularities; and some investigators report new insights from viewing whole-brain pattern responses as analogous to statistical classifiers. Throughout the field of statistics, models incorporating random "noise" components are used as an effective vehicle for data analysis. In neuroscience, however, the models also help form a conceptual framework for understanding neural function. This course will examine some of the most important methods and claims that have come from applying statistical thinking

- 36-762 Data Privacy
- Fall: 6 units

Protection of individual data is a growing problem due to the large amount of sensitive and personal data being collected, stored, analyzed, and shared across multiple domains and stakeholders. Researchers are facing new policies and technical requirements imposed by funding agencies on access to and sharing of research data. This course will introduce students to (1) key principles associated with the concepts of confidentiality and privacy protection, and (2) techniques for data sharing that support useful statistical inference while minimizing the disclosure of sensitive personal information. Methodologies to be considered will include tools for disclosure limitation used by government statistical agencies and those associated with the approach known as differential privacy, which provides a formal privacy guarantee. Students will explore specific techniques using special tools in R.

- 36-763 Multilevel and Hierarchical Models
- Fall: 6 units

Multilevel and hierarchical models are among the most broadly applied "sophisticated" statistical models, especially in the social and biological sciences. They apply to situations in which the data "cluster" naturally into groups of units that are more related to each other than they are to the rest of the data. In the first part of the course we will review linear and generalized linear models. In the second part we will see how to generalize these to multilevel and hierarchical models and relate them to other areas of statistics. In the third part we will learn how Bayesian statistical methods can help us to build, estimate, and diagnose problems with these models using a variety of data sets and examples.

- 36-765 Writing in Statistics
- Intermittent: 6 units

There is no one correct way to write. But there are things you can do that tend to make it difficult for a reader to absorb the ideas you are writing about, or make it easier for the reader. Thus, it is important to focus on the reader, and the constraints and habits of mind that most readers (even in the rarefied population of academics who can understand the technical details of your work) bring to the task of reading what you have written. The goals for students in this course are: to understand that writing requires an intellectual investment similar to the investment that you put into other areas of your research, from developing research questions, data collection, and data analysis, to writing and testing algorithms, and formulating and proving theorems; to understand ways of organizing your writing that make it more likely that the reader will interpret and understand your ideas in the way that you intend; and to gain experience writing with these ideas in mind. The course is most suitable for graduate students in statistics who are engaged in a writing project (ADA paper, journal article, thesis work, etc.).

- 36-771 Martingales 1: Concentration Inequalities, The Basics
- Intermittent: 6 units

Martingales are a central topic in statistics, but are even more relevant today due to modern applications to sequential learning and decision making problems. This course will present a unified derivation of a wide variety of new and old concentration inequalities for martingales. We will prove inequalities for scalars and matrices that hold under a wide variety of nonparametric assumptions. For example, we will encounter exponential concentration inequalities for martingales whose increments have heavy tails, for continuous-time martingales, and for martingales in general Banach spaces. This course will be a prerequisite for the second mini, which focuses more on applications.

- 36-772 Martingales 2: Concentration Inequalities, Applications to Sequential Analysis
- Intermittent: 6 units

This second mini will focus on deriving guarantees for a variety of important problems in sequential analysis using the tools developed in the first mini, as well as new tools such as uniform nonasymptotic versions of the law of the iterated logarithm for scalars and matrices. Applications include sequential analogs of the t-test that are valid without a Gaussian assumption, best-arm identification in multi-armed bandits, average treatment effect estimation in sequential clinical trials, sequential covariance matrix estimation, and other such problems.

- 36-775 Data Ethics & Responsible Conduct of Research
- Intermittent: 3 units

TBD

- 36-777 Multivariate Analysis I
- Intermittent: 6 units

This is the first part of a semester-long course on multivariate analysis. The aim of the class is to provide fundamental tools in understanding multivariate (including high dimensional) data. In this MINI we will study in detail the multivariate Gaussian, Wishart, and Hotelling distributions. Time permitting, we will cover principal component analysis (PCA) as well as discriminant analysis.

- 36-778 Multivariate Analysis II
- All Semesters: 6 units

This is the second part of the multivariate analysis class. This MINI will discuss asymptotic inequalities for eigenvalues of Gaussian matrices, quadratic form concentration inequalities, and matrix estimation (including multivariate regression, covariance matrix estimation, and PCA). Time permitting, the class might also cover dimension reduction and graphical models.

- 36-779 Topics in Modern Multivariate Analysis II
- Intermittent: 6 units

This is the second part of a semester-long course on modern multivariate analysis. In this MINI we will introduce recent research results focusing on high dimensional multivariate analysis. Topics include high dimensional mean and covariance testing, kernel based methods, structured high dimensional subspace estimation (sparse PCA, functional data), and network data.

- 36-791 Central Limit Theorem in High-Dimensions
- Intermittent: 6 units

TBD

- 36-792 Topic Detection and Document Clustering
- Intermittent: 6 units

Imagine if someone read all your email. Everything you sent, everything you received. What would they find? Do you have repeating topics? How do the topics change over time? The Enron Corporation was an energy, commodities, and services company in Houston, Texas that went spectacularly bankrupt in 2001 after it was revealed that it was engaging in systematic, planned accounting fraud. At its peak, it employed over 20,000 people with revenues over $100 billion. Its downfall was related to deregulation of California's energy commodity trading and a series of rolling power blackouts over months. For example, Enron traders encouraged the removal of power during the energy crisis by suggesting plant shutdowns. The resulting increase in the price for power made them a fortune. After Enron's collapse, journalists used the Freedom of Information Act to release the emails sent/received by the employees of Enron. Subsequently, the emails were analyzed to see who knew what and when. Every news article, email, letter, blog, tweet, etc. can be thought of as an observation. We characterize these documents by their length, what words they use and how often, and possibly extra information like the time, the recipient, etc. Topic detection and document clustering methods are statistical and machine learning tools that extract and identify related documents, possibly over time. These methods need to be flexible enough to handle both very small and very large clusters of documents, topics that change in importance, and topics that appear and disappear. This class will emphasize application of methods and real-world data analysis. Class time will be split into lecture and "lab". (Bring your laptop.) There will be occasional homework assignments and a final project, but mostly we'll focus on the downfall of Enron as our overarching case study.

## Faculty

ZACHARY BRANSON, Assistant Teaching Professor – M.S. in Statistics, Harvard University; Carnegie Mellon, 2019–

DAVID CHOI, Assistant Professor of Statistics and Information Systems – Ph.D., Stanford University; Carnegie Mellon, 2004–

ALEXANDRA CHOULDECHOVA, Assistant Professor of Statistics and Public Policy – Ph.D., Stanford University; Carnegie Mellon, 2014–

PETER FREEMAN, Assistant Teaching Faculty – Ph.D., University of Chicago; Carnegie Mellon, 2004–

MAX G'SELL, Assistant Professor – Ph.D., Stanford University; Carnegie Mellon, 2014–

CHRISTOPHER R. GENOVESE, Department Head and Professor of Statistics – Ph.D., University of California, Berkeley; Carnegie Mellon, 1994–

JOEL B. GREENHOUSE, Professor of Statistics – Ph.D., University of Michigan; Carnegie Mellon, 1982–

AMELIA HAVILAND, Anna Loomis McCandless Professorship of Statistics and Public Policy – Ph.D., Carnegie Mellon University; Carnegie Mellon, 2003–

JIASHUN JIN, Professor of Statistics – Ph.D., Stanford University; Carnegie Mellon, 2007–

BRIAN JUNKER, Associate Dean and Professor of Statistics – Ph.D., University of Illinois; Carnegie Mellon, 1990–

ROBERT E. KASS, Professor of Statistics – Ph.D., University of Chicago; Carnegie Mellon, 1981–

EDWARD KENNEDY, Assistant Professor – Ph.D., University of Pennsylvania; Carnegie Mellon, 2016–

ANN LEE, Associate Professor – Ph.D., Brown University; Carnegie Mellon, 2005–

JOHN P. LEHOCZKY, Thomas Lord Professor of Statistics – Ph.D., Stanford University; Carnegie Mellon, 1969–

JING LEI, Associate Professor – Ph.D., University of California, Berkeley; Carnegie Mellon, 2011–

ANJALI MAZUMDER, Assistant Research Professor

DANIEL NAGIN, Teresa and H. John Heinz III Professor of Public Policy – Ph.D., Carnegie Mellon University; Carnegie Mellon, 1976–

MATEY NEYKOV, Assistant Professor – Ph.D., Harvard University; Carnegie Mellon, 2017–

NYNKE NIEZINK, Assistant Professor – Ph.D., University of Groningen; Carnegie Mellon, 2017–

REBECCA NUGENT, Associate Department Head, Teaching Professor – Ph.D., University of Washington; Carnegie Mellon, 2006–

ALEX REINHART, Assistant Teaching Faculty – Ph.D., Carnegie Mellon University; Carnegie Mellon, 2018–

ALESSANDRO RINALDO, Professor – Ph.D., Carnegie Mellon; Carnegie Mellon, 2005–

KATHRYN ROEDER, Professor of Statistics – Ph.D., Pennsylvania State University; Carnegie Mellon, 1994–

CHAD M. SCHAFER, Associate Professor – Ph.D., University of California, Berkeley; Carnegie Mellon, 2004–

TEDDY SEIDENFELD, Herbert A. Simon Professor of Philosophy and Statistics – Ph.D., Columbia University; Carnegie Mellon, 1985–

COSMA SHALIZI, Associate Professor – Ph.D., University of Wisconsin, Madison; Carnegie Mellon, 2005–

RYAN TIBSHIRANI, Associate Professor – Ph.D., Stanford University; Carnegie Mellon, 2011–

VALERIE VENTURA, Associate Professor – Ph.D., University of Oxford; Carnegie Mellon, 1997–

ISABELLA VERDINELLI, Professor in Residence – Ph.D., Carnegie Mellon University; Carnegie Mellon, 1991–

LARRY WASSERMAN, Professor of Statistics – Ph.D., University of Toronto; Carnegie Mellon, 1988–

YUTING WEI, Assistant Professor – Ph.D., University of California; Carnegie Mellon, 2019–

## Emeriti Faculty

GEORGE T. DUNCAN, Professor of Statistics and Public Policy – Ph.D., University of Minnesota; Carnegie Mellon, 1974–

WILLIAM F. EDDY, John C. Warner Professor of Statistics – Ph.D., Yale University; Carnegie Mellon, 1976–

JOSEPH B. KADANE, Leonard J. Savage Professor of Statistics and Social Sciences – Ph.D., Stanford University; Carnegie Mellon, 1969–

MARK J. SCHERVISH, Professor of Statistics – Ph.D., University of Illinois; Carnegie Mellon, 1979–

DALENE STANGL, Teaching Professor – Ph.D., Carnegie Mellon University; Carnegie Mellon, 2017–

## Adjunct Faculty

OLGA CHILINA, Lecturer – MS, University of Toronto; Carnegie Mellon, 2016–

APRIL GALYARDT – Ph.D., Carnegie Mellon University; Carnegie Mellon, 2017–

CHRISTOPHER PETER MAKRIS, Adjunct Lecturer – MSP, Carnegie Mellon University; Carnegie Mellon, 2018–

ROSS O'CONNELL – Ph.D., University of Michigan; Carnegie Mellon, 2016–

GORDON WEINBERG, Senior Lecturer – M.A. Mathematics, University of Pittsburgh; Carnegie Mellon, 2004–

## Special Faculty

ROBIN MEJIA

## Affiliated Faculty

ANTHONY BROCKWELL – Ph.D., Melbourne University; Carnegie Mellon, 1999–

BERNIE DEVLIN – Ph.D., Pennsylvania State University; Carnegie Mellon, 1994–

SAM VENTURA – Ph.D., Carnegie Mellon University; Carnegie Mellon, 2015–