Computational Biology Program

Russell Schwartz, PhD, Department Head
Location: GHC 7725

Phillip Compeau, PhD, Program Director & Assistant Dept. Head
Location: GHC 7403

Tara Seman, Academic Program Manager
Location: GHC 7721
cbd.cmu.edu

Bachelor of Science in Computational Biology

Success in computational biology requires significant technical knowledge of fundamental computer science as well as a broad biological intuition and general understanding of experimental biology.  However, most importantly, it requires students who can integrate their knowledge by making connections between the two fields.

Industry demand for excellent computational biology students is significant in biotech, pharmaceuticals, and biomedical research. Both established companies and startups struggle to find employees with the right skill set. Our undergraduate computational biology major gives students the rigorous training required to handle the challenges of modern research, training that is not provided by any of our peer institutions.

Students in the B.S. program in Computational Biology are expected to acquire the following skills upon graduation:

  • Understand the fundamentals of single and multi-variable calculus, as used to construct models of biological systems.
  • Construct their own logical mathematical proofs and later apply these proof techniques to theorems in algorithms and theoretical computer science.
  • Obtain a firm grounding in probability and statistics necessary for interpretation of biomedical research results.
  • Apply the fundamentals of modern chemistry and physics to biological molecules.
  • Learn the principles of organization of biological systems on the cellular and molecular level.
  • Interpret the connection of the principles of inheritance to the molecular level.
  • Understand the relationship between macro and micro in terms of biological structure and function and the connection to metabolic pathways.
  • Produce sound, stable, well-organized computer programs that scale well on large datasets.
  • Understand the theoretical basis of modern computer science and integrate the inherent limitations of any computing system.
  • Design algorithms based on efficient data structures for a variety of computational contexts to meet specified goals.
  • Apply machine learning methods, by which computers can “learn” from experience, to genomic and biomedical data.
  • Become familiar with structured biological databases and computational tools for operating on these databases.
  • Construct mathematical/computational models of biological systems at differing scales and analyze the strengths and weaknesses of these models.
  • Learn the fundamental laboratory techniques used in modern cell and molecular biology as well as the influence of computational methods on experimental design.
  • Acquire a skillset of canonical algorithms applied in modern biological research and understand how these algorithms are applied to solve biological problems.
  • Gain fluency in contemporary biomedical research topics and be able to interpret primary research results in computational biology.
  • Understand the role of computation in biotechnology, pharmaceutical development, and medicine.

Degree Requirements

(students entering Fall 2024)

Students completing the Bachelor of Science in Computational Biology follow certain policies that apply to all SCS students; please consult the SCS policies page for a complete listing of these expectations.

Students must complete a minimum of 360 units for the degree in computational biology.

Mathematics/Statistics Core

21-120 Differential and Integral Calculus (10 units)
21-122 Integration and Approximation (10 units)
15-151 Mathematical Foundations for Computer Science (12 units)
(or 21-127/21-128 if not offered)
36-218 Probability Theory for Computer Scientists (9 units)
(Students taking 15-259 should take 36-326 or 15-260 instead. 15-260 is only open to students who have taken 15-259.)
or 36-226 Introduction to Statistical Inference
or 36-326 Mathematical Statistics (Honors)
or 36-235 Probability and Statistical Inference I
or 15-260 Statistics and Computing
21-241 Matrices and Linear Transformations (11 units)
or 21-242 Matrix Theory
Total Units: 52

General Science Core

09-105 Introduction to Modern Chemistry I (10 units)
or 09-107 Honors Chemistry: Fundamentals, Concepts and Applications
33-121 Physics I for Science Students (12 units)
or 33-141 Physics I for Engineering Students
Total Units: 22

Biological Core

03-151 Honors Modern Biology (10 units)
or 03-121 Modern Biology
03-221 Genomes, Evolution, and Disease: Introduction to Quantitative Genetic Analysis (9 units)
03-232 Biochemistry I (9 units)
(Students taking 03-231, including pre-med students, will take organic chemistry as a prerequisite, which will satisfy a biology elective requirement.)
or 03-231 Honors Biochemistry
03-320 Cell Biology (9 units)
Total Units: 37

Computer Science Core 

07-128 First Year Immigration Course (3 units)
(This course may be replaced by 03-201 or 03-202 if and only if 07-128 is not offered)
15-122 Principles of Imperative Computation (12 units)
15-251 Great Ideas in Theoretical Computer Science (12 units)
15-451 Algorithm Design and Analysis (12 units)
or 15-351 Algorithms and Advanced Data Structures
10-315 Introduction to Machine Learning (SCS Majors) (12 units)
Total Units: 51

Computational Biology Core 

02-261 Quantitative Cell and Molecular Biology Laboratory (12 units)
or 02-262 Computation and Biology Integrated Research Lab
02-180 Great Ideas in Computational Biology I (5 units)
02-181 Great Ideas in Computational Biology II (5 units)
*02-251 is allowed if 02-180 and 02-181 are not offered
02-402 Computational Biology Seminar (3 units)
02-510 Computational Genomics (12 units)
02-512 Computational Methods for Biological Modeling and Simulation (9 units)
Total Units: 46

Major Electives 

02-3xx Computational Biology Electives at 300 level or above (18-24 units)
03-3xx Biology Electives at 300 level or above; 09-217 or 42-202 also count as biology electives (9-12 units)
xx-2xx School of Computer Science Electives at 200 level or above, at least 9 units each (18-24 units). 15-150 is an acceptable 100-level course counting in this category, but the following exceptions are not allowed in this category: 02-201, 02-223, 02-250, 02-261, 02-262, 11-423, 15-351, 16-223, 17-200, 17-333, 17-562.
Total Units: 45-60

Humanities & Arts

All candidates for the bachelor's degree in Computer Science must complete a minimum of 63 units offered by the College of Humanities & Social Sciences and/or the College of Fine Arts. These courses offer students breadth in their education and perspectives and provide students with a better appreciation of social, artistic, cultural, political and economic issues that can influence their effectiveness as computer scientists upon graduation.

Requirements for this component of the degree are listed under the SCS main page under General Education Requirements.

Computing @ Carnegie Mellon (1 course)

The following course is required of all students to familiarize them with the campus computing environment:

99-101 Core@CMU (3 units)

Free Electives 

A free elective is any Carnegie Mellon course. However, a maximum of nine (9) units of Physical Education and/or Military Science (ROTC) and/or Student-Led (StuCo) courses may be used toward fulfilling graduation requirements.

Summary of Degree Requirements

Area: Units
Math/Stats Core: 52
General Science Core: 22
Biological Core: 37
Computer Science Core: 51
Computational Biology Core: 46
Major Electives: 45-60
General Education (Humanities & Arts): 63
Computing at Carnegie Mellon: 3
Remaining Units: 42-57
Total Units: 360

Sample Course Sequence

The following is an example four-year course sequence for computational biology majors, assuming the student has credit for one semester of calculus.  Note that our suggested courses during the first year fall are aligned with the sample course sequence for Computer Science majors.  All students interested in computational biology should take 03-121 (Modern Biology) or 03-151 (Honors Modern Biology) in their first fall and 02-180 (Great Ideas in Computational Biology I) and 02-181 (Great Ideas in Computational Biology II) in their first spring.

Some suggestions listed below are quite flexible. For example, physics and chemistry can be taken at any point in the student's first three semesters, and some of the computer science courses below can be replaced by other courses within the School of Computer Science, depending on a student’s individual interests.

Other courses, such as cell biology, biochemistry, computational genomics, and biological modeling and simulation, are only offered in either the fall or the spring.

We discuss a tailored plan with our majors to ensure that courses are taken at the appropriate times, while affording each student the flexibility to explore their other interests at CMU.

Note: Before you arrive at CMU, you will take 99-101 Computing at Carnegie Mellon and 15-051, a Discrete Math primer, on your own time. These short courses are provided to incoming students for free.

First Year, Fall
07-128 First Year Immigration Course
15-112 Fundamentals of Programming and Computer Science
15-131 Great Practical Ideas for Computer Scientists
15-151 Mathematical Foundations for Computer Science
03-151 Honors Modern Biology
21-122 Integration and Approximation

First Year, Spring
02-251 Great Ideas in Computational Biology
15-122 Principles of Imperative Computation
09-105 Introduction to Modern Chemistry I
21-259 Calculus in Three Dimensions
76-101 Interpretation and Argument

Second Year, Fall
02-261 Quantitative Cell and Molecular Biology Laboratory
21-241 Matrices and Linear Transformations
33-121 Physics I for Science Students
36-218 Probability Theory for Computer Scientists
15-150 Principles of Functional Programming

Second Year, Spring
02-xxx Computational Biology Elective
15-251 Great Ideas in Theoretical Computer Science
03-232 Biochemistry I
03-221 Genomes, Evolution, and Disease: Introduction to Quantitative Genetic Analysis
xx-xxx Humanities and Arts Elective

Third Year, Fall
02-512 Computational Methods for Biological Modeling and Simulation
03-320 Cell Biology
10-315 Introduction to Machine Learning (SCS Majors)
15-210 Parallel and Sequential Data Structures and Algorithms
xx-xxx Humanities and Arts Elective

Third Year, Spring
02-402 Computational Biology Seminar
02-510 Computational Genomics
03-xxx Biology Elective
15-451 Algorithm Design and Analysis
xx-xxx Humanities and Arts Elective

Fourth Year, Fall
02-xxx Computational Biology Elective
xx-xxx Humanities and Arts Elective
xx-xxx Free Elective
xx-xxx Free Elective

Fourth Year, Spring
xx-xxx Humanities and Arts Elective
xx-xxx Free Elective
xx-xxx Free Elective
xx-xxx Free Elective

Additional Major in Computational Biology

The Additional Major in Computational Biology is designed for undergraduate students wishing to study computational biology as a second field of study at Carnegie Mellon University in addition to their primary major.

The additional major is open to all students who complete the prerequisite coursework listed below, with the requirement that a student from outside SCS must have a 3.0 overall QPA when applying.

To prevent double-counting, students must complete at least seven courses of at least 9 units each as part of the additional major in computational biology (not including pre-requisites) that are unique to the additional major.

Students interested in the Additional Major in Computational Biology should contact the Computational Biology Undergrad Program Director.

Prerequisite Courses

02-250 Introduction to Computational Biology (12 units)
02-180 Great Ideas in Computational Biology I (5 units)
02-181 Great Ideas in Computational Biology II (5 units)
*02-251 is allowed if 02-180 and 02-181 are not offered
03-151 Honors Modern Biology (10 units)
or 03-121 Modern Biology
15-122 Principles of Imperative Computation (12 units)
15-151 Mathematical Foundations for Computer Science (12 units)
or 21-127 Concepts of Mathematics
or 21-128 Mathematical Concepts and Proofs
21-120 Differential and Integral Calculus (10 units)
21-122 Integration and Approximation (10 units)
Total Units: 76

Mathematics/Statistics Core

36-218 Probability Theory for Computer Scientists (9 units)
or 36-226 Introduction to Statistical Inference
or 36-326 Mathematical Statistics (Honors)
or 36-235 Probability and Statistical Inference I
or 15-260 Statistics and Computing
21-241 Matrices and Linear Transformations (11 units)
or 21-242 Matrix Theory
Total Units: 20

General Science Core

09-105 Introduction to Modern Chemistry I (10 units)
or 09-107 Honors Chemistry: Fundamentals, Concepts and Applications
33-121 Physics I for Science Students (12 units)
or 33-141 Physics I for Engineering Students
Total Units: 22

Biological Core

03-221 Genomes, Evolution, and Disease: Introduction to Quantitative Genetic Analysis (9 units)
or 03-220 Genetics
03-232 Biochemistry I (9 units)
(Students taking 03-231, including pre-med students, will take organic chemistry as a prerequisite, which will satisfy a biology elective requirement.)
or 03-231 Honors Biochemistry
03-320 Cell Biology (9 units)
Total Units: 27

Computer Science Core 

15-251 Great Ideas in Theoretical Computer Science (12 units)
15-451 Algorithm Design and Analysis (12 units)
or 15-351 Algorithms and Advanced Data Structures
10-315 Introduction to Machine Learning (SCS Majors) (12 units)
Total Units: 36

Computational Biology Core 

02-261 Quantitative Cell and Molecular Biology Laboratory (12 units)
or 02-262 Computation and Biology Integrated Research Lab
02-402 Computational Biology Seminar (3 units)
02-510 Computational Genomics (12 units)
02-512 Computational Methods for Biological Modeling and Simulation (9 units)
Total Units: 36

Major Electives 

02-3xx Computational Biology Electives at 300 level or above (18-24 units)
03-3xx Biology Electives at 300 level or above; 09-217 or 42-202 also count as biology electives (9-12 units)
xx-2xx School of Computer Science Electives at 200 level or above, at least 9 units each (18-24 units). 15-150 is an acceptable 100-level course counting in this category, but the following exceptions are not allowed in this category: 02-201, 02-223, 02-250, 02-261, 02-262, 11-423, 15-351, 16-223, 17-200, 17-333, 17-562.
Total Units: 45-60

General Education (Humanities & Arts)

For specific courses that may be used to satisfy each elective, please consult the General Education Requirements for your primary major.

Computational Biology Minor

SCS Majors: Please see the Computational Biology Concentration

Phillip Compeau, PhD, Director
Tara Seman, Program Manager

The computational biology minor is open to students in any major of any college at Carnegie Mellon outside the School of Computer Science.  The curriculum and course requirements are designed to maximize the participation of students from diverse academic disciplines. The program seeks to produce students with both basic computational skills and knowledge in biological sciences that are central to computational biology.

Students are encouraged to declare the minor as early as possible in their undergraduate careers and in all cases before their final semester so that the minor advisor can provide advice on their curriculum.

Why Minor in Computational Biology?

Computational Biology is concerned with solving biological and biomedical problems using mathematical and computational methods. It is recognized as an essential element in modern biological and biomedical research. There have been fundamental changes in biology and medicine over the past two decades due to spectacular advances in high-throughput data collection for genomics, proteomics and biomedical imaging. The resulting availability of unprecedented amounts of biological data demands the application of advanced computational tools to build integrated models of biological systems, and to use them to devise methods to prevent or treat disease. Computational Biologists inhabit and expand the interface of computation and biology, making them integral to the future of biology and medicine.

Policy on Double Counting

No more than two courses may be double counted with your major's core requirements. Courses in the minor may not be counted towards another SCS minor. Consult the minor advisor for more information.

Curriculum Overview

The minor in computational biology requires a total of five courses: 3 core courses, 1 biology elective, and 1 computational biology elective, for a total of at least 45 units.

Prerequisites

Students must take two courses as prerequisites from the following:
One of:
03-151 Honors Modern Biology (10 units)
03-121 Modern Biology (9 units)
and one of:
15-112 Fundamentals of Programming and Computer Science (12 units)
15-110 Principles of Computing (10 units)

Core Classes 

Students must take two from the following courses:
One of:
02-250 Introduction to Computational Biology (12 units)
*02-251 is allowed if 02-180 and 02-181 are not offered
One of:
02-261 Quantitative Cell and Molecular Biology Laboratory (variable units)
(03-343 Experimental Techniques in Molecular Biology may be substituted for 02-261 with permission of the minor advisor; 03-116 may be used to replace 02-261 if and only if the latter is not offered)
02-262 Computation and Biology Integrated Research Lab (variable units)

Electives

Three computational biology electives (02-XXX) at the 300 level or higher.

Computational Biology Courses

About Course Numbers:

Each Carnegie Mellon course number begins with a two-digit prefix that designates the department offering the course (e.g., 76-xxx courses are offered by the Department of English). Although each department maintains its own course numbering practices, typically, the first digit after the prefix indicates the class level: xx-1xx courses are freshman-level, xx-2xx courses are sophomore-level, etc. Depending on the department, xx-6xx courses may be either undergraduate senior-level or graduate-level, and xx-7xx courses and higher are graduate-level. Consult the Schedule of Classes each semester for course offerings and for any necessary pre-requisites or co-requisites.
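As an illustration only, the numbering convention described above can be sketched in a few lines of Python. The function name `describe_course_number` and the level labels are hypothetical and are not part of any official CMU tool. The "junior" and "senior" labels for xx-3xx and xx-4xx are inferred from the "etc." in the description, and levels the description does not mention (such as xx-0xx and xx-5xx) are reported as unspecified. Departments may deviate from this typical pattern.

```python
import re

# Two-digit department prefix, a hyphen, then a three-digit number
# whose first digit typically indicates the class level.
COURSE_NUMBER = re.compile(r"^(\d{2})-(\d)\d{2}$")

# Typical levels only; 3 and 4 are an assumption based on the "etc."
TYPICAL_LEVELS = {1: "first-year", 2: "sophomore", 3: "junior", 4: "senior"}


def describe_course_number(number: str) -> tuple[str, str]:
    """Return (department prefix, typical level) for a number like '02-317'."""
    match = COURSE_NUMBER.match(number)
    if match is None:
        raise ValueError(f"not a course number: {number!r}")
    department = match.group(1)
    level_digit = int(match.group(2))
    if level_digit >= 7:
        level = "graduate"
    elif level_digit == 6:
        # Depends on the department, per the description above.
        level = "senior or graduate"
    else:
        level = TYPICAL_LEVELS.get(level_digit, "unspecified")
    return department, level


print(describe_course_number("02-317"))  # ('02', 'junior')
print(describe_course_number("76-101"))  # ('76', 'first-year')
```

The Schedule of Classes remains the authoritative source for the level and prerequisites of any particular course.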


02-112 Programming for Scientists
Intermittent: 12 units
Provides a practical introduction to programming for students with little or no prior programming experience who are interested in science. Fundamental scientific algorithms will be introduced, and extensive programming assignments will be based on analytical tasks that might be faced by scientists, such as parsing, simulation, and optimization. Principles of good software engineering will also be stressed. The course will introduce students to the Go programming language, an industry-supported, modern programming language, the syntax of which will be covered in depth. Other assignments will be given in other programming languages such as Python and Java to highlight the commonalities and differences between languages. No prior programming experience is assumed, and no biology background is needed. Analytical skills and mathematical maturity are required.
02-180 Great Ideas in Computational Biology I
Spring: 5 units
This course introduces great ideas that have formed the foundation for the recent transformation of life sciences into a fully-fledged computational discipline. Extracting biological understanding from both large and small data sets now requires the use and design of novel algorithms, developed in the field of computational biology. The course is designed as a gateway exposure to computational biology for first-year undergraduates in the School of Computer Science, although it is open to other computationally minded students who are interested in exploring the field. This course is the first in a two-course sequence, showing students fundamental algorithmic techniques that are used in modern biological investigations. This first course focuses on algorithmic techniques for genomics, or the study of DNA, and it addresses questions like "How do we reconstruct the sequence of a genome?", "How do we compare genes and genomes?", and "How can we build evolutionary trees to infer relationships among many species?" Previous exposure to molecular biology is not required.
Prerequisites: (15-112 or 02-201) and (21-128 or 21-127 or 15-151)
02-181 Great Ideas in Computational Biology II
Spring: 5 units
This course introduces great ideas that have formed the foundation for the recent transformation of life sciences into a fully-fledged computational discipline. Extracting biological understanding from both large and small data sets now requires the use and design of novel algorithms, developed in the field of computational biology. This course is the second in a two-course sequence, offering a gateway exposure for students showing fundamental algorithmic techniques that are used in modern biological investigations. It is designed for first-year undergraduates in the School of Computer Science, although it is open to other computationally minded students who are interested in exploring the field. Whereas the first course in the sequence focuses on genomics, this second course largely introduces topics outside of DNA, from proteins, to neural networks, to a study of algorithms that nature has evolved to solve problems. Previous exposure to molecular biology is not required. After completion of the course, students will be well equipped to tackle advanced computational challenges in biology.
Prerequisites: (02-201 or 15-112) and (15-151 or 21-127 or 21-128)
02-201 Programming for Scientists
Fall and Spring: 10 units
Provides a practical introduction to programming for students with little or no prior programming experience who are interested in science. Fundamental scientific algorithms will be introduced, and extensive programming assignments will be based on analytical tasks that might be faced by scientists, such as parsing, simulation, and optimization. Principles of good software engineering will also be stressed. The course will introduce students to the Go programming language, an industry-supported, modern programming language, the syntax of which will be covered in depth. Other assignments will be given in other programming languages such as Python and Java to highlight the commonalities and differences between languages. No prior programming experience is assumed, and no biology background is needed. Analytical skills and mathematical maturity are required. Course not open to CS majors.
02-218 Introduction to Computational Medicine
All Semesters: 3 units
This course is an introduction to computational methods relevant to the diagnosis and treatment of human diseases. It is the microcourse version of 02-518, Computational Medicine. The course begins with an introduction to the field of Medicine, and an overview of the primary clinical tasks associated with Computational Medicine (phenotyping; biomarker discovery; predictive modeling). Next, we provide an introduction to several Machine Learning techniques, and how those techniques can be used to perform the clinical tasks. For the remainder of the course, students will be guided through the analysis of a clinical data set to gain experience with these techniques. No prior experience with Medicine, Machine Learning, or computer programming is required. Students will be graded based on quizzes and one homework.
02-223 Personalized Medicine: Understanding Your Own Genome
Fall: 9 units
Do you want to know how to discover the tendencies hidden in your genome? Since the first draft of a human genome sequence became available at the start of this century, the cost of genome sequencing has decreased dramatically. Personal genome sequencing will likely become a routine part of medical exams for patients for prognostic and diagnostic purposes. Personal genome information will also play an increasing role in lifestyle choices, as people take into account their own genetic tendencies. Commercial services such as 23andMe have already taken first steps in this direction. Computational methods for mining large-scale genome data are being developed to unravel the genetic basis of diseases and assist doctors in clinics. This course introduces students to biological, computational, and ethical issues concerning use of personal genome information in health maintenance, medical practice, biomedical research, and policymaking. We focus on practical issues, using individual genome sequences (such as that of Nobel prize winner James Watson) and other population-level genome data. Without requiring any background in biology or CS, we begin with an overview of topics from genetics, molecular biology, stats, and machine learning relevant to the modern personal genome era. We then cover scientific issues such as how to discover your genetic ancestry and how to learn from genomes about migration and evolution of human populations. We discuss medical aspects such as how to predict whether you will develop diseases such as diabetes based on your own genome, how to discover disease-causing genetic mutations, and how genetic information can be used to recommend clinical treatments.
02-250 Introduction to Computational Biology
Spring: 12 units
This class provides a general introduction to computational tools for biology. The course is divided into two halves. The first half covers computational molecular biology and genomics. It examines important sources of biological data, how they are archived and made available to researchers, and what computational tools are available to use them effectively in research. In the process, it covers basic concepts in statistics, mathematics, and computer science needed to effectively use these resources and understand their results. Specific topics covered include sequence data, searching and alignment, structural data, genome sequencing, genome analysis, genetic variation, gene and protein expression, and biological networks and pathways. The second half covers computational cell biology, including biological modeling and image analysis. It includes homework requiring modification of scripts to perform computational analyses. The modeling component includes computer models of population dynamics, biochemical kinetics, cell pathways, and neuron behavior. The imaging component includes the basics of machine vision, morphological image analysis, image classification and image-derived models. The course is taught under two different numbers. The lectures are the same for both but recitations and examinations are separate. 02-250 is intended primarily for computational biology, computer science, statistics or engineering majors at the undergraduate or graduate level who have had prior experience with computer science or programming. 03-250 is intended primarily for biological sciences or biomedical engineering majors who have had limited prior experience with computer science or programming. Students may not take both 02-250 and 03-250 for credit. Prerequisite: (02-201 or 15-110 or 15-112), or permission of the instructors.
Prerequisites: (02-201 or 15-112 or 15-110) and (03-131 or 03-151 or 03-121)

Course Website: http://www.cbd.cmu.edu/education/undergraduate-courses/introduction-to-computational-biology/
02-251 Great Ideas in Computational Biology
Spring: 12 units
This 12-unit course provides an introduction to many of the great ideas that have formed the foundation for the recent transformation of life sciences into a fully-fledged computational discipline. Extracting biological understanding from both large and small data sets now requires the use and design of novel algorithms, developed in the field of computational biology. This gateway course is intended as a first exposure to computational biology for first-year undergraduates in the School of Computer Science, although it is open to other computationally minded students who are interested in exploring the field. Students will learn fundamental algorithmic and machine learning techniques that are used in modern biological investigations, including algorithms to process string, graph, and image data. They will use these techniques to answer questions such as "How do we reconstruct the sequence of a genome?", "How do we infer evolutionary relationships among many species?", and "How can we predict each gene's biological role?" on biological data. Previous exposure to molecular biology is not required, as the instructors will provide introductory materials as needed. After completion of the course, students will be well equipped to tackle advanced computational challenges in biology.
Prerequisites: (15-112 or 02-201) and (21-128 or 15-151 or 21-127)
02-261 Quantitative Cell and Molecular Biology Laboratory
Fall and Spring
This is an introductory laboratory-based course designed to teach basic biological laboratory skills used in exploring the quantitative nature of biological systems and the reasoning required for performing research in computational biology. Over the course of the semester, students will design and perform multiple modern experiments and quantitatively analyze the results of these experiments. During this course students will also have an opportunity to use techniques learned during the course to experimentally answer an open question. Designing the experiments will require students to think critically about the biological context of the experiments as well as the necessary controls to ensure interpretable experimental results. During this course students will gain experience in many aspects of scientific research, including: sequencing DNA, designing and performing PCR for a variety of analyses, maintaining cell cultures, taking brightfield and fluorescent microscopy images, developing methods for automated analysis of cell images, communicating results to peers and colleagues. Course Outline: (1) 3-hour lab per week, (1) 1-hour lecture per week.
Prerequisites: 02-201 or 15-112
02-262 Computation and Biology Integrated Research Lab
All Semesters
Modern biological research is heavily interdisciplinary in nature, requiring the use of a diverse set of experimental techniques and computational analysis. This course provides students with a modern research experience while training them to communicate and collaborate in an interdisciplinary setting to better prepare them to join the workforce as members of interdisciplinary teams. This will be accomplished by focusing efforts on a real research problem requiring sophisticated experimentation and computation for success. Class time will include both laboratory research time (wet lab and computational) and activities designed to teach and practice communication methods for interdisciplinary teams. Students are expected to have a strong background in biology or computation and an interest in both. Pre-requisites include either (03-117 or 03-124 or 03-343) or (15-112 or equivalent).
02-317 Algorithms in Nature
Intermittent: 9 units
Computer systems and biological processes often rely on networks of interacting entities to reach joint decisions, coordinate and respond to inputs. There are many similarities in the goals and strategies of biological and computational systems which suggest that each can learn from the other. These include the distributed nature of the networks (in biology molecules, cells, or organisms often operate without central control), the ability to successfully handle failures and attacks on a subset of the nodes, modularity and the ability to reuse certain components or sub-networks in multiple applications and the use of stochasticity in biology and randomized algorithms in computer science. In this course we will start by discussing classic biologically motivated algorithms including neural networks (inspired by the brain), genetic algorithms (sequence evolution), non-negative matrix factorization (signal processing in the brain), and search optimization (ant colony formation). We will then continue to discuss more recent bi-directional studies that have relied on biological processes to solve routing and synchronization problems, discover Maximal Independent Sets (MIS), and design robust and fault tolerant networks. In the second part of the class students will read and present new research in this area. Students will also work in groups on a final project in which they develop and test a new biologically inspired algorithm. No prior biological knowledge required.
Prerequisites: 15-210 and 15-251
Course Website: http://www.algorithmsinnature.org
02-319 Genomics and Epigenetics of the Brain
Fall: 9 units
This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in single cell sequencing and other genomic technologies are revolutionizing how neuroscience research is conducted, providing tools to study how different cell types in the brain produce behavior and contribute to neurological disorders. Analyzing these powerful new datasets requires a foundation in molecular neuroscience as well as key computational biology techniques. In this course, we will cover the biology of epigenetics, how proteins sitting on DNA orchestrate the regulation of genes. In parallel, programming assignments and a project focusing on the analysis of a primary genomic dataset will teach principles of computational biology and their applications to neuroscience. The course material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.
Prerequisites: (03-121 or 03-151) and (03-221 or 03-220) and (15-112 or 15-121 or 02-201 or 15-110)
02-331 Modeling Evolution
Spring: 12 units
Some of the most serious public health problems we face today, from drug-resistant bacteria to cancer, arise from a fundamental property of living systems: their ability to evolve. Since Darwin's theory of natural selection was first proposed, we have begun to understand how heritable differences in reproductive success drive the adaptation of living systems. This makes it intuitive and tempting to view evolution from an optimization perspective. However, genetic drift, phenotypic trade-offs, constraints, and changing environments are among the many factors that may limit the optimizing force of natural selection. This tug-of-war between selection and drift, between the forces that produce variation in a population and the forces suppressing this variation, makes evolutionary processes much more complex to model and understand than previously thought. The aim of this class is to provide an introduction to the theoretical formalism necessary to understand how biological systems are shaped by the forces and constraints driving evolutionary dynamics.
Prerequisites: 15-112 and 21-241 and (15-259 or 36-218 or 21-325 or 36-225)
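The tug-of-war between selection and drift described for 02-331 can be illustrated with a small simulation. The following is a minimal sketch, not course material: it assumes a haploid Wright-Fisher model with a fixed population size, and the function name, parameter names, and example values are all hypothetical choices made for illustration.

```python
import random


def wright_fisher(pop_size, init_freq, selection, generations, rng):
    """Simulate one allele's frequency in a haploid Wright-Fisher population.

    Assumptions (illustrative only): constant population size, no mutation,
    and relative fitness 1 + selection for the focal allele, with
    selection > -1.
    """
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: reweight the focal allele by its relative fitness.
        weighted = freq * (1.0 + selection)
        expected = weighted / (weighted + (1.0 - freq))
        # Drift: each of the pop_size offspring draws its allele at random,
        # which is binomial sampling around the expected frequency.
        count = sum(1 for _ in range(pop_size) if rng.random() < expected)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    # Same selective advantage, two population sizes: drift dominates in the
    # small population, so a beneficial allele is lost more often there.
    for pop_size in (20, 2000):
        lost = sum(
            wright_fisher(pop_size, 0.1, 0.05, 300, rng)[-1] == 0.0
            for _ in range(50)
        )
        print(f"N={pop_size}: beneficial allele lost in {lost} of 50 runs")
```

Running the script compares how often the same beneficial allele is lost in a small and a large population. The loss counts depend on the random seed, so no specific output is claimed here.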
02-402 Computational Biology Seminar
Fall and Spring: 3 units
This course consists of weekly invited presentations on current computational biology research topics by leading scientists. Attendance is mandatory for a passing grade. You must sign in and attend at least 80% of the seminars. See course website for seminar locations. Some will be at the University of Pittsburgh and some will be at Carnegie Mellon.
02-403 Special Topics: Graph Representation Learning in Biology
Intermittent: 12 units
Biological and cellular systems are often modeled as graphs (networks) of interacting elements. This approach has been highly successful owing to the theory, methodology and algorithms that support analysis and learning on graphs. However, recent advances in deep learning techniques have led to a surge in research on graph representation learning. In particular, these advances have led to new state-of-the-art results in biomedicine and healthcare. This course will provide a synthesis and overview of graph representation learning within systems biology and medicine. We will begin with a discussion of the goals of graph representation learning, as well as key methodological foundations in graph theory and network analysis. We next review traditional methods such as topological descriptors, graph kernels, and spectral graph theory. We then introduce techniques for learning node embeddings, including random-walk based methods and applications to knowledge graphs. We finally provide a technical synthesis and introduction to the highly successful graph neural network formalism as well as recent advancements in deep generative models for graphs.
Prerequisites: (02-250 Min. grade C or 03-250 Min. grade C) and 03-121 Min. grade C
02-414 String Algorithms
Intermittent: 12 units
Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases.
Prerequisites: 15-127 or 15-151 or 21-128
02-421 Algorithms for Computational Structural Biology
Intermittent: 12 units
Some of the most interesting and difficult challenges in computational biology and bioinformatics arise from the determination, manipulation, or exploitation of molecular structures. This course will survey these challenges and present a variety of computational methods for addressing them. Topics will include: molecular dynamics simulations, computer-aided drug design, and computer-aided protein design. The course is appropriate for both students with backgrounds in computer science and those in the life sciences.
02-425 Computational Methods for Proteogenomics and Metabolomics
Spring: 9 units
Proteomics and metabolomics are the large scale study of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining and metagenomics.
Prerequisites: 02-250 or 02-604 or 02-251
02-450 Automation of Scientific Research
Spring: 9 units
Automated scientific instruments are used widely in research and engineering. Robots dramatically increase the reproducibility of scientific experiments, and are often cheaper and faster than humans, but are most often used to execute brute-force sweeps over experimental conditions. The result is that many experiments are "wasted" on conditions where the effect could have been predicted. Thus, there is a need for computational techniques capable of selecting the most informative experiments. This course will introduce students to techniques from Artificial Intelligence and Machine Learning for automatically selecting experiments to accelerate the pace of discovery and to reduce the overall cost of research. Real-world applications from Biology, Bioengineering, and Medicine will be studied. Grading will be based on homeworks and two exams. The course is intended to be self-contained, but students should have a basic knowledge of biology, programming, statistics, and machine learning.
Prerequisites: (10-701 or 10-315) and 15-122
02-499 Independent Study in Computational Biology
Fall and Spring
The student will, under the individual guidance of a faculty member, read and digest papers or a textbook in an advanced area of computational biology not offered by an existing course at Carnegie Mellon. The student will demonstrate their mastery of the material by a combination of one or more of the following: oral discussions with the faculty member; exercises set by the faculty member accompanying the readings; and a written summary synthesizing the material that the student learned. Permission required.
02-500 Undergraduate Research in Computational Biology
Fall and Spring
This course is for undergraduate students who wish to do supervised research for academic credit with a Computational Biology faculty member. Interested students should first contact the Professor with whom they would like to work. If there is mutual interest, the Professor will direct you to the Academic Programs Coordinator who will enroll you in the course. 02-250 is a suggested pre-requisite.

Course Website: https://forms.gle/S1AJX65btkTxwNCw9
02-510 Computational Genomics
Spring: 12 units
Dramatic advances in experimental technology and computational analysis are fundamentally transforming the basic nature and goal of biological research. The emergence of new frontiers in biology, such as evolutionary genomics and systems biology, is demanding new methodologies that can confront quantitative issues of substantial computational and mathematical sophistication. From the computational side, this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, data integration, time series analysis, active learning, etc. This course counts as a CSD Applications elective.
Prerequisites: 15-122 Min. grade C and (36-235 or 36-218 or 15-259 or 36-225)
02-512 Computational Methods for Biological Modeling and Simulation
Fall: 9 units
This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.
Prerequisites: (15-259 or 36-225 or 36-235 or 36-219 or 21-325 or 36-218 or 36-217) and (21-240 or 21-242 or 21-241) and (02-201 or 15-110 or 15-112) and 21-121
02-514 String Algorithms
Fall: 12 units
Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in biology will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed, and the topics covered will be of use in other fields involving large collections of strings. Programming proficiency is required.
Prerequisite: 15-251
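As an illustration of one topic named in the 02-514 description, the Burrows-Wheeler transform, here is a minimal sketch. It is not taken from the course: it uses the naive sorted-rotations construction rather than the suffix-array-based methods a course on efficient string algorithms would likely emphasize, and it assumes a sentinel character `$` that does not occur in the input and sorts before every other character.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, O(n^2 log n))."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original text is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

The transform tends to group identical characters into runs, which is what makes it useful for compression and for the searchable compressed indices mentioned in the description.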
02-515 Advanced Topics in Computational Genomics
Spring: 12 units
Research in biology and medicine is undergoing a revolution due to the availability of high-throughput technology for probing various aspects of a cell at a genome-wide scale. Next-generation sequencing technology is allowing researchers to inexpensively generate a large volume of genome sequence data. In combination with various other high-throughput techniques for the epigenome, transcriptome, and proteome, we have unprecedented opportunities to answer fundamental questions in cell biology and understand disease processes with the goal of finding treatments in medicine. The challenge in this new genomic era is to develop computational methods for integrating different data types and extracting complex patterns accurately and efficiently from a large volume of data. This course will discuss computational issues arising from high-throughput techniques recently introduced in biology, and cover very recent developments in computational genomics and population genetics, including genome structural variant discovery, association mapping, epigenome analysis, cancer genomics, and transcriptome analysis. The course material will be drawn from very recent literature. Grading will be based on weekly written critiques of the papers to be discussed in class, class participation, and a final project. It assumes a basic knowledge of machine learning and computational genomics.
02-518 Computational Medicine
Fall: 12 units
Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning methodologies such as regression, classification, clustering, semi-supervised learning, probabilistic modeling, and time-series modeling are being used to analyze a variety of datasets collected by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Grading will be based on homework assignments and a project. 02-250 is a suggested pre-requisite.

Course Website: https://sites.google.com/view/computationalmedicine/
02-530 Cell and Systems Modeling
Fall: 12 units
This course will introduce students to the theory and practice of modeling biological systems from the molecular to the organism level with an emphasis on intracellular processes. Topics covered include kinetic and equilibrium descriptions of biological processes, systematic approaches to model building and parameter estimation, analysis of biochemical circuits modeled as differential equations, modeling the effects of noise using stochastic methods, modeling spatial effects, and modeling at higher levels of abstraction or scale using logical or agent-based approaches. A range of biological models and applications will be considered including gene regulatory networks, cell signaling, and cell cycle regulation. Weekly lab sessions will provide students hands-on experience with methods and models presented in class. Course requirements include regular class participation, bi-weekly homework assignments, a take-home exam, and a final project. The course is designed for graduate and upper-level undergraduate students with a wide variety of backgrounds. The course is intended to be self-contained but students may need to do some additional work to gain fluency in core concepts. Students should have a basic knowledge of calculus, differential equations, and chemistry as well as some previous exposure to molecular biology and biochemistry. Experience with programming and numerical computation is useful but not mandatory. Laboratory exercises will use MATLAB as the primary modeling and computational tool augmented by additional software as needed.
Prerequisites: (33-121 or 03-151 or 03-121) and (03-231 or 03-232) and 21-112 and 09-105
02-540 Bioimage Informatics
Intermittent: 12 units
With the rapid advance of bioimaging techniques and fast accumulation of bioimage data, computational bioimage analysis and modeling are playing an increasingly important role in understanding of complex biological systems. The goals of this course are to provide students with the ability to understand a broad set of practical and cutting-edge computational techniques to extract knowledge from bioimages.
Prerequisites: 02-620 or 10-301 or 10-701 or 10-601 or 10-315
02-601 Programming for Scientists
Fall and Spring: 12 units
Provides a practical introduction to programming for students with little previous programming experience who are interested in science. Fundamental scientific algorithms will be introduced, and extensive programming assignments will be based on analytical tasks that might be faced by scientists, such as parsing, simulation, and optimization. Principles of good software engineering will also be stressed. The course will introduce students to the Go programming language, an industry-supported, modern programming language, the syntax of which will be covered in depth. Other assignments may be given in other programming languages to highlight the commonalities and differences between languages. No biology background is needed. Analytical skills, an understanding of programming basics, and mathematical maturity are required. A preparatory self-paced bootcamp on programming basics is provided to students to complete before beginning the course.
02-602 Professional Issues for Computational and Automated Scientists
Fall and Spring: 3 units
This course gives Master's in Computational Biology and Master's in Automated Science students the opportunity to develop the professional skills necessary for a successful career in either academia or industry. This course, required in the first semester of both programs, will include assistance with elevator pitches, interview preparation, resume and cover letter writing, networking, and presentation skills. The course will also include opportunities to connect with computational biology professionals as part of industry outreach. The course will meet once a week and is pass/fail only.
02-604 Fundamentals of Bioinformatics
Spring: 12 units
How do we find potentially harmful mutations in your genome? How can we reconstruct the Tree of Life? How do we compare similar genes from different species? These are just three of the many central questions of modern biology that can only be answered using computational approaches. This 12-unit course will delve into some of the fundamental computational ideas used in biology and let students apply existing resources that are used in practice every day by thousands of biologists. The course offers an opportunity for students who possess an introductory programming background to become more experienced coders within a biological setting. As such, it presents a natural next course for students who have completed 02-601.
02-605 Professional Issues in Automated Science
Spring: 3 units
This course gives MS in Automated Science students an opportunity to develop professional skills necessary for a successful career in computational biology. This course will include assistance with resume writing, interview preparation, presentation skills, and job search techniques. This course will also include opportunities to network with computational biology professionals and academic researchers.
02-613 Algorithms and Advanced Data Structures
Fall and Spring: 12 units
The objective of this course is to study algorithms for general computational problems, with a focus on the principles used to design those algorithms. Efficient data structures will be discussed to support these algorithmic concepts. Topics include: run time analysis, divide-and-conquer algorithms, dynamic programming algorithms, network flow algorithms, linear and integer programming, large-scale search algorithms and heuristics, efficient data storage and query, and NP-completeness. Although this course may have a few programming assignments, it is primarily not a programming course. Instead, it will focus on the design and analysis of algorithms for general classes of problems. This course is not open to CS graduate students, who should consider taking 15-651 instead. 02-250 is a suggested prerequisite for undergraduates.
02-614 String Algorithms
Intermittent: 12 units
Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases.
Prerequisites: 15-151 or 15-127 or 21-128
02-620 Machine Learning for Scientists
Spring: 12 units
With advances in scientific instruments and high-throughput technology, scientific discoveries are increasingly made from analyzing large-scale data generated from experiments or collected from observational studies. Machine learning methods that have been widely used to extract complex patterns from large speech, text, and image data are now being routinely applied to answer scientific questions in biology, bioengineering, and medicine. This course is intended for graduate students interested in learning machine learning methods for scientific data analysis and modeling. It will cover classification and regression techniques such as logistic regression, random forest regression, Gaussian process regression, decision trees, and support vector machines; unsupervised learning methods such as clustering algorithms, mixture models, and hidden Markov models; probabilistic graphical models and deep learning methods; and learning theories such as PAC learning and VC dimension. The course will focus on applications of these methods in genomics and medicine. Programming skills and basic knowledge of linear algebra, probability, and statistics are assumed.
Prerequisite: 02-680
02-651 New Technologies and Future Markets
Fall: 12 units
This course focuses on technological trends and how these trends can help shape or disrupt new and existing markets. Students will learn to identify, analyze, and synthesize emerging trends and perform detailed research on how these trends can influence and create markets. By understanding the drivers behind these trends, students will be able to identify key market opportunity inflection points in biotechnology as well as the relationship between business processes and information technology (IT). Students will also learn to assess some information technologies and the potential of applying them to solve problems and create commercially viable solutions. The course is designed for the student interested in finding new venture opportunities on the cutting edge of technology and in finding and evaluating the opportunities for further development. For MS Biotechnology Innovation and Computation students only.
Prerequisite: 11-695
02-654 Biotechnology Enterprise Development
Fall: 12 units
In this course, students learn how to develop a biotech start-up and create a Minimum Viable Product (MVP), business model, and strategy for the product. Students will learn about business modeling, customer development, customer validation, proposals, product branding, and marketing for their product. The course requires students to spend most of their time validating their start-up concept and prototypes with potential customers, adapting to critical feedback, and revising their value propositions accordingly. Students learn to balance technical product development with customer requirements, business strategy, and budget constraints. This course provides real-world, hands-on learning on what it is like to start a company. Different business models will be covered. Understanding customer discovery and validation concepts will help students effectively modify their original concepts to meet market demands. Student teams will learn how to revise and improve their prototype by the end of the term. This is a fast-paced course in which students are expected to spend most of their time outside the classroom interacting with potential customers to validate, test, verify, and integrate essential elements for their start-up business proposal. Up to now, students have been learning technologies and methods for solving problems in the life science industry and building a prototype for their start-up. However, a new venture proposal is not a collection of isolated bits. It should be thoroughly validated via customer input and market needs to tell a single story of how the venture will reach its end goals. The final deliverable is the creation and presentation of a well-explicated business proposal in addition to a product prototype corresponding to the business proposal.
Prerequisites: 02-651 and 11-695
02-680 Essential Mathematics and Statistics for Scientists
Fall: 9 units
This course rigorously introduces fundamental topics in mathematics and statistics to first-year master's students as preparation for more advanced computational coursework. Topics are sampled from information theory, graph theory, proof techniques, phylogenetics, combinatorics, set theory, linear algebra, neural networks, probability distributions and densities, multivariate probability distributions, maximum likelihood estimation, statistical inference, hypothesis testing, Bayesian inference, and stochastic processes. Students completing this course will obtain a broad skillset of mathematical techniques and statistical inference as well as a deep understanding of mathematical proof. They will have the quantitative foundation to immediately step into an introductory master's level machine learning or automation course. This background will also serve students well in advanced courses that apply concepts in machine learning to scientific datasets, such as 02-710 (Computational Genomics) or 02-750 (Automation of Biological Research). The course grade will be computed as the result of homework assignments, midterm tests, and class participation.
02-699 Independent Study in Computational Biology
Fall and Spring
The student will, under the individual guidance of a faculty member, read and digest papers or a textbook in an advanced area of computational biology not offered by an existing course at Carnegie Mellon. The student will demonstrate their mastery of the material by a combination of one or more of the following: oral discussions with the faculty member; exercises set by the faculty member accompanying the readings; and a written summary synthesizing the material that the student learned. Permission required.
02-700 M.S. Thesis Research
Fall and Spring
This course is for M.S. students who wish to do supervised research for academic credit with a Computational Biology faculty member. Interested students should first contact the Professor with whom they would like to work. If there is mutual interest, the Professor will direct you to the Academic Programs Coordinator, who will enroll you in the course.

Course Website: https://forms.gle/tDKDs1EujvXAApYK7
02-702 Computational Biology Seminar
Fall and Spring: 3 units
This course consists of weekly invited presentations on current computational biology research topics by leading scientists. Attendance is mandatory for a passing grade. You must sign in and attend at least 80 percent of the seminars. See course website for seminar locations. Some will be at the University of Pittsburgh and some will be at Carnegie Mellon.

Course Website: https://www.compbio.cmu.edu/seminar-series/index.html
02-703 Special Topics: Graph Representation Learning in Biology
Intermittent: 12 units
Biological and cellular systems are often modeled as graphs (networks) of interacting elements. This approach has been highly successful owing to the theory, methodology and algorithms that support analysis and learning on graphs. However, recent advances in deep learning techniques have led to a surge in research on graph representation learning. In particular, these advances have led to new state-of-the-art results in biomedicine and healthcare. This course will provide a synthesis and overview of graph representation learning within systems biology and medicine. We will begin with a discussion of the goals of graph representation learning, as well as key methodological foundations in graph theory and network analysis. We next review traditional methods such as topological descriptors, graph kernels, and spectral graph theory. We then introduce techniques for learning node embeddings, including random-walk based methods and applications to knowledge graphs. We finally provide a technical synthesis and introduction to the highly successful graph neural network formalism as well as recent advancements in deep generative models for graphs.
02-704 Special Topics: Introduction to Statistical Genetics
Intermittent: 12 units
This course will cover quantitative topics in human statistical genetics, including the HapMap project, linkage disequilibrium, population structure and stratification, natural selection, genome-wide association studies, estimating and partitioning heritability, association testing, statistical fine-mapping, disease gene mapping, expression quantitative trait loci, single-cell genomics, and polygenic risk prediction. The course will emphasize hands-on analysis of large empirical data sets, thus requiring prior experience with a general-purpose high-level programming language such as Python. After taking this course, each student will have the experience and skills to develop and apply statistical methods to population genetic data.
Prerequisites: (02-601 or 15-112) and (02-680 or 36-226 or 36-236 or 15-259 or 15-260 or 36-218)
02-710 Computational Genomics
Spring: 12 units
Dramatic advances in experimental technology and computational analysis are fundamentally transforming the basic nature and goal of biological research. The emergence of new frontiers in biology, such as evolutionary genomics and systems biology, is demanding new methodologies that can confront quantitative issues of substantial computational and mathematical sophistication. From the computational side, this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, data integration, time series analysis, active learning, etc. This course counts as a CSD Applications elective.
02-711 Computational Molecular Biology and Genomics
Spring: 12 units
An advanced introduction to computational molecular biology, using an applied algorithms approach. The first part of the course will cover established algorithmic methods, including pairwise sequence alignment and dynamic programming, multiple sequence alignment, fast database search heuristics, hidden Markov models for molecular motifs and phylogeny reconstruction. The second part of the course will explore emerging computational problems driven by the newest genomic research. Course work includes four to six problem sets, one midterm and final exam.
Prerequisites: (03-121 or 03-151) and 15-122
02-712 Computational Methods for Biological Modeling and Simulation
Fall: 12 units
This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.
Prerequisites: (02-601 or 15-112 or 15-110) and (15-259 or 21-325 or 36-218 or 36-219 or 02-680 or 36-217 or 36-235 or 36-225)
02-714 String Algorithms
Fall: 12 units
Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in biology will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed, and the topics covered will be of use in other fields involving large collections of strings. Programming proficiency is required.
Prerequisite: 15-251
02-715 Advanced Topics in Computational Genomics
Spring: 12 units
Research in biology and medicine is undergoing a revolution due to the availability of high-throughput technology for probing various aspects of a cell at a genome-wide scale. Next-generation sequencing technology is allowing researchers to inexpensively generate a large volume of genome sequence data. In combination with various other high-throughput techniques for the epigenome, transcriptome, and proteome, we have unprecedented opportunities to answer fundamental questions in cell biology and understand disease processes with the goal of finding treatments in medicine. The challenge in this new genomic era is to develop computational methods for integrating different data types and extracting complex patterns accurately and efficiently from a large volume of data. This course will discuss computational issues arising from high-throughput techniques recently introduced in biology, and cover very recent developments in computational genomics and population genetics, including genome structural variant discovery, association mapping, epigenome analysis, cancer genomics, and transcriptome analysis. The course material will be drawn from very recent literature. Grading will be based on weekly write-ups critiquing the papers to be discussed in class, class participation, and a final project. The course assumes a basic knowledge of machine learning and computational genomics.
02-717 Algorithms in Nature
Fall: 12 units
Computer systems and biological processes often rely on networks of interacting entities to reach joint decisions, coordinate, and respond to inputs. There are many similarities in the goals and strategies of biological and computational systems, which suggest that each can learn from the other. These include the distributed nature of the networks (in biology, molecules, cells, or organisms often operate without central control), the ability to successfully handle failures and attacks on a subset of the nodes, modularity and the ability to reuse certain components or sub-networks in multiple applications, and the use of stochasticity in biology and randomized algorithms in computer science. In this course we will start by discussing classic biologically motivated algorithms including neural networks (inspired by the brain), genetic algorithms (sequence evolution), non-negative matrix factorization (signal processing in the brain), and search optimization (ant colony formation). We will then continue to discuss more recent bi-directional studies that have relied on biological processes to solve routing and synchronization problems, discover Maximal Independent Sets (MIS), and design robust and fault-tolerant networks. In the second part of the class students will read and present new research in this area. Students will also work in groups on a final project in which they develop and test a new biologically inspired algorithm. See also: www.algorithmsinnature.org. No prior biological knowledge is required.
02-718 Computational Medicine
Fall: 12 units
Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning methodologies such as regression, classification, clustering, semi-supervised learning, probabilistic modeling, and time-series modeling are being used to analyze a variety of datasets collected by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Grading will be based on homework assignments and a project. 02-250 is a suggested pre-requisite.
Prerequisites: 10-315 or (10-701 and 10-601 and 10-401)
Course Website: https://sites.google.com/view/computationalmedicine/
02-719 Genomics and Epigenetics of the Brain
Fall: 12 units
This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in single cell sequencing and other genomic technologies are revolutionizing how neuroscience research is conducted, providing tools to study how different cell types in the brain produce behavior and contribute to neurological disorders. Analyzing these powerful new datasets requires a foundation in molecular neuroscience as well as key computational biology techniques. In this course, we will cover the biology of epigenetics, how proteins sitting on DNA orchestrate the regulation of genes. In parallel, programming assignments and a project focusing on the analysis of a primary genomic dataset will teach principles of computational biology and their applications to neuroscience. The course material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.
Prerequisites: (03-121 or 03-151) and 03-220 and (15-110 or 15-121 or 02-201)
02-721 Algorithms for Computational Structural Biology
Intermittent: 12 units
Some of the most interesting and difficult challenges in computational biology and bioinformatics arise from the determination, manipulation, or exploitation of molecular structures. This course will survey these challenges and present a variety of computational methods for addressing them. Topics will include: molecular dynamics simulations, computer-aided drug design, and computer-aided protein design. The course is appropriate for both students with backgrounds in computer science and those in the life sciences.
02-725 Computational Methods for Proteogenomics and Metabolomics
Spring: 12 units
Proteomics and metabolomics are the large scale study of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining and metagenomics.
Prerequisites: 02-250 or 02-251 or 02-604
02-730 Cell and Systems Modeling
Fall: 12 units
This course will introduce students to the theory and practice of modeling biological systems from the molecular to the organism level with an emphasis on intracellular processes. Topics covered include kinetic and equilibrium descriptions of biological processes, systematic approaches to model building and parameter estimation, analysis of biochemical circuits modeled as differential equations, modeling the effects of noise using stochastic methods, modeling spatial effects, and modeling at higher levels of abstraction or scale using logical or agent-based approaches. A range of biological models and applications will be considered including gene regulatory networks, cell signaling, and cell cycle regulation. Weekly lab sessions will provide students hands-on experience with methods and models presented in class. Course requirements include regular class participation, bi-weekly homework assignments, a take-home exam, and a final project. The course is designed for graduate and upper-level undergraduate students with a wide variety of backgrounds. The course is intended to be self-contained but students may need to do some additional work to gain fluency in core concepts. Students should have a basic knowledge of calculus, differential equations, and chemistry as well as some previous exposure to molecular biology and biochemistry. Experience with programming and numerical computation is useful but not mandatory. Laboratory exercises will use MATLAB as the primary modeling and computational tool augmented by additional software as needed. Note: This course will be held at Pitt.
Prerequisites: (33-121 or 03-121 or 03-151) and (03-231 or 03-232) and 21-112 and 09-105
Course Website: https://sites.google.com/site/cellandsystemsmodeling/
02-731 Modeling Evolution
Spring: 12 units
Some of the most serious public health problems we face today, from drug-resistant bacteria to cancer, arise from a fundamental property of living systems: their ability to evolve. Since Darwin's theory of natural selection was first proposed, we have begun to understand how heritable differences in reproductive success drive the adaptation of living systems. This makes it intuitive and tempting to view evolution from an optimization perspective. However, genetic drift, phenotypic trade-offs, constraints, and changing environments are among the many factors that may limit the optimizing force of natural selection. This tug-of-war between selection and drift, between the forces that produce variation in a population and the forces suppressing this variation, makes evolutionary processes much more complex to model and understand than previously thought. The aim of this class is to provide an introduction to the theoretical formalism necessary to understand how biological systems are shaped by the forces and constraints driving evolutionary dynamics.
Prerequisites: 15-112 and 21-241 and (36-225 or 15-259 or 36-218 or 21-325)
02-740 Bioimage Informatics
Intermittent: 12 units
With the rapid advance of bioimaging techniques and fast accumulation of bioimage data, computational bioimage analysis and modeling are playing an increasingly important role in understanding complex biological systems. The goal of this course is to provide students with the ability to understand a broad set of practical and cutting-edge computational techniques for extracting knowledge from bioimages.
02-750 Automation of Scientific Research
Spring: 12 units
Automated scientific instruments are used widely in research and engineering. Robots dramatically increase the reproducibility of scientific experiments, and are often cheaper and faster than humans, but are most often used to execute brute-force sweeps over experimental conditions. The result is that many experiments are "wasted" on conditions where the effect could have been predicted. Thus, there is a need for computational techniques capable of selecting the most informative experiments. This course will introduce students to techniques from Artificial Intelligence and Machine Learning for automatically selecting experiments to accelerate the pace of discovery and to reduce the overall cost of research. Real-world applications from Biology, Bioengineering, and Medicine will be studied. Grading will be based on homework assignments and two exams. The course is intended to be self-contained, but students should have a basic knowledge of biology, programming, statistics, and machine learning.
Prerequisites: 10-601 or 10-701 or 02-620
02-760 Laboratory Methods for Computational Biologists
Fall and Spring: 9 units
Computational biologists frequently focus on analyzing and modeling large amounts of biological data, often from high-throughput assays or diverse sources. It is therefore critical that students training in computational biology be familiar with the paradigms and methods of experimentation and measurement that lead to the production of these data. This one-semester laboratory course gives students a deeper appreciation of the principles and challenges of biological experimentation. Students learn a range of topics, including experimental design, structural biology, next generation sequencing, genomics, proteomics, bioimaging, and high-content screening. Class sessions are primarily devoted to designing and performing experiments in the lab using the above techniques. Students are required to keep a detailed laboratory notebook of their experiments and summarize their resulting data in written abstracts and oral presentations given in class-hosted lab meetings. With an emphasis on the basics of experimentation and broad views of multiple cutting-edge and high-throughput techniques, this course is appropriate for students who have never taken a traditional undergraduate biology lab course, as well as those who have and are looking for introductory training in more advanced approaches. Grading: Letter grade based on class participation, laboratory notebooks, experimental design assignments, and written and oral presentations. 02-250 is a suggested pre-requisite.
02-761 Laboratory Methods for Automated Biology I
Fall: 12 units
In order to rapidly generate reproducible experimental data, many modern biology labs leverage some form of laboratory automation to execute experiments. In the not so distant future, the use of laboratory automation will continue to increase in the biological lab to the point where many labs will be fully automated. Therefore, it is critical for automation scientists to be familiar with the principles, experimental paradigms, and techniques for automating biological experimentation with an eye toward the fully automated laboratory. In this laboratory course, students will learn about various automatable experimental methods, design of experiments, hardware for preparing samples and executing automated experiments, and software for controlling that hardware. These topics will be taught in lectures as well as through laboratory experience using multi-purpose laboratory robotics. During weekly laboratory time, students will complete and integrate parts of two larger projects. The first project will be focused on liquid handling, plate control, plate reading, and remote control of the automated system based on experimental data. The second project will be focused on the design, implementation, and analysis of a high content screening campaign using fluorescence microscopy, image analysis, and tissue culture methods.
02-762 Laboratory Methods for Automated Biology II
Spring: 12 units
This laboratory course provides a continuation and extension of experiences in 02-761. Instruction will consist of lectures and laboratory experience using multi-purpose laboratory robotics. During weekly laboratory time, students will complete and integrate parts of two larger projects. The first project will be focused on the execution of a molecular biology experiment requiring nucleic acid extraction, library preparation for sequencing, and quality control. The second project will be focused on the implementation and execution of automated methods using active learning techniques to direct the learning of a predictive model for a large experimental space (such as learning the effects of many possible drugs on many possible targets). Grading will be based on lab and project completion and quality.
Prerequisite: 02-761
02-764 Automated Science Capstone II
Spring: 12 units
This course consists of small group projects on development, implementation and/or execution of automated science campaigns in collaboration with industry and/or academic partners. This course may only be taken as part of a continuous sequence with 02-763. Enrollment is only open to M.S. in Automated Science students.