Module Guide: Transcription Factor Binding Site Motifs

This module introduces basic methods for modeling the DNA sequence specificity of transcription factors (TFs), proteins that bind to DNA and modulate the transcription rates of genes. It builds on the probabilistic methods covered earlier in the course and applies the EM algorithm to a specific parameter-estimation problem.

After completing this module, you should be able to:

  1. Implement an algorithm for identifying short sequence patterns that are overrepresented within a set of longer sequences.
  2. Explain the probabilistic model and assumptions behind the algorithm, its limitations, and some alternative approaches.
  3. Explain the relationship between the probabilistic model and a simple biochemical model.
  4. Read and understand published papers on computational methods for identifying TF binding site motifs.
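To make objectives 2 and 3 more concrete, here is a minimal sketch of scoring sequences with a position weight matrix (PWM). It assumes a made-up count matrix (not real TF data), a uniform background, positions that contribute independently (additively in log space), and scanning of one strand only. The `occupancy` function illustrates one common simplification linking the probabilistic and biochemical views: the log-odds score is treated as proportional to a negative binding energy, and occupancy follows a two-state, Boltzmann-like form with a hypothetical threshold parameter `mu`. Treating the score as an energy in kT-like units is an assumption for illustration, and all names here are hypothetical.

```python
import math

ALPHABET = "ACGT"

# Hypothetical count matrix for a 4-bp motif (illustrative numbers, not real TF data).
# Rows are motif positions; columns are counts of A, C, G, T among aligned sites.
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_pwm(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=1.0):
    """Convert counts to log2-odds scores against a fixed background, adding pseudocounts."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append([math.log2((c + pseudocount) / total / bg)
                    for c, bg in zip(row, background)])
    return pwm


def best_site(seq, pwm):
    """Scan one strand of seq; return (best_score, start), summing per-position scores."""
    width = len(pwm)
    best = (-math.inf, -1)
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][ALPHABET.index(seq[i + j])] for j in range(width))
        best = max(best, (score, i))
    return best


def occupancy(score, mu=0.0):
    """Two-state occupancy, ASSUMING the log2-odds score behaves like -energy.

    The score is converted to natural-log (kT-like) units by multiplying by ln 2;
    mu is a hypothetical threshold playing the role of a chemical potential.
    """
    energy = -score * math.log(2)
    return 1.0 / (1.0 + math.exp(energy - mu))


if __name__ == "__main__":
    pwm = log_odds_pwm(COUNTS)
    score, start = best_site("TTACGTAA", pwm)
    print(start, round(score, 2), round(occupancy(score), 3))
```

Here the best site starts at index 2, where the sequence matches the consensus ACGT. Raising `mu`, which loosely corresponds to a higher TF concentration in this simplified picture, raises the predicted occupancy of weaker sites as well.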

There are no lecture notes for this module; instead, we will rely on review articles and the primary literature. Papers are assigned before class, and short quizzes will motivate you to read them and get you thinking about them before the discussion. There are also programming exercises in which you will extend probabilistic modeling and expectation-maximization to the problem of learning motifs: short sequences that are overrepresented in the promoters of co-regulated genes.
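The EM approach to motif learning can be sketched for its simplest case, the one-occurrence-per-sequence (OOPS) model, which is one of the site-distribution models used by MEME. This is a minimal sketch under stated assumptions, not necessarily the formulation used in the exercises: each sequence is assumed to contain exactly one site of fixed width on the given strand, the background is uniform and held fixed, and the function name `em_oops` and its parameters are hypothetical. The E-step computes a posterior distribution over the motif's start position in each sequence; the M-step re-estimates the PWM from the resulting expected base counts.

```python
import math
import random

ALPHABET = "ACGT"
IDX = {b: i for i, b in enumerate(ALPHABET)}


def em_oops(seqs, width, n_iter=50, pseudocount=0.1, seed=0):
    """Estimate a motif PWM with EM under the OOPS model (one site per sequence).

    Assumes a uniform, fixed background and that every sequence is at least
    `width` long. Returns (pwm, posteriors): pwm[j][b] is P(base b at motif
    position j); posteriors[n][i] is P(motif starts at i in sequence n).
    """
    rng = random.Random(seed)
    # Random initialization: each row is a normalized vector of positive weights.
    pwm = []
    for _ in range(width):
        w = [rng.random() + 0.5 for _ in ALPHABET]
        pwm.append([x / sum(w) for x in w])

    for _ in range(n_iter):
        # E-step: log-likelihood ratio of a site starting at each position i.
        # With a uniform background the background term is the same for every i.
        posteriors = []
        for seq in seqs:
            logs = [sum(math.log(pwm[j][IDX[seq[i + j]]] / 0.25) for j in range(width))
                    for i in range(len(seq) - width + 1)]
            m = max(logs)
            weights = [math.exp(v - m) for v in logs]  # subtract max for stability
            z = sum(weights)
            posteriors.append([w / z for w in weights])

        # M-step: expected base counts at each motif position, plus pseudocounts.
        counts = [[pseudocount] * len(ALPHABET) for _ in range(width)]
        for seq, post in zip(seqs, posteriors):
            for i, p in enumerate(post):
                for j in range(width):
                    counts[j][IDX[seq[i + j]]] += p
        pwm = [[c / sum(row) for c in row] for row in counts]

    return pwm, posteriors


if __name__ == "__main__":
    seqs = ["GGACGTGG", "ACGTGGGG", "GGGGACGT"]
    pwm, post = em_oops(seqs, width=4)
    for seq, p in zip(seqs, post):
        start = max(range(len(p)), key=p.__getitem__)
        print(seq, "most probable start:", start)
```

Because EM converges only to a local optimum, a common practice is to run it from several starting points (random seeds, or starting models derived from substrings of the data, as MEME does) and keep the solution with the highest likelihood.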

Useful background material: I highly recommend Gary Stormo's short monograph Introduction to Protein-DNA Interactions, published by Cold Spring Harbor Laboratory Press.

Day 0

In Class

Before the next class

Day 1

In Class

Before the next class

Read and think about:

To turn in:

Day 2

In Class

Before the next class

Read and think about:

To turn in:

Day 3

In Class

Before the next class

Day 4

In Class

Before the next class

Day 5

In Class

Before the next class