Module Guide: Transcription Factor Binding Site Motifs
This module is intended to teach basic methods of modeling the DNA
sequence specificity of transcription factors – proteins that bind to
DNA and modulate the transcription rates of genes. This section will
build on the probabilistic methods we covered and provide a specific
application of the EM algorithm for parameter estimation.
You should come away from this module with the following abilities:
To implement an algorithm for identifying short sequence
patterns that are overrepresented within a set of longer
sequences.

To explain the probabilistic model and assumptions behind the
algorithm, its limitations, and some alternative approaches.

To explain the relationship between the probabilistic model and a
simple biochemical model.

To read and understand published papers on computational methods for
identifying TF binding site motifs.
We do not have lecture notes for this section. We will rely instead on
review articles and primary literature. There are papers to read
before class and we will have short quizzes to motivate you to read
the papers and get you thinking about them before the
discussion. There are also programming exercises in which you will
extend probabilistic modeling and expectation-maximization to the
problem of learning motifs for short sequences that are
overrepresented in the promoters of coregulated genes.
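As a rough illustration of the kind of computation involved, here is a minimal sketch of one ingredient of such an EM approach: the posterior probability of each possible motif start position in a single sequence. It assumes exactly one site per sequence, a uniform prior over start positions, and an independent-position background model. The function names (`consensus_pwm`, `site_posteriors`) and the toy motif are hypothetical and chosen for illustration only; this is not the assignment's required interface or a full implementation of the method in the assigned reading.

```python
import math

BASES = "ACGT"


def consensus_pwm(consensus, p_match=0.85):
    """Build a toy position weight matrix (hypothetical helper).

    Each column gives probability p_match to the consensus base and
    splits the remainder evenly among the other three bases.
    """
    p_other = (1.0 - p_match) / 3.0
    return [{b: (p_match if b == c else p_other) for b in BASES}
            for c in consensus]


def site_posteriors(seq, pwm, background=None):
    """Posterior over motif start positions in seq.

    Assumes exactly one site per sequence and a uniform prior over
    starts. All PWM and background probabilities must be nonzero
    (e.g. after adding pseudocounts), since logs are taken.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if n_starts < 1:
        raise ValueError("sequence is shorter than the motif")

    # The background probability of the bases outside the window is
    # shared by every start, so only the motif/background ratio
    # inside the window matters for the posterior.
    log_ratios = []
    for start in range(n_starts):
        total = 0.0
        for k in range(width):
            base = seq[start + k]
            total += math.log(pwm[k][base]) - math.log(background[base])
        log_ratios.append(total)

    # Normalize in a numerically stable way.
    top = max(log_ratios)
    weights = [math.exp(x - top) for x in log_ratios]
    norm = sum(weights)
    return [w / norm for w in weights]


if __name__ == "__main__":
    pwm = consensus_pwm("TATA")
    for start, p in enumerate(site_posteriors("GCGCTATAGCGC", pwm)):
        print(start, round(p, 4))
```

In a full EM procedure, posteriors like these would be computed for every sequence and then used as fractional counts to re-estimate the motif columns, repeating until the parameters stop changing.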
Useful background material. I highly recommend Gary Stormo's
short monograph Introduction to Protein-DNA Interactions,
published by Cold Spring Harbor Laboratory Press.
Day 0
In Class

This is the last day of the previous module. See the module guide for
this day’s in-class plan.

Introduction to the motif inference problem and general EM approach.
Before the next class

Read and think about the
MEME paper by Bailey and Elkan and be prepared to discuss
equations (1)-(20). You may skip Section 3.2, which contains equations
(21)-(44), but please read Section 4.1.
Day 1
In Class
Before the next class
Read and think about:

Reread the MEME paper by
Bailey and Elkan in light of class discussion. Make sure you
understand all the mathematical details through Equation (20) as you
will be implementing this. Also read Sections 4 and 5.
To turn in:
Day 2
In Class

Quick quiz on some details of Bailey and Elkan.

If needed, finish discussion of Bailey and Elkan equations (1)-(20).

Posterior normalization and eraser hacks

Introduction to mutual information
Before the next class
Read and think about:
To turn in:
Day 3
In Class

Quick quiz on assigned reading

Discussion of the FIRE paper and the associated supplementary methods
Before the next class

To turn in before class on Day 5:
Complete siteEMDist1.1.zip with your assigned lab partner and
Day 4
In Class
Quick introduction to Liu paper on Gibbs sampling for motifs

Workshop on motif EM programming assignment. If you have already
completed and turned in the motif EM assignment you do not need to
attend.
Before the next class
Day 5
In Class

Gibbs sampling in motif discovery and its relationship to EM.
Before the next class

Complete any preparation described in the module guide for next module (Day 0).