Probability Module Guide
This module is intended to teach the basics of probabilistic thinking and the
estimation of probabilities. We do not assume any prior exposure to
probability theory, but the review will be useful for those who already know
something about it.
The abilities you should come away from this module with are:
 To create and reason about discrete probability models.
 To manipulate discrete probability expressions and to demonstrate the validity of the rules for
manipulating them.
 To estimate parameters of probability distributions using
maximum likelihood (ML) or Bayesian estimators and to explain
the difference between them.
 To define conjugate priors and explain their role in Bayesian
parameter estimation.
 To use the expectation maximization (EM) and Gibbs sampling
frameworks to estimate probabilities in the presence of hidden
variables and to explain the difference between them.
Before each class, you should study the assigned sections of the
printed lecture notes and read the corresponding sections of
Probability and Statistics by Morris de Groot. The lecture notes are not
meant as a substitute for a textbook, but rather to highlight some of
the most important and relevant material from the textbook.
There are exercises to do at home in advance of each class. These will
not be collected. Instead, we will have similar exercises for you to
do in class and turn in for a grade. The process of doing the
exercises in class is intended to be a learning experience, but there
won't be enough time for you to get much out of it unless you have
already studied the text and struggled with the homework problems.
Day 0
In Class
 This is the last day of the previous module.
Before the next class
Read and think about:
Day 1
In Class

Introduction to probability theory and Bayesian philosophy

Permutations and combinations

Set overlap problem
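
As a quick preview of these topics, here is a minimal sketch in Python (the course itself uses Mathematica notebooks) of permutations, combinations, and the inclusion-exclusion idea behind set overlap problems; the particular sets are made up for illustration:

```python
from math import comb, perm

# Permutations: ordered arrangements of k items chosen from n.
assert perm(5, 2) == 20   # 5 * 4

# Combinations: unordered selections of k items from n.
assert comb(5, 2) == 10   # perm(5, 2) / 2!

# Set overlap via inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|.
A = {1, 2, 3, 4}
B = {3, 4, 5}
assert len(A | B) == len(A) + len(B) - len(A & B)
```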
Before the next class
Read and think about:
To turn in:

Do these exercises, referring to the course notes or text as needed.

Download the Mathematica notebook CombinatoricsInMathematica.nb
from the same directory as this file. Read it, evaluate all input
cells, do any practices or exercises, and turn it in.
Optional:

Extra help: The Khan Academy material in the sections
"Permutations and Combinations" and "Probability Using Combinatorics"
is relevant and may be useful for gaining mastery of these
concepts. To find these sections, click on "Subjects" near the top of
the page and select "Probability and Statistics". Near the top right
of the page, click on "View full list of Probability and statistics
content". From there, click on "Probability and Combinatorics". There
you should see a list of topics that includes the two I recommended.

Optional extra depth: The YouTube series "Probability Primer" can be
found at https://www.youtube.com/playlist?list=PL17567A1A3F5DB5E4. The
first chapter on measure theory is for those who want a little more
mathematical depth. Everyone else can skip to 1.S Measure Theory:
Summary. The first thing he does is to define a "sigma
algebra". Don't let this throw you. You can either ignore it and wait
for him to talk about probabilities or you can think of the collection
of all open intervals (a, b) on the real number line whenever you hear
the term "sigma algebra".
Day 2
In Class

In class exercises, similar to the previous homework

Conditional probability.

Independence

Discrete random variables

Bernoulli and binomial distributions

Introduce the hypergeometric distribution and its role in bioinformatics
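
To make the distributions concrete, here is a small Python sketch (Python rather than the course's Mathematica) of the binomial and hypergeometric probability mass functions; the hypergeometric is the distribution behind over-representation tests such as Fisher's exact test, which is one reason it matters in bioinformatics:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, K, n, N):
    """P(k marked items in a sample of n drawn without replacement
    from a population of N items, K of which are marked)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# The Bernoulli distribution is the n = 1 case of the binomial.
assert binomial_pmf(1, 1, 0.3) == 0.3

# Both pmfs sum to 1 over their support.
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-9
assert abs(sum(hypergeom_pmf(k, 5, 4, 20) for k in range(5)) - 1) < 1e-9
```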
Before the next class
Read and think about

The probability course notes, through Section 8, "The Four Rules".

Sections 2.3, 3.1, and 3.2 of
de Groot.
To turn in:

Do these
exercises, referring to the course notes or text as needed.

Download the Mathematica notebook ProbabilityInMathematica.nb
from the same directory as this file. Read it, evaluate all input
cells, do any practices or exercises, and turn it in.
Optional:

Extra help: The Khan Academy material in the sections
"Compound, Independent Events" and "Dependent Probability"
is relevant and may be useful for gaining mastery of these concepts.

Optional online lecture: In the YouTube series "Probability
Primer",
lectures 2.1 through 2.3 are highly relevant. At this point he stops
talking about sigma algebras and uses terminology that matches what we
are using and what the de Groot book uses. Lecture 3.1 may also be
useful in understanding Section 7 of my course notes although lecture
3.1 goes into more depth than the course notes.
Day 3
In Class

In class exercises, similar to the previous homework.

Continuous random variables

Joint probability distributions

Four rules for manipulating probability expressions
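
The full list of rules is in the course notes; as an illustrative sketch, here is a Python check of two standard manipulations, the product rule and summing out, on a tiny made-up joint distribution of two binary variables:

```python
# Joint distribution of two binary variables X and Y, stored as a dict.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def marginal_x(x):
    # Summing out Y: P(X = x) = sum over y of P(X = x, Y = y).
    return sum(p for (xv, yv), p in joint.items() if xv == x)

def conditional_y_given_x(y, x):
    # Conditional probability: P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
    return joint[(x, y)] / marginal_x(x)

# Product rule: P(X, Y) = P(Y | X) P(X), checked for every cell.
for (x, y), p in joint.items():
    assert abs(p - conditional_y_given_x(y, x) * marginal_x(x)) < 1e-12

# The marginal is itself a proper distribution.
assert abs(marginal_x(0) + marginal_x(1) - 1) < 1e-12
```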
Before the next class
Read and think about

The probability course notes through Section 9.3

Sections of de Groot.

Do these exercises, referring to the course notes or text as needed.
To turn in:

Download file diceSampleAssignment.zip from the
directory this file is in. Import the zip archive as a project in
Wolfram Workbench. Open the notebook diceSample.nb and follow the
instructions. Turn in before next class.
Optional:

This document
from someone else's course notes
illustrates Bayesian inference using a binomial distribution with a
beta prior distribution. There is some overlap with my course notes
but also additional depth on the posterior predictive distribution and
topics related to priors, including methods of setting prior
distribution parameters from empirical data.

In the YouTube series "Probability
Primer",
lectures 2.4 and 2.5 are highly relevant. Note: What he calls the
"Partition Rule" is the same as what I call "Summing Out".
Day 4
In Class

In class exercises, similar to the previous homework.

Parameter estimation: Maximum likelihood

Parameter estimation: Maximum a posteriori

Conjugate priors

The beta distribution
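
To preview how these pieces fit together, here is a small Python sketch of ML versus MAP estimation of a Bernoulli parameter under a conjugate Beta(a, b) prior; the data and prior parameters are made up for illustration:

```python
def ml_estimate(k, n):
    """Maximum likelihood estimate of a Bernoulli parameter
    from k successes in n trials: the observed success fraction."""
    return k / n

def map_estimate(k, n, a, b):
    """Maximum a posteriori estimate under a Beta(a, b) prior.
    By conjugacy the posterior is Beta(a + k, b + n - k), whose
    mode is (a + k - 1) / (a + b + n - 2) when a + k, b + n - k > 1."""
    return (a + k - 1) / (a + b + n - 2)

k, n = 7, 10
assert ml_estimate(k, n) == 0.7
# A uniform Beta(1, 1) prior leaves the posterior mode at the ML estimate.
assert map_estimate(k, n, 1, 1) == ml_estimate(k, n)
# A stronger symmetric prior pulls the estimate toward 1/2.
assert map_estimate(k, n, 5, 5) < ml_estimate(k, n)
```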
Before the next class
Read and think about

Finish reading the probability course notes section "Parameter estimation".

Read Sections 5.8, 7.1, and 7.2
of
de Groot.
To turn in:
Optional:

The Khan Academy material in the sections "Expected
value," "Expected value with empirical probabilities," and "Expected
value with calculated probabilities" is relevant. Some of the
questions are not very clear, but even if that leads you to a wrong
answer, looking at a few of the hints should make it clear what they
mean.
Day 5
In Class

In class exercises, similar to the previous homework

Catch up if behind on topics from Days 1 through 4

Workshop: Problems and issues with the dicePosterior assignment
Before the next class
Read and think about

The probability course notes sections "Expectation" and "Expectation Maximization".

Sections 4.1 and 4.2 of de Groot.
To turn in:

Reminder: Complete dicePosteriorAssignment.zip and turn it in before class on Day 6.
Day 6
In Class

Discussion/Presentation: Expectation Maximization algorithm

If there's time, a solution for calculating dicePosterior
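
To illustrate the EM idea in miniature, here is a stripped-down Python sketch (not the diceEM assignment itself, and simpler than it) for a 50/50 mixture of two coins with unknown biases; each observation is a head count from a batch of flips of one hidden coin, and the batch data below are invented for illustration:

```python
from math import comb

def binom(k, m, p):
    """Binomial likelihood of k heads in m flips of a coin with bias p."""
    return comb(m, k) * p**k * (1 - p)**(m - k)

def em_two_coins(head_counts, m, p1, p2, iters=50):
    """EM for a 50/50 mixture of two coins with unknown biases p1, p2.
    Mixture weights are fixed at 1/2 for brevity, so they cancel
    in the E-step."""
    for _ in range(iters):
        # E-step: responsibility of coin 1 for each batch.
        r = []
        for k in head_counts:
            l1, l2 = binom(k, m, p1), binom(k, m, p2)
            r.append(l1 / (l1 + l2))
        # M-step: re-estimate each bias as a responsibility-weighted
        # fraction of heads.
        p1 = sum(ri * k for ri, k in zip(r, head_counts)) / (m * sum(r))
        p2 = (sum((1 - ri) * k for ri, k in zip(r, head_counts))
              / (m * sum(1 - ri for ri in r)))
    return p1, p2

# Batches of 10 flips: some look heads-heavy, some tails-heavy.
counts = [9, 8, 9, 2, 1, 2]
p1, p2 = em_two_coins(counts, 10, p1=0.6, p2=0.4)
assert p1 > 0.8 and p2 < 0.2   # estimates separate toward the two modes
```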
Before the next class
Turn in before class on Day 8.
Day 7
In Class

Workshop: Problems and issues with diceEM.
Before the next class

Reminder: Complete diceEMAssignment.zip and turn it in before
class on Day 8.
Day 8
In Class

The relationships among normalization, integration, optimization, and sampling.

Introduction to the position weight matrix model of the DNA binding
affinity of transcription factors.

Introduction to the motif inference problem and EM approach.
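
The position weight matrix idea can be sketched briefly. The following Python example (the aligned sites, uniform background, and pseudocount choice are all illustrative assumptions, not taken from the course materials) builds per-position log-odds scores from hypothetical binding sites and scores a candidate sequence:

```python
from math import log2

# A few hypothetical aligned binding-site sequences.
sites = ["ACGT", "ACGA", "ATGT", "ACGT"]
bases = "ACGT"
background = 0.25  # assumed uniform background frequency for each base

# Position weight matrix: log-odds of each base at each position,
# with a pseudocount of 1 per base to avoid log(0).
width = len(sites[0])
pwm = []
for i in range(width):
    col = [s[i] for s in sites]
    pwm.append({b: log2((col.count(b) + 1) / (len(sites) + 4) / background)
                for b in bases})

def score(seq):
    """Log-odds score of a candidate site: sum of per-position terms.
    Higher means more like the training sites than like background."""
    return sum(pwm[i][b] for i, b in enumerate(seq))

assert score("ACGT") > score("TTTT")   # consensus outscores a non-site
```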
Before the next class

See Day 0 of next module guide.

Optional: If you turned in a diceEMAssignment.zip on time, you may
optionally turn in a revised version before the next class.
Revisions turned in late will not be accepted.