Due Friday, January 15 at 11:59 PM via e-mail to BOTH jiangxu2011 at u.northwestern.edu and ddowney at eecs.northwestern.edu. Use EECS 395/495 Homework 1 as the e-mail subject line. PDF format preferred, though Plaintext, Word, and HTML are also acceptable.

1. (1 point) Exercise 2.1, part a. That is, prove that P(empty event) = 0, using the basic axioms of probability.

2. (2 points) Let's say the Chicago Bulls are playing a game tonight, but a star player Derrick Rose has a mild ankle sprain and may or may not play. Say Rose has a 50 percent chance of being able to play, the Bulls have a 50 percent chance of winning overall, and the probability that Rose plays *and* the Bulls win is 40%. Write out the conditional probability table P(Bulls Win | Rose Plays).

3. (1 point) Prove the Contraction property:
(X _|_ W | Z, Y) and (X _|_ Y | Z) => (X _|_ Y, W | Z)

Hint: start with P(X, Y, W | Z) = P(X, Y | W, Z)P(W | Z)


4. 

a. (1 point) Before reading the rest of this exercise: quickly type out a
random sequence of 50 coin flips, using your own brain as a random
generator (that is, don't use a coin, computer or other
device, just mentally pick a sequence of Hs and Ts, trying to be as random 
as possible).

b. (3 points) Now, write a computer program to generate a sequence of random coin flips,
where both heads and tails are equally probable. Include your code and
a output sequence of 300 flips as a single string (see part (c) below).

c. (1 point) Consider the following string of 300 coin flips:
TTHTHTHHTTHTTHTHTHHHTHTHTHTHTTTHTTTTHTTHHHTTTHTHTTTHTHHHTHTTHHHHHHHTHTTTTH

HHHTHHTHTHHTTHHTTHTTTTHTTHTHHTHHHTHHTHHTHTTTHTHHTHTHTTHTHHTHHTTHTHTHTTHHH

THTHTHTHTHTHTHHHTHHTTTHTHHHTHHHTTTHHTTHTTHTHTTHTHHTHTHHHHTTHHTHTTHTHHTHHT

HTHTTHTHHTTTHTTHTTHTHHTTTHTHHTTHHHTHHTHTTHHTTTTHTHHHTHTTHTTTTTHTHHTTHHTHTH

HTHTHT

Before you move on to part (d), in one or two sentences describe important differences you see, if any, between your sequence of flips in part (b) and the one above.

d. (1 point) Your sequence will surely have multiple different "runs" of the same letter, of varying length. For example, the sequence "HHHTHTT" has one run of length 3, one of length 2, and two of length one. Count in your sequence the number of runs of length i for i={1, 2, 3, 4, 5 or more}. For example, the answer for the sequence in part (c) is [115, 47, 18, 6, 2]. What important differences exist between the counts for your sequence and those for the sequence in part (c)?

e. (4 points) Assume the sequence in part (c) was generated by the following process: first, a fair coin generates the first flip. Thereafter, each flip i is the same as the previous flip with probability gamma, and is otherwise the opposite. NOTE: the gamma parameter has a very different role in generating a sequence of flips than does the theta parameter from the lecture notes, so be aware.
(i) Is there a setting of gamma that corresponds to using a fair coin for the entire sequence?
(ii) What is the maximum likelihood estimate of gamma for the sequence in part (c)?
(iii) Assume the gamma parameter was chosen for the sequence in part (c) to be 0.6 with probability 0.01, or to be 0.4 with probability 0.99. What's the MAP estimate of gamma given the sequence in part (c) and this prior knowledge?
(iv) Assume the gamma parameter is Beta(3,3) distributed. What is the MAP estimate for gamma given the sequence in part (c) and this prior knowledge?
(v) Now the fun part. One might hypothesize that a person's mental randomness is less like a fair coin than it is like the process described above, with a gamma value slightly lower than 0.5. What does this hypothesis mean, in a single sentence of plain english?
(vi) To test the above hypothesis, compute the likelihood of your sequence from part (a) under the process with gamma=0.45. Is this higher or lower than that of a fair coin (i.e. (1/2)^50)? Do your results support or contradict the hypothesis?

5. Project Exercise

a. (1 point) Pick a task you'd like to perform machine learning on -- it can be any task of your choice (some possibilities will be discussed in class). You may also partner with an arbitrary number of other students, although each student will write up solutions for the homework exercises independently. In about one paragraph, describe the task you've chosen and why you think it's interesting. Also, list anyone you have partnered with for the project. (In future homeworks, you'll construct and analyze a probabilistic graphical model for the task.)

b. (1 point) What do you think are the most important features (i.e. input variables) for your task, and why?

c. (4 points) Compile an initial dataset for your task (working together with your group, if applicable). Describe the nature of your data set, the kinds of features it includes, its total size (in terms of MB, instances, and features), and how you acquired it.