Homework 4

Due Friday February 14 at 5:00pm

Exercise 1

Let \(\theta\) be the rate of mutation for a certain cluster of cancer cells. Biologists encode their prior uncertainty about \(\theta\) with the following density:

\[ p(\theta) = \frac{4}{10}e^{-4\theta} + \frac{9}{10 \Gamma(8)} \theta^7 e^{-\theta} \] a. Make a plot of this prior density and explain what it means, in context, to the biologists.

Let \(Y_i\) be the number of mutations produced in the \(i\)th cell, such that

\[ Y_i | \theta \sim \text{iid }Poisson(\theta) \]

Write out the posterior distribution of \(\theta\) given \(y_1,\ldots y_n\) (up to a proportionality constant) and simplify as much as possible. Hint: Be careful when writing your proportionality statement!

The posterior is a mixture (weighted average) of two distributions that you know. Identify these two distributions, including their parameters.
Assume you obtain mutation data from two cells, \(y_1 = 3, y_2 = 1\). Compute the posterior exactly (i.e. find the appropriate integration constant) and plot the posterior density.
In part (c) you identified the posterior as a mixture (weighted average) of two distributions. Given the data in part (d), compute the weights of each density in the mixture.

Exercise 2

A group of scientists have mutation data,

y = c(0,3,0,1,5,2,0,4,1,1)

and are interested in assessing how well the Poisson model from exercise 1 fits their data. Using the data generative model and prior from exercise 1, generate posterior predictive datasets \(y^{(1)}, y^{(2)}, \ldots y^{(S)}\), where each data set \(y^{(s)}\) is a vector of length 10 whose entries are sampled from the Poisson distribution with parameters \(\theta^{(s)}\). Each \(\theta^{(s)}\) itself is a sample from the posterior \(p(\theta | y_1, \ldots y_{10})\). For each \(s\), let \(t(s)\) be the sample average of the 10 values of \(y^{(s)}\) divided by the sample standard deviation of \(y^{(s)}\). Make a histogram of \(t(s)\) and compare to the observed value of this statistic. Based on this statistic, assess the fit of the Poisson model for these data.

Exercise 3

Chimowitz et al. (2011) https://doi.org/10.1056/NEJMoa1105335 investigate if stents are effective treatment to manage strokes in patients with atherosclerotic intracranial arterial stenosis. You can load the data using the code below. Data sourced from openintro package.

stents = readr::read_csv("https://sta602-sp25.github.io/data/stent365.csv")

Each row of the data set is an individual patient. The group column indicates whether the patient was treated with a stent or not. The outcome column reports whether the patient had a stroke or not within a year.

Write down a data generative model for this data. Hint: you might write down two data generative models, one for patients in the treatment group and one for patients in the control group.
Write down your prior beliefs about unknown parameters in your model above using a conjugate prior. Choose parameters for the priors. Explain your choices.
Report (using Monte Carlo sampling or otherwise) the posterior mean of the relative risk, i.e. the posterior mean of the probability of stroke in the treatment group versus in the control, (think \(E~[\frac{\theta_t}{\theta_c}~|~\text{data}]\)). Additionally, include a 95% posterior confidence interval for the relative risk.
Plot the posterior of the relative risk from part c. Do you believe the treatment is effective?