# ISTA 311 Programming assignment 2: Bayesian inference and the German tank problem

#### Due: Friday, March 15, 11:59 PM

You are encouraged to collaborate with other students on this assignment. If you do so, please note the name of your collaborator in the header comment of your script.

#### 1.1 Problem Statement

##### 1.1.1 A class for Bayesian inference

In the first programming assignment, we used dictionaries to represent probability distributions, in the format

```
{outcome1: p1, outcome2: p2, ...}
```

so that the set of keys is the universe (sample space) and the corresponding values are the probabilities of the outcomes. In this assignment, we would like to expand this functionality to allow updating the probabilities according to Bayes' theorem.

In this assignment, you'll define a class that encloses a dictionary describing this sort of probability distribution. A template file is on D2L to get you started. The template file has the class definition, an `__init__` method, which you do not need to modify but should read and understand, and signatures (but no code) for the methods listed below.

You must fill in the following methods (remember, all class methods should take the parameter `self` as the first parameter, in addition to the parameters specified below):

• `normalize`, which should take no parameters and modify the dictionary in place to ensure that the probabilities sum to 1 (feel free to reuse code from assignment 1 for this)
• `sample`, which should take a single integer parameter and return that many samples drawn from the probability distribution with replacement (feel free to reuse code from assignment 1 for this; it is only a small extension of `simulate` from assignment 1)
• `map`, which should return the outcome associated with the highest probability. If there is a tie, return the first outcome that appears in the dictionary with that probability.
• `mean`, which should return the mean of the random variable associated with the probability distribution. (Of course, this only makes sense if the outcomes are numerical. You're not required to do any type checking or error handling for this, but you can if you want.)
• `update`, which should take a single parameter that represents an observation or piece of evidence. `update` should iterate through the dictionary and update the probabilities using Bayes' theorem to incorporate the evidence. Note that the evidence does not necessarily have to be one of the outcomes from the dictionary. (Hint: iterate through the dictionary and update P(H) to P(E|H)P(H) (i.e., likelihood × prior) for each outcome in the dictionary, then call your `normalize` method.) In order for `update` to work correctly, it must call a likelihood function. However, since the likelihood function may be different for different situations, it is left unimplemented in the base `Distribution` class. It will be implemented in subclasses.
• Do not implement `likelihood`, but for the purposes of writing your `update` method, assume that it takes two parameters: a hypothesis and a piece of data, in that order. The hypothesis comes from the distribution, but the data may not.
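
The method list above can be sketched roughly as follows. This is only an illustrative outline, not the course template: the constructor signature and the attribute name `d` are assumptions made here, and the real `__init__` on D2L may look different.

```python
import random


class Distribution:
    """Sketch of a dictionary-backed distribution (not the course template)."""

    def __init__(self, probs):
        # Hypothetical constructor: the template's __init__ may differ.
        self.d = dict(probs)

    def normalize(self):
        # Rescale in place so the probabilities sum to 1.
        total = sum(self.d.values())
        for outcome in self.d:
            self.d[outcome] /= total

    def sample(self, n):
        # Draw n outcomes with replacement, weighted by probability.
        outcomes = list(self.d)
        weights = [self.d[o] for o in outcomes]
        return random.choices(outcomes, weights=weights, k=n)

    def map(self):
        # max() keeps the first maximal key in iteration order, so ties
        # resolve to the earliest outcome in the dictionary.
        return max(self.d, key=self.d.get)

    def mean(self):
        # Expected value; only meaningful for numerical outcomes.
        return sum(outcome * p for outcome, p in self.d.items())

    def likelihood(self, hypothesis, data):
        # Left to subclasses, as the assignment specifies.
        raise NotImplementedError

    def update(self, data):
        # Posterior is proportional to likelihood * prior, then renormalized.
        for hypothesis in self.d:
            self.d[hypothesis] *= self.likelihood(hypothesis, data)
        self.normalize()
```

Because `update` only multiplies existing values and never adds or removes keys, it is safe to modify the dictionary while iterating over it.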

##### 1.1.2 Simple Bayesian inference: the cookie problem

Define a subclass of the `Distribution` class to solve the following version of the cookie problem.

We have two bowls of cookies, which have a mixture of chocolate and vanilla cookies. We know the proportion of vanilla cookies in each bowl, but we don't know which bowl we are drawing from. We wish to estimate which bowl we are drawing from.

• replace the `__init__` method with a method that takes two parameters, `choco1` and `choco2`, which represent the proportions of chocolate cookies in each bowl. Initialize the dictionary to {1: 0.5, 2: 0.5}, representing a uniform prior distribution on the two bowls. Store the proportions as instance variables in your class so that your `likelihood` method can use them.
• define a `likelihood` method which takes two parameters: a hypothesis (i.e., one of the outcomes from the dictionary, representing Bowl 1 or Bowl 2) and a piece of evidence, which is either `v` (vanilla) or `c` (chocolate).
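
As a rough illustration of the two bullets above, here is one way such a subclass might look. The base class is a minimal stand-in written for this example (the attribute name `d` and its constructor are assumptions, not the template's), the subclass name `Cookie` is hypothetical, and the evidence is assumed to arrive as the strings `"v"` and `"c"`.

```python
class Distribution:
    """Minimal stand-in for the assignment's base class (an assumption)."""

    def __init__(self, probs):
        self.d = dict(probs)

    def normalize(self):
        total = sum(self.d.values())
        for outcome in self.d:
            self.d[outcome] /= total

    def update(self, data):
        # Multiply each prior by its likelihood, then renormalize.
        for hypothesis in self.d:
            self.d[hypothesis] *= self.likelihood(hypothesis, data)
        self.normalize()


class Cookie(Distribution):
    def __init__(self, choco1, choco2):
        # Uniform prior over the two bowls.
        super().__init__({1: 0.5, 2: 0.5})
        # Proportion of chocolate cookies in each bowl, keyed by bowl number.
        self.choco = {1: choco1, 2: choco2}

    def likelihood(self, hypothesis, data):
        # Evidence is assumed to be the string "c" or "v".
        p_choco = self.choco[hypothesis]
        return p_choco if data == "c" else 1 - p_choco


bowls = Cookie(choco1=0.25, choco2=0.5)
bowls.update("v")  # we drew one vanilla cookie
print(bowls.d)
```

With these made-up proportions, a vanilla draw has likelihood 0.75 under Bowl 1 and 0.5 under Bowl 2, so the posterior probability of Bowl 1 works out to 0.375 / (0.375 + 0.25) = 0.6.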

##### 1.1.3 More inference: the German tank problem

Now we'll solve a version of the German tank problem.

You are given a text file that contains a collection of serial numbers: 2-5 numbers from each 100-number monthly block. Each number is on a line by itself to make it easy to read the numbers in from the file, but note that you will have to separate the numbers by their blocks yourself. Your task is to estimate the number of tanks that was produced each month.
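
Separating the numbers into blocks might be done along these lines. The helper names `read_serials` and `group_by_block` are hypothetical, and the sketch assumes block `b` covers serials `100*b` through `100*b + 99`; the actual file may number its blocks differently (for example starting at 1), so check the data before relying on this.

```python
from collections import defaultdict


def read_serials(filename):
    # One integer per line; blank lines are skipped.
    with open(filename) as f:
        return [int(line) for line in f if line.strip()]


def group_by_block(serials, block_size=100):
    # Assumption: block b holds serials b*block_size .. b*block_size + block_size - 1.
    blocks = defaultdict(list)
    for serial in serials:
        # Store the zero-based offset of the serial within its block.
        blocks[serial // block_size].append(serial % block_size)
    return dict(blocks)


print(group_by_block([3, 57, 104, 199, 250]))
```

The offsets stored here are zero-based remainders, so they may need shifting by one if the model counts tanks within a month starting from 1.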

Implement an `estimatetanks` function, which takes the filename as a parameter and returns a list containing the estimated number of tanks produced each month.

In order to do this, you should define a subclass of the `Distribution` class that has an appropriate `likelihood` function for modeling this problem. Then, for each 100-number block, create an instance of your class, use the numbers from the file to update probabilities, and then use the result to produce a single number estimating the number of tanks produced in that month.

The details of how you implement this estimation are up to you, because the test script only calls your `estimatetanks` function. As long as `estimatetanks` takes the filename as a parameter and returns a list of 24 numbers (`int` or `float`; it doesn't matter), the test script will be able to work with it. The correctness will be judged based on the accuracy of the estimates.
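
One possible likelihood for a single month is sketched below, written with plain functions rather than a subclass to keep it short. It assumes the serials have already been converted to positions 1 through 100 within their block, that tanks in a month are numbered consecutively from 1, and that the prior is uniform over 1 to 100 tanks. The names `tank_likelihood` and `posterior_for_block` are hypothetical, and this is not necessarily the model the instructor used.

```python
def tank_likelihood(hypothesis, serial):
    # If `hypothesis` tanks were built and numbered 1..hypothesis, each of
    # those serials is equally likely to be seen; larger ones are impossible.
    if serial > hypothesis:
        return 0.0
    return 1.0 / hypothesis


def posterior_for_block(serials, max_tanks=100):
    # Uniform prior over 1..max_tanks tanks produced in the month.
    probs = {n: 1.0 / max_tanks for n in range(1, max_tanks + 1)}
    for serial in serials:
        # Bayes update: likelihood * prior, then renormalize.
        for n in probs:
            probs[n] *= tank_likelihood(n, serial)
        total = sum(probs.values())
        for n in probs:
            probs[n] /= total
    return probs


posterior = posterior_for_block([12, 31, 47])
estimate = sum(n * p for n, p in posterior.items())  # posterior mean
print(estimate)
```

Under a uniform prior the most probable hypothesis is always the largest serial observed (47 here), while the posterior mean is somewhat higher, so the choice of summary changes the estimate.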

The test script knows the true numbers of tanks that were used to generate the serial numbers in the text file. The script will display the RMS (root mean square) error in your collection of monthly estimates. Lower is better!
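
For self-checking, RMS error can be computed as below. This follows the standard definition (square root of the mean squared difference); the test script's own code is not shown in this handout, so treat the function name `rms_error` and its details as assumptions.

```python
import math


def rms_error(estimates, truths):
    # Square each difference, average the squares, then take the square root.
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


print(rms_error([50, 60, 70], [55, 60, 64]))
```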

This function is graded 50% on general correct functionality and 50% on the accuracy of your prediction. RMS error ≤ 10 will receive 1 accuracy points; RMS error ≥ 20 will receive 0 accuracy points; scores interpolate linearly between these endpoints, rounded up to a whole number. For reference, the simplest model for inference I implemented achieved an RMS error of about 18.9, while a simple change improved it to about 13.7.

In order to improve your accuracy, you can try experimenting with different priors or different ways to extract an estimate from the posterior distribution. However, priors must be generic, which is to say they should not be hand-tuned to the values in the test script.