Mathematics, 21.02.2020 17:58 deonceee4671

In a coin game, you repeatedly toss a biased coin (probability 0.4 for heads, 0.6 for tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop if your total number of points is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if your total is 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).

(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each V_k step (starting from V_0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfold in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
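
For anyone who wants to check parts (c) and (d) by program, here is a minimal sketch of value iteration for this game. It is one possible formulation, not the only one. It assumes the non-terminal states are the running totals 0 to 7, that any total of 8 or more collapses into a single bust state where the forced Stop is worth 0, and that Stop from a total s gives reward s and ends the game. The names q_values, value_iteration and greedy_policy are made up for this example.

```python
# Assumed formulation (not given in the question itself):
#   states   : running totals 0..7, plus an implicit "bust" state for totals >= 8
#   actions  : "Stop" (reward = current total, game ends) or "Toss" (reward 0)
#   bust     : only Stop is allowed there and it pays 0, so its value is 0
P_HEAD, P_TAIL = 0.4, 0.6      # coin bias from the problem statement
HEAD_PTS, TAIL_PTS = 3, 1      # points per head / tail
MAX_TOTAL = 7                  # largest total from which Toss is still allowed
STATES = range(MAX_TOTAL + 1)


def q_values(s, V):
    """Return the value of each action in state s under value estimate V."""
    def value(total):
        # Totals above MAX_TOTAL are the bust state, which is worth 0.
        return V[total] if total <= MAX_TOTAL else 0.0

    stop = float(s)  # R(s, Stop, done) = s, and the game ends
    toss = P_HEAD * value(s + HEAD_PTS) + P_TAIL * value(s + TAIL_PTS)
    return {"Stop": stop, "Toss": toss}


def value_iteration(tol=1e-9, max_iters=100):
    """Synchronous value iteration with gamma = 1; returns [V_0, V_1, ...]."""
    V = [0.0] * (MAX_TOTAL + 1)          # V_0(s) = 0 for all s
    history = [V]
    for _ in range(max_iters):
        new_V = [max(q_values(s, V).values()) for s in STATES]
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break                        # no state changed: converged
        V = new_V
    return history


def greedy_policy(V):
    """Pick the action with the highest Q-value in every state."""
    policy = {}
    for s in STATES:
        q = q_values(s, V)
        policy[s] = max(q, key=q.get)
    return policy


history = value_iteration()
V_star = history[-1]
policy = greedy_policy(V_star)

for k, V in enumerate(history):
    print(f"V_{k}:", [round(v, 5) for v in V])
print("policy:", policy)
```

Under these assumptions the greedy policy comes out as Toss for totals 0 through 4 and Stop for totals 5 through 7. A different but equally reasonable state definition (for example, an explicit terminal state for every total) would change the bookkeeping but should give the same values for the totals 0 to 7.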
