
Recall from lecture the value iteration update rule:
$$V_{k+1}^*(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \lambda V_k^*(s') \big) \Big]$$
where $V_k^*(s)$ is the expected reward from state $s$ after acting optimally for $k$ steps.
Recall the example discussed in the lecture.
[Figure: the one-dimensional grid, labeled with the agent's starting state and the +1 reward.]
An agent is trying to navigate a one-dimensional grid consisting of 5 cells. At each step, the agent has only one action to choose from, i.e., it moves to the cell on the immediate right.
Note: The reward function is defined to be $R(s, a, s') = R(s)$, with $R(s = 5) = 1$ and $R(s) = 0$ otherwise. Note that we get the reward when we are leaving the current state. When the agent reaches the rightmost cell, it stays for one more time step, then receives a reward of +1 and comes to a halt.
Let $V^*(i)$ denote the value function of state $i$, the $i$th cell starting from the left.
Let $V_k^*(i)$ denote the value function estimate at state $i$ at the $k$th step of the value iteration algorithm. Let $V_0^*(i)$ denote the initialization of this estimate.
Use the discount factor $\lambda = 0.5$.
We will write the functions $V_k^*$ as arrays below, i.e., as $[V_k^*(1) \; V_k^*(2) \; V_k^*(3) \; V_k^*(4) \; V_k^*(5)]$.
Initialize by setting $V_0^*(i) = 0$ for all $i$:
$V_0^* = [0 \; 0 \; 0 \; 0 \; 0]$
Then, using the value iteration update rule, we get
$V_1^* = [0 \; 0 \; 0 \; 0 \; 1]$
$V_2^* = [0 \; 0 \; 0 \; 0.5 \; 1]$
Note that as soon as the agent takes the first action to reach cell 5, it stays for one more step, halts, and does not take any more actions, so we set $V_{k+1}^*(5) = V_k^*(5)$ for all $k \geq 1$.
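The iterations above can be reproduced with a few lines of Python. This is only a minimal sketch under the assumptions stated in the example: the single action always moves one cell to the right, the reward is collected when leaving a state, and cell 5 is treated as terminal (its value is just its reward). The function name `chain_value_iteration` and its parameters are illustrative and do not come from the lecture.

```python
def chain_value_iteration(n_cells=5, discount=0.5, n_iters=5):
    """Value iteration on the 1-D grid where the only action is 'move right'.

    Returns a list of value arrays [V_0, V_1, ..., V_n_iters].
    """
    # R(s) = 1 for the rightmost cell, 0 elsewhere (collected on leaving s).
    rewards = [0.0] * (n_cells - 1) + [1.0]
    values = [0.0] * n_cells          # V_0(i) = 0 for all i
    history = [values[:]]
    for _ in range(n_iters):
        new_values = []
        for s in range(n_cells):
            if s == n_cells - 1:
                # Rightmost cell: collect the reward, then halt (no future value).
                new_values.append(rewards[s])
            else:
                # Deterministic move to the right neighbour, so the sum over s'
                # has a single term and the max over actions is trivial.
                new_values.append(rewards[s] + discount * values[s + 1])
        values = new_values
        history.append(values[:])
    return history


if __name__ == "__main__":
    for k, v in enumerate(chain_value_iteration()):
        print(f"V_{k} = {v}")
```

Each iteration propagates the reward one cell further to the left, halving it at every step because the discount is 0.5.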
Another Example of Value Iteration (Software Implementation)
Consider the same one-dimensional grid with reward values as in the first few problems in this vertical. However, consider the following change to the transition probabilities: at any given grid location, the agent can choose to either stay at the location or move to an adjacent grid location. If the agent chooses to stay at the location, such an action is successful with probability 1/2, and
• if the agent is at the leftmost or rightmost grid location, it ends up at its neighboring grid location with probability 1/2,
• if the agent is at any of the inner grid locations, it has a probability 1/4 each of ending up at either of the neighboring locations.
If the agent chooses to move (either left or right) at any of the inner grid locations, such an action is successful with probability 1/3, and with probability 2/3 it fails to move, and
• if the agent chooses to move left at the leftmost grid location, then the action ends up exactly the same as choosing to stay, i.e., the agent stays at the leftmost grid location with probability 1/2 and ends up at its neighboring grid location with probability 1/2,
• if the agent chooses to move right at the rightmost grid location, then the action ends up exactly the same as choosing to stay, i.e., the agent stays at the rightmost grid location with probability 1/2 and ends up at its neighboring grid location with probability 1/2.
Let $\lambda = 0.5$.
Run the value iteration algorithm for 100 iterations. Use any computational software of your choice.
Enter the value of $V_{100}^*$ as an array $[V_{100}^*(1) \; V_{100}^*(2) \; V_{100}^*(3) \; V_{100}^*(4) \; V_{100}^*(5)]$.
(For example, type [0,2,0,3,4] for the array [0 2 0 3 4]. Type at least 4 decimal digits.)
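One possible way to set this up in Python is sketched below. It is not an official solution, and it rests on assumptions that the problem statement leaves open: moving right from the leftmost cell (and moving left from the rightmost cell) is assumed to follow the same 1/3 success, 2/3 failure rule as the inner cells, and no state is treated as terminal. The names `build_transitions` and `value_iteration` are illustrative.

```python
STAY, LEFT, RIGHT = 0, 1, 2


def build_transitions(n=5):
    """Return T where T[a][s] is a dict {next_state: probability} (0-indexed states)."""
    T = {a: [None] * n for a in (STAY, LEFT, RIGHT)}
    for s in range(n):
        at_left, at_right = (s == 0), (s == n - 1)
        # Outcome of choosing to stay.
        if at_left:
            stay = {s: 0.5, s + 1: 0.5}
        elif at_right:
            stay = {s: 0.5, s - 1: 0.5}
        else:
            stay = {s: 0.5, s - 1: 0.25, s + 1: 0.25}
        T[STAY][s] = stay
        # Moving into a wall behaves exactly like staying.
        # ASSUMPTION: moving away from a wall at an end cell uses the
        # same 1/3 success, 2/3 failure rule as the inner cells.
        T[LEFT][s] = dict(stay) if at_left else {s - 1: 1 / 3, s: 2 / 3}
        T[RIGHT][s] = dict(stay) if at_right else {s + 1: 1 / 3, s: 2 / 3}
    return T


def value_iteration(n=5, discount=0.5, n_iters=100):
    """Apply the value iteration update n_iters times, starting from V_0 = 0."""
    T = build_transitions(n)
    rewards = [0.0] * (n - 1) + [1.0]   # R(s, a, s') = R(s), with R(5) = 1
    values = [0.0] * n
    for _ in range(n_iters):
        # Synchronous update: the comprehension reads the old `values`
        # and builds the new list before rebinding the name.
        values = [
            max(
                sum(p * (rewards[s] + discount * values[s2])
                    for s2, p in T[a][s].items())
                for a in (STAY, LEFT, RIGHT)
            )
            for s in range(n)
        ]
    return values


if __name__ == "__main__":
    print([round(v, 4) for v in value_iteration()])
```

Because the discount is 0.5, the update is a contraction and 100 iterations are far more than needed for the estimates to stop changing at 4 decimal digits. If the end-cell assumption differs from the intended one, only the two lines marked as an assumption need to change.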

