
In Problem 5.12, we were able to reduce the CPE for the prefix-sum computation to 3.00, limited by the latency of floating-point addition on this machine. Simple loop unrolling does not improve things, because each addition still has to wait for the result of the one before it. Using a combination of loop unrolling and reassociation, write code for a prefix sum that achieves a CPE less than the latency of floating-point addition on your machine. Doing this requires actually increasing the number of additions performed. For example, our version with two-way unrolling requires three additions per iteration, while our version with four-way unrolling requires five. Our best implementation achieves a CPE of 1.67 on our reference machine.
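A minimal sketch of the two-way idea in C, assuming `float` data as in the CS:APP examples. The function names `psum1a` and `psum2a` are illustrative, and this is not claimed to be the authors' code. `psum1a` keeps the running total in a register, so every addition depends on the previous one. `psum2a` computes `a[i] + a[i+1]` independently of the running total, leaving only one addition per two elements on the loop-carried path while doing three additions per iteration.

```c
/* Baseline (illustrative): the running total stays in a register, but
 * each addition depends on the previous one, so the loop is bound by
 * the FP-add latency. */
void psum1a(float a[], float p[], long n)
{
    if (n <= 0)
        return;
    float last_val = p[0] = a[0];
    for (long i = 1; i < n; i++) {
        last_val = last_val + a[i];
        p[i] = last_val;
    }
}

/* Two-way unrolling with reassociation (illustrative sketch).
 * Three additions per iteration; only the update of last_val is on
 * the loop-carried dependency chain. */
void psum2a(float a[], float p[], long n)
{
    if (n <= 0)
        return;
    float last_val = p[0] = a[0];
    long i;
    for (i = 1; i + 1 < n; i += 2) {
        float pair = a[i] + a[i + 1]; /* independent of last_val */
        p[i] = last_val + a[i];       /* result is stored, not chained */
        last_val = last_val + pair;   /* the one loop-carried addition */
        p[i + 1] = last_val;
    }
    /* At most one element is left over (exactly one when n is even). */
    for (; i < n; i++) {
        last_val = last_val + a[i];
        p[i] = last_val;
    }
}
```

Because reassociation changes the order of floating-point additions, `psum2a` can round differently from `psum1a` on general data. The two agree exactly when every partial sum is exactly representable, for example when the inputs are small integers.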
Determine how the throughput and latency limits of your machine limit the minimum CPE you can achieve for the prefix-sum operation.
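One way to approach this is to bound CPE from below by whichever resource saturates first. Assume a k-way unrolled loop that computes the k partial sums of each block off the critical path (k − 1 additions) and forms each output as `last_val` plus a partial sum (k additions). That gives 2k − 1 additions per k elements and one loop-carried addition per iteration; for k = 2 it is the three-additions-per-iteration count from the problem. The parameters are assumptions you supply: `L` is the FP-add latency (3 matches the 3.00 CPE above), `C` the number of FP additions that can start per cycle, and `S` the number of stores per cycle. Other arrangements, such as the five-addition four-way version mentioned in the problem, trade latency and throughput differently, so this models only one possible scheme.

```c
/* Hypothetical model of a k-way unrolled prefix sum with 2k - 1 FP
 * additions per k elements and one addition per iteration on the
 * loop-carried path through last_val.
 *   L: FP-add latency in cycles
 *   C: FP additions that can start per cycle
 *   S: stores per cycle (each element writes one p[i])
 * All three are inputs you supply; nothing here is measured. */

static double max2(double x, double y)
{
    return x > y ? x : y;
}

/* Lower bound on CPE for unrolling factor k >= 1. */
double cpe_bound(double L, double C, double S, int k)
{
    double latency = L / k;                     /* one chained add per k elements */
    double adds    = (2.0 * k - 1.0) / (k * C); /* total add work per element */
    double stores  = 1.0 / S;                   /* one store per element */
    return max2(latency, max2(adds, stores));
}

/* Smallest k in [1, kmax] that achieves the minimum bound. */
int best_unroll(double L, double C, double S, int kmax)
{
    int best = 1;
    for (int k = 2; k <= kmax; k++)
        if (cpe_bound(L, C, S, k) < cpe_bound(L, C, S, best))
            best = k;
    return best;
}
```

With L = 3, C = 1 and S = 1 (assumed values for a machine with one FP adder and one store unit), the bound is 3.0 for k = 1, 1.5 for k = 2 and about 1.67 for k = 3. Two-way unrolling is where the latency limit L/k meets the addition-throughput limit (2k − 1)/(kC); unrolling further shrinks the latency limit but adds more work than the adder can absorb. The reported 1.67 sits above this 1.5 floor, as a lower bound requires. On a machine that can start more FP additions per cycle, the balance point moves to larger k until the store limit 1/S takes over.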
