Why can't pipelining operate at its maximum theoretical speed?
First of all, what is the maximum theoretical speed (or speedup)?
Can anyone explain why pipelining cannot operate at its maximum theoretical speed?
pipeline cpu-architecture
Hazards are one major reason, for example data dependencies through instructions with latency > 1, like a load, especially if the load misses in cache. See en.wikipedia.org/wiki/Hazard_(computer_architecture). Control dependencies (branches) also create bubbles in the front-end. A superscalar pipeline (more than one instruction per cycle) also needs to find instruction-level parallelism to max out.
– Peter Cordes
Dec 31 '18 at 21:38
The maximum theoretical speed of a scalar pipelined processor is 1 instruction per cycle (IPC), where a cycle is the latency of a pipe stage. This assumes that the memory subsystem can deliver at least one instruction per cycle. The hazards mentioned in Peter's comment cause the performance of the processor to degrade below 1 IPC. In addition to the hazards, an instruction that loads a value from memory or requires multiple cycles to execute (in the EX stage) may take a number of cycles to execute that is larger than the number of pipe stages.
– Hadi Brais
Jan 1 at 1:00
edited Jan 1 at 1:01
Hadi Brais
asked Dec 31 '18 at 12:43
Alina malik
1 Answer
The maximum theoretical speedup is equal to the increase in pipeline depth. In a scalar design (execution one instruction wide), the ideal instructions per cycle (IPC) is one. Ideally, the clock frequency could increase by a factor equal to the increase in pipeline depth.
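As a rough illustration of that ideal, here is a minimal Python sketch. It assumes a textbook model that is not spelled out in the answer: the unpipelined design spends `depth` stage-times on every instruction, while a stall-free pipeline needs `depth` cycles to fill and then completes one instruction per cycle. The function names are made up for this example.

```python
def pipelined_cycles(n_instructions: int, depth: int) -> int:
    """Cycles for a stall-free pipeline: fill latency, then one result per cycle."""
    return depth + n_instructions - 1


def ideal_speedup(n_instructions: int, depth: int) -> float:
    """Speedup over an unpipelined design that needs `depth` stage-times per instruction."""
    unpipelined = n_instructions * depth
    return unpipelined / pipelined_cycles(n_instructions, depth)


if __name__ == "__main__":
    # The speedup approaches the pipeline depth as the instruction count grows.
    for n in (1, 10, 1000, 1_000_000):
        print(n, round(ideal_speedup(n, depth=5), 4))
```

Under these assumptions the speedup only approaches the depth for long instruction streams and never exceeds it.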
The actual frequency increase will be less than this ideal due to latching overheads, clock skew, and imbalanced division of work/latency. (While one can theoretically place latches at any point, the amount of state latched, its position, and other factors make certain points more friendly for stage divisions.)
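To make the overhead argument concrete, here is a small Python sketch under a simplified model of my own: the cycle time is the slowest stage's logic delay plus a fixed per-stage latch/skew overhead, and `imbalance` is the fraction by which the slowest stage exceeds the average. The delay values are hypothetical, not measurements.

```python
def clock_period(total_logic_delay: float, depth: int,
                 latch_overhead: float, imbalance: float = 0.0) -> float:
    """Cycle time = slowest stage's logic delay + per-stage latch/skew overhead."""
    slowest_stage = (total_logic_delay / depth) * (1.0 + imbalance)
    return slowest_stage + latch_overhead


def frequency_gain(total_logic_delay: float, depth: int,
                   latch_overhead: float, imbalance: float = 0.0) -> float:
    """Clock frequency relative to a single-stage design with the same overhead."""
    base = clock_period(total_logic_delay, 1, latch_overhead)
    return base / clock_period(total_logic_delay, depth, latch_overhead, imbalance)


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns of logic, 0.2 ns latch overhead, 10% imbalance.
    for depth in (2, 5, 10, 20, 40):
        print(depth, round(frequency_gain(10.0, depth, 0.2, imbalance=0.1), 2))
```

In this model the gain equals the depth only when both overhead and imbalance are zero, and the shortfall grows as the pipeline gets deeper because the fixed overhead becomes a larger share of each cycle.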
Manufacturing variation also means that work designed to take an equal amount of time will not do so for all stages in the pipeline. A designer can provide more slack so that more chips will meet minimal timing in all stages. Another technique to handle such variation is to accept that not all chips will meet the target frequency (whether one exclusively uses the "golden samples" or uses lower-frequency chips as well is a marketing decision).
As one might expect, with shallow pipelines, variation within a stage is spread out over more logic and so is less likely to affect frequency.
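A toy Monte Carlo experiment in Python can illustrate why variation hurts deeper pipelines more. It assumes, purely for illustration, that every gate delay varies independently with a Gaussian distribution and that the clock must accommodate the slowest stage. Real process variation is spatially correlated, so treat the numbers as qualitative only. All parameter values are made up.

```python
import random


def mean_slowdown(depth: int, gates_total: int = 240, sigma: float = 0.05,
                  trials: int = 2000, seed: int = 0) -> float:
    """Average ratio of the actual cycle time to the nominal one.

    Each gate has nominal delay 1.0 with independent Gaussian variation
    (an assumption of this sketch); the slowest stage sets the cycle time.
    """
    rng = random.Random(seed)
    gates_per_stage = gates_total // depth
    total = 0.0
    for _ in range(trials):
        worst = max(
            sum(rng.gauss(1.0, sigma) for _ in range(gates_per_stage))
            for _ in range(depth)
        )
        total += worst / gates_per_stage
    return total / trials


if __name__ == "__main__":
    for depth in (2, 4, 8, 24):
        print(depth, round(mean_slowdown(depth), 4))
```

With fewer gates per stage there is less averaging within a stage, and with more stages there are more chances for one of them to be slow, so the expected slowdown rises with depth in this model.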
Wave pipelining, where multiple signal waves (corresponding to pipeline stages) can be passing through a block of logic at the same time, provides a limited method to avoid latch overhead. However, besides other design issues, it is more sensitive to variation both from manufacturing and from run-time conditions such as temperature and voltage (which one might wish to intentionally vary to target different power/performance behaviors).
Even if one did have incredible hardware that provided a perfect frequency increase, hazards (as mentioned in Peter Cordes' comment) would prevent the perfect utilization of available execution resources.
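The effect of hazards on a scalar pipeline can be estimated with the usual "base CPI plus stall cycles" accounting. The Python sketch below uses that textbook formula; the parameter names and the instruction-mix numbers are hypothetical and only meant to show the shape of the calculation.

```python
def effective_cpi(load_frac: float, load_use_frac: float, load_use_stall: float,
                  miss_rate: float, miss_penalty: float,
                  branch_frac: float, mispredict_rate: float,
                  branch_penalty: float) -> float:
    """Cycles per instruction for a scalar pipeline: ideal CPI of 1 plus average stalls."""
    data_stalls = load_frac * load_use_frac * load_use_stall
    memory_stalls = load_frac * miss_rate * miss_penalty
    control_stalls = branch_frac * mispredict_rate * branch_penalty
    return 1.0 + data_stalls + memory_stalls + control_stalls


if __name__ == "__main__":
    # Hypothetical mix: 25% loads, half of them followed by a dependent instruction
    # (1-cycle load-use stall), 2% of loads miss (20-cycle penalty),
    # 20% branches with 10% mispredicted (3-cycle penalty).
    cpi = effective_cpi(0.25, 0.5, 1, 0.02, 20, 0.20, 0.10, 3)
    print("CPI:", round(cpi, 3), "IPC:", round(1 / cpi, 3))
```

Any nonzero stall term pushes the IPC below the ideal of 1, which is the point made in the comments about data, memory, and control hazards.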
answered Jan 1 at 1:40
Paul A. Clayton