Why can't pipelining operate at its maximum theoretical speed?

First of all, what is the maximum theoretical speed/speedup?



Can anyone explain why pipelining cannot operate at its maximum theoretical speed?

  • Hazards, such as data dependencies through instructions with latency > 1 (like a load), are one major reason: en.wikipedia.org/wiki/Hazard_(computer_architecture). This matters especially if a load misses in cache. Control dependencies (branches) also create bubbles in the front-end. A superscalar pipeline (more than one instruction per cycle) also needs to find instruction-level parallelism to max out.

    – Peter Cordes
    Dec 31 '18 at 21:38

  • The maximum theoretical speed of a scalar pipelined processor is 1 instruction per cycle (IPC), where a cycle is the latency of a pipe stage. This assumes that the memory subsystem can deliver at least one instruction per cycle. The hazards mentioned in Peter's comment cause the performance of the processor to degrade below 1 IPC. In addition to the hazards, an instruction that loads a value from memory or requires multiple cycles to execute (in the EX stage) may take more cycles than there are pipe stages.

    – Hadi Brais
    Jan 1 at 1:00

pipeline cpu-architecture






asked Dec 31 '18 at 12:43 by Alina malik
edited Jan 1 at 1:01 by Hadi Brais








1 Answer
The maximum theoretical speedup is equal to the increase in pipeline depth. In a scalar (one-instruction-wide execution) design, the ideal throughput is one instruction per cycle. Ideally, the clock frequency could increase by a factor equal to the increase in pipeline depth.
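As a rough illustration of that ideal (assuming perfectly balanced stages, no hazards, and no latch overhead, none of which hold in practice), n instructions through a k-stage pipeline take k + (n - 1) cycles instead of n * k, so the speedup only approaches k for long instruction streams:

```python
# Idealized pipeline speedup for a k-stage pipeline with balanced stages,
# no hazards, and no latch overhead (all simplifying assumptions).
def pipeline_speedup(n_instructions: int, depth: int) -> float:
    unpipelined = n_instructions * depth        # each instruction runs start to finish
    pipelined = depth + (n_instructions - 1)    # fill the pipe once, then one per cycle
    return unpipelined / pipelined

print(pipeline_speedup(10, 5))         # ~3.57: short runs pay the pipe-fill cost
print(pipeline_speedup(1_000_000, 5))  # ~5.0: approaches the pipeline depth
```

Even this best case falls short of k whenever the pipeline must refill, e.g. after a branch redirect.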



The actual frequency increase will be less than this ideal due to latching overheads, clock skew, and imbalanced division of work/latency. (While one can theoretically place latches at any point, the amount of state latched, its position, and other factors make certain points friendlier for stage divisions.)
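A small numeric sketch of those two effects (the stage delays and latch overhead below are invented figures): the clock period of the pipelined design is set by the slowest stage plus the per-stage latch overhead, so both imbalance and overhead shave the frequency gain below the stage count:

```python
# Hypothetical 5-way split of logic that takes 5.0 ns unpipelined.
# Both the imbalance and the latch overhead are illustrative numbers.
stage_delays_ns = [1.1, 0.9, 1.3, 0.8, 0.9]  # imbalanced split of the 5.0 ns
latch_overhead_ns = 0.15                     # added at every stage boundary

unpipelined_period = sum(stage_delays_ns)                    # 5.0 ns
pipelined_period = max(stage_delays_ns) + latch_overhead_ns  # 1.45 ns

print(unpipelined_period / pipelined_period)  # ~3.45x, not the ideal 5x
```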



Manufacturing variation also means that work designed to take an equal amount of time will not do so for all stages in the pipeline. A designer can provide more timing slack so that more chips will meet minimum timing in all stages. Another technique for handling such variation is to accept that not all chips will meet the target frequency (whether one exclusively sells the "golden samples" or sells lower-frequency chips as well is a marketing decision).



As one might expect, with shallower pipelines the variation within a stage is averaged out over more logic and so is less likely to affect frequency.



Wave pipelining, where multiple signal waves (corresponding to pipeline stages) can pass through a block of logic at the same time, provides a limited way to avoid latch overhead. However, besides other design issues, it is more sensitive to variation both from manufacturing and from run-time conditions such as temperature and voltage (which one might wish to vary intentionally to target different power/performance trade-offs).



Even if one had incredible hardware that provided a perfect frequency increase, hazards (as mentioned in Peter Cordes' comment) would still prevent perfect utilization of the available execution resources.
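Those hazard losses can be sketched with the usual CPI accounting (the hazard frequencies and penalties below are invented for illustration): each stall cycle a hazard injects adds directly to the cycles per instruction, pulling the achieved IPC below the scalar ideal of 1:

```python
# Invented workload mix: stall cycles that hazards add per instruction.
base_cpi = 1.0              # ideal scalar pipeline: 1 cycle per instruction
load_use_stalls = 0.30 * 1  # 30% loads, 1-cycle load-use bubble each
branch_stalls = 0.15 * 2    # 15% branches, 2-cycle redirect penalty each

cpi = base_cpi + load_use_stalls + branch_stalls
print(cpi)        # 1.6 cycles per instruction
print(1.0 / cpi)  # 0.625 IPC, well below the ideal 1.0
```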






answered Jan 1 at 1:40 by Paul A. Clayton



