Slow performance when copying from HTTP source to blob sink












I use a copy activity to call an HTTP API and store the JSON response as a file in Azure Blob Storage. The copy activity is executed in a ForEach loop and each activity run takes 16 seconds, but the run details report a copy duration of only 3 seconds. Why does the activity take 16 seconds to complete? The source dataset is an Http File with an HttpServer linked service and the sink dataset is a blob storage JSON file. Both the source and sink datasets are configured with Binary Copy, and the request is a GET to an HTTPS URL with anonymous authentication.



I would like to speed up this activity since it is run multiple times inside the ForEach loop. Is there some way to improve the performance?










      azure-data-factory azure-data-factory-2






      asked Jan 2 at 12:24









      Magnus Johannesson

          1 Answer
          There are always a few seconds of overhead when starting an activity. Also consider that the HTTP server might be responsible for some of the seconds you are seeing.



          If you are using a ForEach loop and want to speed up the process, you can clear the Sequential checkbox on the Settings tab of the ForEach activity so that the iterations run in parallel.
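For anyone who edits the pipeline definition directly rather than through the UI, the Sequential checkbox corresponds to the `isSequential` property of the ForEach activity, and the degree of parallelism to `batchCount` (which Data Factory caps at 50). Below is a minimal sketch in Python that builds such a fragment and prints it as JSON. The activity names, the `urls` pipeline parameter, and the placeholder inner copy activity are assumptions for illustration, not taken from the question.

```python
import json


def build_foreach_activity(name, items_expression, inner_activities, batch_count=50):
    """Build a ForEach activity definition that runs its iterations in parallel."""
    # Data Factory accepts a batchCount between 1 and 50.
    if not 1 <= batch_count <= 50:
        raise ValueError("batchCount must be between 1 and 50")
    return {
        "name": name,
        "type": "ForEach",
        "typeProperties": {
            # False is the JSON equivalent of clearing the Sequential checkbox.
            "isSequential": False,
            "batchCount": batch_count,
            "items": {"value": items_expression, "type": "Expression"},
            "activities": inner_activities,
        },
    }


# Hypothetical placeholder for the real copy activity definition.
copy_activity = {"name": "CopyHttpToBlob", "type": "Copy"}

foreach = build_foreach_activity(
    "ForEachUrl",                   # hypothetical activity name
    "@pipeline().parameters.urls",  # hypothetical pipeline parameter
    [copy_activity],
)
print(json.dumps(foreach, indent=2))
```

The printed JSON can be compared against the ForEach activity in the pipeline's code view to check that the loop is really configured to run in parallel.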



          Hope this helped!






          • Thanks Martin! I don't think the http server is the bottleneck in my case. It's a paid Google API with a limit of 50 requests per second, and when I call it from Postman, it takes approx. 100 ms. I have set the batch count for the ForEach loop to 50, so it is already running in parallel. It seems a lot that there is a 13 second overhead when starting an activity.

            – Magnus Johannesson
            Jan 2 at 13:50
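To put the numbers from this thread side by side, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model in which the loop processes its items in waves of `batch_count` and every wave takes one full activity run; the item count of 1000 is a made-up figure for illustration, and real runs will vary.

```python
import math


def estimate_foreach_seconds(items, batch_count, activity_seconds):
    """Rough wall-clock estimate: items run in waves of batch_count,
    and each wave is assumed to take one full activity run."""
    waves = math.ceil(items / batch_count)
    return waves * activity_seconds


activity_seconds = 16  # total duration of one activity run (from the question)
copy_seconds = 3       # copy duration reported in the run details

# Time per run that is not spent copying data.
overhead_seconds = activity_seconds - copy_seconds
print(overhead_seconds)

# Hypothetical workload of 1000 items, parallel (batch count 50) vs. sequential.
print(estimate_foreach_seconds(1000, 50, activity_seconds))
print(estimate_foreach_seconds(1000, 1, activity_seconds))
```

Under this model the fixed per-activity overhead, not the copy itself, dominates the total, which is why parallelism helps but does not reduce the 16 seconds of an individual run.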













          • Oh I see, then I'm afraid you won't be able to lower that time with Data Factory. If you really need something faster, you may want to consider another technology such as Event Hubs or Stream Analytics, but I've never used either of them.

            – Martin Esteban Zurita
            Jan 2 at 15:31











          answered Jan 2 at 13:36









          Martin Esteban Zurita
