Slow performance when copying from HTTP source to blob sink
I use a copy activity to call an HTTP API and store the JSON response as a file in Azure Blob storage. The copy activity is executed in a ForEach loop and each activity run takes 16 seconds, but the run details say the copy duration is only 3 seconds. Why does the activity take 16 seconds to complete? The source dataset is an Http File with an HttpServer linked service and the sink dataset is a Blob storage JSON file. Both the source and sink datasets are configured with Binary Copy, and the request is a GET to an HTTPS URL with anonymous authentication.
I would like to speed up this activity since it runs multiple times inside the ForEach loop. Is there some way to improve the performance?
azure-data-factory azure-data-factory-2
asked Jan 2 at 12:24
Magnus Johannesson
1 Answer
There are always a few seconds of overhead when starting an activity. Also consider that the HTTP server might be responsible for some of the seconds you are seeing.
If you are using a ForEach loop and want to speed up the process, you can uncheck the Sequential checkbox on the Settings tab of the ForEach activity.
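In the pipeline JSON, the Sequential checkbox corresponds to the `isSequential` property of the ForEach activity, and the degree of parallelism is set with `batchCount`. A minimal sketch of what that could look like is below. The activity names, the `requests` pipeline parameter, the dataset names, and the source and sink types are assumptions for illustration, not taken from the question:

```json
{
    "name": "ForEachRequest",
    "type": "ForEach",
    "description": "Sketch only: all names, the items expression and the dataset references are hypothetical",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 50,
        "items": {
            "value": "@pipeline().parameters.requests",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopyHttpToBlob",
                "type": "Copy",
                "description": "Hypothetical inner copy activity; source and sink types are assumed",
                "inputs": [
                    { "referenceName": "HttpSourceFile", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "BlobJsonFile", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": { "type": "HttpSource" },
                    "sink": { "type": "BlobSink" }
                }
            }
        ]
    }
}
```

With `isSequential` set to `false`, iterations run concurrently up to `batchCount`, so the per-activity startup overhead is paid in parallel rather than once per item in series. It does not make an individual activity run any shorter.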
Hope this helped!
answered Jan 2 at 13:36
Martin Esteban Zurita
Thanks Martin! I don't think the HTTP server is the bottleneck in my case. It's a paid Google API with a limit of 50 requests per second, and when I call it from Postman, it takes approx. 100 ms. I have set the batch count for the ForEach loop to 50, so it is already running in parallel. A 13-second overhead for starting an activity seems like a lot.
– Magnus Johannesson
Jan 2 at 13:50
Oh I see, then I'm afraid you won't be able to lower that time with Data Factory. If you really need something faster, you may want to consider another technology like Event Hubs or Stream Analytics, but I've never used either of them.
– Martin Esteban Zurita
Jan 2 at 15:31