Event hub handling faults
For Event Hubs, if we hit a fault and the consumer crashes, how does it find out, the next time it comes up, which checkpoint it was at for the partition it acquires? I want to read that checkpoint from storage so I can compare its sequence number with the incoming messages and process only the ones that come after it.
There is an API to save the checkpoint, but how do I retrieve it?
azure azure-eventhub
asked Dec 21 '18 at 19:40
tariq zafar
It depends on the consumer. What is your consumer?
– Peter Bons
Dec 21 '18 at 22:56
@Peter Bons I am using something based on EventProcessorHost.
– tariq zafar
Dec 22 '18 at 0:28
1 Answer
As you know, Event Hubs checkpointing is purely client-side: you store the current offset in the Azure Storage account that your EventProcessorHost is configured with by calling
await context.CheckpointAsync();
in your client code. This call is translated into a write to the storage account; it does not involve any call to the Event Hubs service.
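Because every checkpoint is a storage write, consumers often checkpoint periodically rather than after every event. The sketch below illustrates that trade-off in Python, since the logic does not depend on the SDK. `IntervalCheckpointer` and the in-memory `checkpoint_store` dict are hypothetical stand-ins for EventProcessorHost and its blob container, not Azure APIs, and the `Offset`/`SequenceNumber` keys are assumed names.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the checkpoint blobs: one entry per
# (consumer group, partition id). This is NOT an Azure API.
checkpoint_store: Dict[Tuple[str, str], dict] = {}


class IntervalCheckpointer:
    """Record the position of the last processed event at most once per interval."""

    def __init__(self, consumer_group: str, partition_id: str,
                 interval_s: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.key = (consumer_group, partition_id)
        self.interval_s = interval_s      # e.g. a 3-minute checkpoint interval
        self.clock = clock                # injectable so the policy can be tested
        self.last_write = float("-inf")   # forces a write for the first event

    def on_event_processed(self, offset: str, sequence_number: int) -> bool:
        now = self.clock()
        if now - self.last_write < self.interval_s:
            # No storage write: if the consumer crashes now, this event is redelivered.
            return False
        # With EventProcessorHost, this is where context.CheckpointAsync() would be called.
        checkpoint_store[self.key] = {"Offset": offset, "SequenceNumber": sequence_number}
        self.last_write = now
        return True
```

Events processed between two writes are not recorded anywhere, so after a crash the consumer resumes from the older checkpoint and receives them again.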
Whenever a failure occurs, you can read the latest (updated) offset from the storage account to avoid processing events twice. This has to be handled in your client-side code; the Event Hubs service will not do it on its own.
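A minimal sketch of that client-side check, in Python, assuming the checkpoint has already been loaded into a dict keyed by (consumer group, partition id) with an assumed `SequenceNumber` field. The `Event` type and `new_events_after_restart` helper are hypothetical; `$Default` is the default consumer group name. On restart it looks up the partition's last checkpointed sequence number and drops anything at or below it.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    sequence_number: int
    body: str


def new_events_after_restart(
    store: Dict[Tuple[str, str], dict],
    consumer_group: str,
    partition_id: str,
    incoming: Iterable[Event],
) -> Iterator[Event]:
    """Yield only events newer than the partition's stored checkpoint."""
    checkpoint = store.get((consumer_group, partition_id))
    # No checkpoint yet (first run for this partition): nothing to skip.
    last_seq: Optional[int] = checkpoint["SequenceNumber"] if checkpoint else None
    for event in incoming:
        if last_seq is not None and event.sequence_number <= last_seq:
            continue  # already processed before the crash; drop the duplicate
        yield event
```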
If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group. When the reader connects, it passes the offset to the event hub to specify the location at which to start reading. In this way, you can use checkpointing to both mark events as "complete" by downstream applications, and to provide resiliency if a failover between readers running on different machines occurs. It is possible to return to older data by specifying a lower offset from this checkpointing process. Through this mechanism, checkpointing enables both failover resiliency and event stream replay.
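To make the resume and replay behaviour concrete, here is a small Python model of one partition. The `(offset, sequence_number, body)` tuples and the `read_from` helper are hypothetical: real offsets are opaque strings assigned by the service, and this sketch assumes the reader resumes just after the checkpointed event and starts from the beginning when there is no checkpoint.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory model of one partition: (offset, sequence_number, body).
# Plain ints stand in for offsets purely for illustration.
Partition = List[Tuple[int, int, str]]


def read_from(partition: Partition, checkpoint_offset: Optional[int]) -> List[str]:
    """Return the bodies a reader would receive when resuming from checkpoint_offset."""
    if checkpoint_offset is None:
        # No checkpoint: assume the reader starts at the beginning of the partition.
        return [body for _, _, body in partition]
    # Resume *after* the checkpointed event.
    return [body for offset, _, body in partition if offset > checkpoint_offset]


partition = [(0, 1, "a"), (10, 2, "b"), (20, 3, "c"), (30, 4, "d")]
print(read_from(partition, 20))  # ['d']: normal resume after a failover
print(read_from(partition, 0))   # ['b', 'c', 'd']: lower offset, so older events are replayed
```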
Moreover, failures in Event Hubs are rare, so duplicate events should be infrequent. For more details on building a workflow with no duplicate events, refer to this Stack Overflow answer.
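One way to get there, sketched here as an assumption rather than the linked answer's exact code, is to let the downstream system remember what it has already applied. The hypothetical `IdempotentSink` below keeps the last applied sequence number per partition and discards anything at or below it, which stays exact even when checkpoints are only written every few minutes.

```python
from typing import Dict, List


class IdempotentSink:
    """Hypothetical downstream store that tracks the last applied sequence number
    per partition so redelivered events can be discarded."""

    def __init__(self) -> None:
        self.last_applied: Dict[str, int] = {}
        self.results: List[str] = []

    def apply(self, partition_id: str, sequence_number: int, body: str) -> bool:
        last = self.last_applied.get(partition_id, -1)
        if sequence_number <= last:
            return False  # replayed after a crash; already applied, so ignore it
        # In a real system, write the result and last_applied in the same
        # transaction so the two cannot drift apart.
        self.results.append(body)
        self.last_applied[partition_id] = sequence_number
        return True
```

If the consumer crashes after applying sequence number 2 but its last checkpoint was at 1, event 2 is redelivered on restart and `apply` simply returns `False` for it.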
The checkpoint details for each partition are saved as a blob in that storage account. The blob can be read with the WindowsAzure.Storage client to do custom validation of the sequence number of the last event received.
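Once the blob text has been downloaded (for example with `CloudBlockBlob.DownloadTextAsync()` from WindowsAzure.Storage), extracting the sequence number is plain JSON parsing. The Python sketch below assumes a field named `SequenceNumber`; the sample blob content is made up for illustration, so check the field names against a blob from your own lease container.

```python
import json


def last_sequence_number(lease_blob_text: str) -> int:
    """Extract the checkpointed sequence number from a partition's lease/checkpoint blob.

    Assumes the JSON written by EventProcessorHost has a "SequenceNumber" field.
    """
    lease = json.loads(lease_blob_text)
    return int(lease["SequenceNumber"])


# Hypothetical blob content, for illustration only (values are made up).
sample = ('{"PartitionId": "0", "Owner": "host-1", "Token": "abc", '
          '"Offset": "4096", "SequenceNumber": 42}')
print(last_sequence_number(sample))  # 42
```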
Thanks for answering. But if I do interval-based checkpointing with a 3-minute interval and Event Hubs crashes in between, I end up re-reading those 2 or so minutes of data, don't I? The link you posted says "every time the EventProcessorImpl starts - query your downstream for the last sequence no. it got and keep discarding events until the current sequence no." I sense that I need to query storage, but what API can I use to get the last sequence number?
– tariq zafar
Jan 1 at 4:38
As far as I have explored, the checkpoint for each Event Hub partition is stored in a storage container in JSON format, with details such as owner, token, sequence number and offset. So you can read the last sequence number directly from your storage account with the WindowsAzure.Storage client in a custom method, since the blob holds only the latest checkpointed sequence number.
– Ranjith eswaran
Jan 2 at 3:39
edited Jan 2 at 3:47
answered Jan 1 at 4:02
Ranjith eswaran