Using the Kafka Streams state store to hold over 500 million messages

I am trying to evaluate whether Kafka Streams with a RocksDB state store can be used in production with 500 million messages in the changelog (state topic).
Use case
I consume data from about 8 different topics that are owned by different teams. From these topics I only care about certain data points, so I use RocksDB to hold the state of each object, to which I add the required data from the different topics.
Has Kafka Streams with a state store been used this way?
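For reference, here is a minimal sketch of the topology I have in mind, assuming all topics are keyed by the same object ID; the topic names, serdes, and the string-concatenation "state" are placeholders for illustration only:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ObjectStateTopology {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Merge the input topics (8 in my real case) into one stream,
        // all keyed by the same object ID.
        KStream<String, String> merged = builder
                .stream("topic-a", Consumed.with(Serdes.String(), Serdes.String()))
                .merge(builder.stream("topic-b", Consumed.with(Serdes.String(), Serdes.String())));
        // ... merge the remaining topics the same way

        // Fold each record's data points into the per-key object state.
        merged.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
              .aggregate(
                      () -> "",                               // initializer: empty object state
                      (key, value, agg) -> agg + "|" + value, // append the data point we care about
                      Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("object-state")
                              .withKeySerde(Serdes.String())
                              .withValueSerde(Serdes.String()));

        return builder.build();
    }
}
```

With Materialized.as("object-state"), Kafka Streams uses a persistent RocksDB store by default and backs it with a compacted changelog topic, which is the 500-million-message topic referred to above.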
apache-kafka apache-kafka-streams
asked Dec 28 '18 at 21:12 by user3822232
The cardinality of the key set is more important than the total volume of messages. For example, if you have one unique key and 500 billion messages with that same key, the changelog is compacted down to a single record.
– cricket_007
Dec 28 '18 at 23:53
What if there are 500 million different keys, and updates to those 500 million keys come in at random times?
– user3822232
Dec 30 '18 at 0:06
Time doesn't matter either. If you want to store every unique entry, then you'll need a lot of disk space for the RocksDB database.
– cricket_007
Dec 30 '18 at 5:25
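A rough back-of-envelope for the disk-space point, assuming an average record size of 1 KB (the size is purely illustrative):

    500,000,000 keys × 1 KB/record ≈ 500 GB of live state in RocksDB,
    plus roughly the same again for the compacted changelog topic on the
    brokers (multiplied by the replication factor), plus RocksDB space
    amplification on top.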
1 Answer
You can use the state store to hold millions of keys. As @cricket_007 mentioned, it requires enough disk storage for all the entries, because the state is flushed to the file system.
With millions of keys, the usual problems are storage or memory related. As long as you have disk space available, it will work. You also need to make sure the stores are not held in memory.
From personal experience, I have around 100 million keys across several state stores. I ran into disk space problems first, but after adding more disks it works fine.
You can also read the capacity planning guide to get a fair idea:
https://docs.confluent.io/current/streams/sizing.html
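As a minimal sketch of the configuration side of this advice (the property values and class names are illustrative, not recommendations): point state.dir at a volume with enough capacity, and optionally bound RocksDB's memory use with a RocksDBConfigSetter:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class StateStoreConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "object-state-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        // Put the RocksDB files on a volume with enough room for all keys.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/kafka-streams");
        // Optionally tune RocksDB itself (memory vs. disk trade-offs).
        props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryConfig.class);
        return props;
    }

    public static class BoundedMemoryConfig implements RocksDBConfigSetter {
        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            // Cap the block cache so a store with hundreds of millions of
            // keys doesn't try to hold its working set in memory.
            BlockBasedTableConfig table = new BlockBasedTableConfig();
            table.setBlockCacheSize(64 * 1024 * 1024L); // 64 MB, illustrative
            options.setTableFormatConfig(table);
        }
    }
}
```

Note that the default stores created by Materialized.as(...) are already persistent RocksDB stores; only explicitly requested in-memory stores keep all state on the heap.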
answered Jan 2 at 13:18 by Nishu Tayal