Using a Kafka Streams state store to hold over 500 million messages

I am trying to evaluate whether Kafka Streams with a RocksDB state store can be used in production with 500 million messages in the changelog (state topic).



Use case
I have data from about 8 different topics, owned by different teams. From these topics I only care about certain data points, so I am using RocksDB to hold the state of an object, into which I merge the required data from the different topics.
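Here is a minimal sketch of what I have in mind (the topic names, the "object-state" store name, the String serdes, and the string-concatenation merge are all placeholders for my real types and merge logic):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.Stores;

    public class MergeTopology {

      public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Persistent (RocksDB-backed) store; Streams also maintains a
        // compacted changelog topic so the state survives restarts.
        builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("object-state"),
            Serdes.String(), Serdes.String()));

        // One stream per upstream topic; each merges its data point into the
        // stored object, keyed by object id. Assumes the topics share key and
        // value formats and are co-partitioned.
        for (String topic : new String[] {"topic-a", "topic-b"}) { // ...8 topics
          builder.stream(topic, Consumed.with(Serdes.String(), Serdes.String()))
                 .transform(MergeTransformer::new, "object-state");
        }
        return builder;
      }

      // Reads the current object state, merges the new data point, writes it back.
      static class MergeTransformer
          implements Transformer<String, String, KeyValue<String, String>> {
        private KeyValueStore<String, String> store;

        @SuppressWarnings("unchecked")
        @Override
        public void init(ProcessorContext context) {
          store = (KeyValueStore<String, String>) context.getStateStore("object-state");
        }

        @Override
        public KeyValue<String, String> transform(String key, String value) {
          String current = store.get(key);
          // Placeholder merge: the real logic would set specific fields.
          store.put(key, current == null ? value : current + "|" + value);
          return null; // state-only; nothing emitted downstream
        }

        @Override
        public void close() {}
      }
    }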



Has Kafka Streams with a state store been used this way?










apache-kafka apache-kafka-streams






asked Dec 28 '18 at 21:12









user3822232

  • The cardinality of the keyset is more important than the total volume of messages. For example, if you have one unique key, and 500 billion messages with that same key, then it's only compacted to store one record. – cricket_007 Dec 28 '18 at 23:53

  • What if there are 500 million different keys, and updates to those 500 million keys come in at random times? – user3822232 Dec 30 '18 at 0:06

  • Time doesn't matter either. If you want to store every unique entry, then you'll need lots of disk space to store the RocksDB database. – cricket_007 Dec 30 '18 at 5:25
















1 Answer

You can use a state store to hold millions of keys. As @cricket_007 mentioned, it requires enough disk storage to hold all the entries, since state is flushed to the file system. A keyset this large usually causes storage issues rather than memory issues: as long as you have disk space available, it will work. You also need to make sure the stores are persistent (RocksDB-backed) rather than in-memory.
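
As a sketch of what I mean (the "object-state" store name and the state directory path are placeholders), the difference is which store supplier you pick, plus pointing state.dir at a volume with enough capacity:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
    import org.apache.kafka.streams.state.Stores;

    public class StoreChoice {
      public static void main(String[] args) {
        // Persistent store: RocksDB files live on disk under state.dir,
        // so capacity is bounded by disk, not heap.
        KeyValueBytesStoreSupplier persistent =
            Stores.persistentKeyValueStore("object-state");

        // In-memory store: the whole keyset must fit in heap --
        // not suitable for hundreds of millions of keys.
        KeyValueBytesStoreSupplier inMemory =
            Stores.inMemoryKeyValueStore("object-state");

        Properties props = new Properties();
        // Put the RocksDB state on a volume sized for all entries,
        // with headroom for compaction.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/kafka-streams");
      }
    }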



From personal experience: I have around 100 million keys across several state stores. I ran into disk space problems at first, but after adding more disks it works fine.



You can also read more about capacity planning to get a fair idea:
https://docs.confluent.io/current/streams/sizing.html






answered Jan 2 at 13:18 by Nishu Tayal






















