Limiting Elasticsearch data retention below disk space

Scenario:

  • We use Elasticsearch & Logstash for application logging on a moderately high-traffic system
  • This system generates ~200 GB of logs every single day
  • We run 4 sharded instances and want to retain roughly the last 3 days' worth of logs
  • So we implemented a "cleanup" job, running daily, which removes all data older than 3 days (a rough sketch of that kind of job is shown below)
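
For illustration, a minimal sketch of the kind of daily cleanup job described above, assuming daily Logstash-style indices named logstash-YYYY.MM.DD and a cluster reachable on localhost:9200 (the index naming and the endpoint are assumptions, not necessarily the exact setup):

```python
# Hypothetical daily cleanup: delete logstash-YYYY.MM.DD indices older than 3 days.
from datetime import datetime, timedelta, timezone

import requests

ES = "http://localhost:9200"
RETENTION_DAYS = 3

def cleanup_old_indices():
    # List all indices matching the daily pattern.
    resp = requests.get(f"{ES}/_cat/indices/logstash-*",
                        params={"format": "json", "h": "index"})
    resp.raise_for_status()
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    for row in resp.json():
        name = row["index"]
        try:
            day = datetime.strptime(name, "logstash-%Y.%m.%d").replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # skip indices that don't follow the daily naming pattern
        if day < cutoff:
            # Dropping whole indices is far cheaper than a delete-by-query.
            requests.delete(f"{ES}/{name}").raise_for_status()

if __name__ == "__main__":
    cleanup_old_indices()
```

The catch, as described below, is that a job like this can only run while the cluster still answers requests.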


So far so good. However, a few days ago a subsystem generated a persistent spike of log data, filling up all available disk space within a few hours and turning the cluster red. That also meant the cleanup job couldn't connect to ES, since the entire cluster was down on account of the disks being full. This is extremely problematic: it limits our visibility into what's going on and blocks our ability to see what caused the spike in the first place.



Doing root-cause analysis, a few questions pop out:




  • How can we look at the system in e.g. Kibana when the cluster status is red?
  • How can we tell ES to throw away logs oldest-first when there is no more space, rather than going status=red?
  • How can we make sure this never happens again?

elasticsearch elastic-stack

asked Dec 31 '18 at 23:04
Silver Dragon

2 Answers

Date-based index patterns are tricky with spiky loads. There are two things to combine for a smooth setup that needs no manual intervention:

1. Switch to rollover indices. You define that a new index should be created once the current one reaches X GB. The daily log volume then no longer matters: you simply keep as many indices around as your disk space allows (leaving some buffer and fine-tuning the disk watermarks).

2. To automate the rollover, the removal of old indices, and optionally the alias handling, there is Elastic Curator (a rough sketch of both pieces together follows this list):

  • Example for rollover
  • Example for delete index, but you want to combine this with the count filtertype
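
Not Curator configuration itself, but a minimal sketch of both pieces under stated assumptions: a write alias named logs-write pointing at rollover-managed indices logs-000001, logs-000002, and so on (the alias and index names, the 50 GB threshold, the number of retained indices, and the localhost:9200 endpoint are all assumptions), using the standard _rollover and _cat/indices APIs:

```python
# Sketch only: size-based rollover plus count-based retention, mimicking what
# Curator's rollover and delete_indices (count filtertype) actions automate.
import requests

ES = "http://localhost:9200"
WRITE_ALIAS = "logs-write"   # assumption: an alias already set up for rollover
MAX_SIZE = "50gb"            # roll over once the active index reaches this size
KEEP_INDICES = 10            # keep the newest N indices; delete the rest

def rollover_if_needed():
    # Ask ES to roll over the write alias when the active index is big enough.
    body = {"conditions": {"max_size": MAX_SIZE}}
    requests.post(f"{ES}/{WRITE_ALIAS}/_rollover", json=body).raise_for_status()

def delete_all_but_newest():
    # List the rollover indices; their zero-padded suffix sorts in creation order.
    resp = requests.get(f"{ES}/_cat/indices/logs-*",
                        params={"format": "json", "h": "index"})
    resp.raise_for_status()
    names = sorted(row["index"] for row in resp.json())
    for name in names[:-KEEP_INDICES]:
        requests.delete(f"{ES}/{name}").raise_for_status()

if __name__ == "__main__":
    rollover_if_needed()
    delete_all_but_newest()
```

Run this on a schedule, or let Curator's rollover and delete_indices actions do the same. Because retention is measured in indices of bounded size rather than in days, a traffic spike just rotates indices faster instead of filling the disk.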




PS: There will soon be another option called Index Lifecycle Management. It's built directly into Elasticsearch and can be configured through Kibana, but it's still just around the corner at the moment.

answered Jan 2 at 3:28
xeraa

How can we look at the system in e.g. Kibana when the cluster status is red?



Kibana can't connect to ES if the cluster is already down. It's best to poll the cluster health API from outside the cluster to get its current state.
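
A minimal sketch of that kind of external health poll, assuming the cluster is reachable on localhost:9200 and that the alerting hook is whatever you already use (both are assumptions):

```python
# Poll the cluster health API from outside the stack, so you still get a signal
# (or a connection error) even when Kibana and the cluster itself are unusable.
import requests

ES = "http://localhost:9200"

def check_health():
    try:
        resp = requests.get(f"{ES}/_cluster/health", timeout=5)
        resp.raise_for_status()
        status = resp.json()["status"]  # "green", "yellow" or "red"
    except requests.RequestException as exc:
        status = f"unreachable ({exc})"
    if status != "green":
        # Replace with your real alerting (mail, PagerDuty, etc.).
        print(f"ALERT: cluster status is {status}")

if __name__ == "__main__":
    check_health()
```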



How can we tell ES to throw away logs oldest-first when there is no more space, rather than going status=red?



There is no built-in option for this in Elasticsearch. The best approach is to monitor disk space using Watcher or some other tool, and have your monitoring send out an alert and trigger a job that cleans up the oldest logs when disk usage exceeds a specified threshold.
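
A sketch of what such a usage-triggered cleanup could look like, using the _cat/allocation and _cat/indices APIs (the 80% threshold, the logstash-* daily index pattern, and the endpoint are assumptions):

```python
# Delete the oldest daily indices while any data node is above a disk-usage
# threshold, so the cluster never hits the flood-stage watermark and goes red.
import requests

ES = "http://localhost:9200"
USAGE_THRESHOLD = 80  # percent of disk used on any data node

def max_disk_usage_percent():
    resp = requests.get(f"{ES}/_cat/allocation",
                        params={"format": "json", "h": "node,disk.percent"})
    resp.raise_for_status()
    # disk.percent is null for the UNASSIGNED row, so filter it out.
    return max((int(row["disk.percent"]) for row in resp.json()
                if row.get("disk.percent")), default=0)

def oldest_indices_first():
    resp = requests.get(f"{ES}/_cat/indices/logstash-*",
                        params={"format": "json", "h": "index"})
    resp.raise_for_status()
    # Daily index names (logstash-YYYY.MM.DD) sort oldest-first.
    return sorted(row["index"] for row in resp.json())

def free_space_if_needed():
    candidates = oldest_indices_first()
    while candidates and max_disk_usage_percent() > USAGE_THRESHOLD:
        requests.delete(f"{ES}/{candidates.pop(0)}").raise_for_status()

if __name__ == "__main__":
    free_space_if_needed()
```

Deleting whole indices frees space immediately, which is what makes index-per-day (or rollover) layouts suitable for this kind of emergency cleanup.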



How can we make sure this never happens again?



Monitor the disk space of your cluster nodes.

edited Jan 1 at 8:52
answered Jan 1 at 8:12
ben5556