Limiting Elasticsearch data retention below disk space

Scenario:

  • We use Elasticsearch & Logstash for application logging on a moderately high-traffic system
  • This system generates ~200 GB of logs every single day
  • We run 4 sharded instances and want to retain roughly the last 3 days' worth of logs
  • So we implemented a "cleanup" job, running daily, which removes all data older than 3 days (a rough sketch of that kind of job is shown below)
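
For illustration, a minimal sketch of the kind of daily cleanup job described above, assuming daily Logstash-style indices named logstash-YYYY.MM.DD and a cluster reachable on localhost:9200 (the index naming and the endpoint are assumptions, not necessarily the exact setup):

```python
# Hypothetical daily cleanup: delete logstash-YYYY.MM.DD indices older than 3 days.
from datetime import datetime, timedelta, timezone

import requests

ES = "http://localhost:9200"
RETENTION_DAYS = 3

def cleanup_old_indices():
    # List all indices matching the daily pattern.
    resp = requests.get(f"{ES}/_cat/indices/logstash-*",
                        params={"format": "json", "h": "index"})
    resp.raise_for_status()
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    for row in resp.json():
        name = row["index"]
        try:
            day = datetime.strptime(name, "logstash-%Y.%m.%d").replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # skip indices that don't follow the daily naming pattern
        if day < cutoff:
            # Dropping whole indices is far cheaper than a delete-by-query.
            requests.delete(f"{ES}/{name}").raise_for_status()

if __name__ == "__main__":
    cleanup_old_indices()
```

The catch, as described below, is that a job like this can only run while the cluster still answers requests.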


So far so good. However, a few days ago a subsystem generated a persistent spike of log data, filling up all available disk space within a few hours and turning the cluster red. That also meant the cleanup job couldn't connect to ES, since the entire cluster was down on account of the disks being full. This is extremely problematic: it limits our visibility into what's going on and blocks our ability to see what caused the spike in the first place.



Doing root-cause analysis, a few questions pop out:




  • How can we look at the system in e.g. Kibana when the cluster status is red?
  • How can we tell ES to throw away logs oldest-first when there is no more space, rather than going status=red?
  • How can we make sure this never happens again?

elasticsearch elastic-stack

asked Dec 31 '18 at 23:04
Silver Dragon

2 Answers

Date-based index patterns are tricky with spiky loads. There are two things to combine for a smooth setup that needs no manual intervention:

1. Switch to rollover indices. You define that a new index should be created once the current one reaches X GB. The daily log volume then no longer matters: you simply keep as many indices around as your disk space allows (leaving some buffer and fine-tuning the disk watermarks).

2. To automate the rollover, the removal of old indices, and optionally the alias handling, there is Elastic Curator (a rough sketch of both pieces together follows this list):

  • Example for rollover
  • Example for delete index, but you want to combine this with the count filtertype
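
Not Curator configuration itself, but a minimal sketch of both pieces under stated assumptions: a write alias named logs-write pointing at rollover-managed indices logs-000001, logs-000002, and so on (the alias and index names, the 50 GB threshold, the number of retained indices, and the localhost:9200 endpoint are all assumptions), using the standard _rollover and _cat/indices APIs:

```python
# Sketch only: size-based rollover plus count-based retention, mimicking what
# Curator's rollover and delete_indices (count filtertype) actions automate.
import requests

ES = "http://localhost:9200"
WRITE_ALIAS = "logs-write"   # assumption: an alias already set up for rollover
MAX_SIZE = "50gb"            # roll over once the active index reaches this size
KEEP_INDICES = 10            # keep the newest N indices; delete the rest

def rollover_if_needed():
    # Ask ES to roll over the write alias when the active index is big enough.
    body = {"conditions": {"max_size": MAX_SIZE}}
    requests.post(f"{ES}/{WRITE_ALIAS}/_rollover", json=body).raise_for_status()

def delete_all_but_newest():
    # List the rollover indices; their zero-padded suffix sorts in creation order.
    resp = requests.get(f"{ES}/_cat/indices/logs-*",
                        params={"format": "json", "h": "index"})
    resp.raise_for_status()
    names = sorted(row["index"] for row in resp.json())
    for name in names[:-KEEP_INDICES]:
        requests.delete(f"{ES}/{name}").raise_for_status()

if __name__ == "__main__":
    rollover_if_needed()
    delete_all_but_newest()
```

Run this on a schedule, or let Curator's rollover and delete_indices actions do the same. Because retention is measured in indices of bounded size rather than in days, a traffic spike just rotates indices faster instead of filling the disk.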




PS: There will soon be another option called Index Lifecycle Management. It's built directly into Elasticsearch and can be configured through Kibana, but it's still just around the corner at the moment.

answered Jan 2 at 3:28
xeraa

How can we look at the system in e.g. Kibana when the cluster status is red?



Kibana can't connect to ES if the cluster is already down. It's best to poll the cluster health API from outside the cluster to get its current state.
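
A minimal sketch of that kind of external health poll, assuming the cluster is reachable on localhost:9200 and that the alerting hook is whatever you already use (both are assumptions):

```python
# Poll the cluster health API from outside the stack, so you still get a signal
# (or a connection error) even when Kibana and the cluster itself are unusable.
import requests

ES = "http://localhost:9200"

def check_health():
    try:
        resp = requests.get(f"{ES}/_cluster/health", timeout=5)
        resp.raise_for_status()
        status = resp.json()["status"]  # "green", "yellow" or "red"
    except requests.RequestException as exc:
        status = f"unreachable ({exc})"
    if status != "green":
        # Replace with your real alerting (mail, PagerDuty, etc.).
        print(f"ALERT: cluster status is {status}")

if __name__ == "__main__":
    check_health()
```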



How can we tell ES to throw away logs oldest-first when there is no more space, rather than going status=red?



There is no built-in option for this in Elasticsearch. The best approach is to monitor disk space using Watcher or some other tool, and have your monitoring send out an alert and trigger a job that cleans up the oldest logs when disk usage exceeds a specified threshold.
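
A sketch of what such a usage-triggered cleanup could look like, using the _cat/allocation and _cat/indices APIs (the 80% threshold, the logstash-* daily index pattern, and the endpoint are assumptions):

```python
# Delete the oldest daily indices while any data node is above a disk-usage
# threshold, so the cluster never hits the flood-stage watermark and goes red.
import requests

ES = "http://localhost:9200"
USAGE_THRESHOLD = 80  # percent of disk used on any data node

def max_disk_usage_percent():
    resp = requests.get(f"{ES}/_cat/allocation",
                        params={"format": "json", "h": "node,disk.percent"})
    resp.raise_for_status()
    # disk.percent is null for the UNASSIGNED row, so filter it out.
    return max((int(row["disk.percent"]) for row in resp.json()
                if row.get("disk.percent")), default=0)

def oldest_indices_first():
    resp = requests.get(f"{ES}/_cat/indices/logstash-*",
                        params={"format": "json", "h": "index"})
    resp.raise_for_status()
    # Daily index names (logstash-YYYY.MM.DD) sort oldest-first.
    return sorted(row["index"] for row in resp.json())

def free_space_if_needed():
    candidates = oldest_indices_first()
    while candidates and max_disk_usage_percent() > USAGE_THRESHOLD:
        requests.delete(f"{ES}/{candidates.pop(0)}").raise_for_status()

if __name__ == "__main__":
    free_space_if_needed()
```

Deleting whole indices frees space immediately, which is what makes index-per-day (or rollover) layouts suitable for this kind of emergency cleanup.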



How can we make sure this never happens again?



Monitor the disk space of your cluster nodes.

edited Jan 1 at 8:52
answered Jan 1 at 8:12
ben5556