Solr: Reload query-time synonyms without reloading collections
I have a SolrCloud setup with multiple Collections based on the same configset.
One of the features I have is that the user can define their own synonyms in order to improve their search experience, which has worked fine until now. I use a custom way to do this, which basically ends up writing the new synonyms.txt to ZooKeeper.
Lately the platform has grown and the user has several dozen Collections, most of them with 200k or more documents of non-trivial size.
The problem is that when the user changes the synonyms, the custom code in the system automatically triggers a sequential reload of all the Collections affected by the change. This now (Solr 5.2) always causes problems, to the point where the platform becomes unstable and may need a restart of Solr, which means we have to access the platform and stabilize it manually. For now, I have had to disable the reloading of the Collections to keep Solr from hanging.
I have upgraded to Solr 7.6 hoping that the changes since 5.2 help with this problem.
The synonyms are only used at query time, so there is no need to reindex anything and it seems like overkill to reload the Collections to have the new synonyms take effect.
Knowing that I will not use synonyms at index time, I tried creating a QueryTimeSynonymGraphFilterFactory that reloads the synonyms every N seconds. This does not quite work: searches apply the new synonyms only some of the time. My feeling is that there may be a number of searchers and I was only able to change the dictionary for one of them. On average, it seems that 1 in 4 searches uses the new synonyms.
Is there a way to have a SynonymMap that is shared globally? Is there a way to force the "searchers" to be recreated?
solr solrcloud
So how do you update synonyms now? Do you use the Managed resources API?
– MatsLindh
Dec 30 '18 at 9:56
I came up with a way before the Managed Resources API existed, which consists of creating the synonyms.txt contents and then uploading them to ZooKeeper. Finally I call the RELOAD action on the Collections API. I could just as well switch to the Managed Resources API, but that also needs a RELOAD, which is what I want to avoid.
– s1m3n
Dec 30 '18 at 10:35
The Managed Resources documentation mentions the issue with not reloading the collections: you'll get different results depending on which node the query hits while the new data is imported to each node. If you can live with that discrepancy while it lasts, it should be possible to create a clone of the SynonymGraphFilter that looks up its map in local memory. Keeping the synonyms in something like memcache or redis and retrieving them for each query (depending on your query performance and number of queries) could be a possible solution as well. Or add a ZK listener and retrieve them asap.
– MatsLindh
Dec 30 '18 at 15:10
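A minimal sketch of the "map in local memory" idea from the comment above, in plain Java with no Solr or Lucene dependency. Everything here is an assumption for illustration: `SharedSynonyms`, `parse`, `publish` and `expand` are hypothetical names, the table is a simple `Map` rather than a Lucene `SynonymMap`, and the parser only understands comma-separated equivalence lines. The point it tries to show is that the table lives in one process-wide reference and is read at query time, so every filter instance sees a refresh, rather than each instance keeping the copy it was constructed with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder, not a Solr or Lucene class: one synonym table per JVM.
final class SharedSynonyms {
    private static final AtomicReference<Map<String, List<String>>> CURRENT =
            new AtomicReference<>(Collections.<String, List<String>>emptyMap());

    private SharedSynonyms() {}

    // Simplified parser (assumption): handles only comma-separated equivalence
    // lines such as "tv, television" and '#' comments; "a => b" rules are skipped
    // over as ordinary text and are not interpreted.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> table = new HashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            List<String> group = new ArrayList<>();
            for (String term : trimmed.split(",")) {
                String t = term.trim();
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            List<String> frozen = Collections.unmodifiableList(group);
            for (String term : group) {
                table.put(term, frozen);
            }
        }
        return table;
    }

    // Swap in a defensive, read-only copy with a single atomic write, so
    // concurrent readers see either the old table or the new one.
    static void publish(Map<String, List<String>> fresh) {
        CURRENT.set(Collections.unmodifiableMap(new HashMap<>(fresh)));
    }

    // Look the term up in whatever table is current at call time.
    static List<String> expand(String term) {
        List<String> group = CURRENT.get().get(term);
        return group != null ? group : Collections.singletonList(term);
    }
}
```

A background task (for example one driven by a timer or a ZK watch) would call `SharedSynonyms.publish(SharedSynonyms.parse(newContents))` whenever synonyms.txt changes. A real filter would still need to consult the holder for each query rather than caching the table in a field, which may be what goes wrong when only some searches pick up the new synonyms.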
Query performance should not be an issue. I've read the docs and I am quite sure that if I do not explicitly reload the collections, the new synonyms will never be applied, so it does not work for me. The user expects to be able to define a synonym and use it rather soon. I can certainly warn the user that this is an expensive operation and that it will take some time for the new synonym to be "replicated", but that is about it.
– s1m3n
Dec 31 '18 at 9:13
asked Dec 29 '18 at 8:49 by s1m3n, edited Jan 15 at 14:16