Solr: Reload query-time synonyms without reloading collections



























I have a SolrCloud setup with multiple Collections based on the same configset.



One of the features I have is that the user can define their own synonyms to improve their search experience, which has worked fine until now. I use a custom way to do this, which basically ends up writing the new synonyms.txt to ZooKeeper.
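For context, the file generation step is roughly the following. This is a minimal sketch, not my actual code: the class name `SynonymsFile` and the method `render` are made up for illustration, and the upload to ZooKeeper is left out. It only relies on the standard synonyms.txt convention that each line is a comma-separated group of equivalent terms.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: renders user-defined synonym groups in the
// comma-separated format that Solr reads from synonyms.txt.
final class SynonymsFile {

    private SynonymsFile() {}

    // Each inner list is one group of equivalent terms, e.g. [tv, television].
    static String render(List<List<String>> groups) {
        return groups.stream()
                .filter(group -> group.size() > 1)      // a single term has no synonyms
                .map(group -> String.join(", ", group)) // one group per line
                .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        // Example input only; real groups come from the user.
        System.out.print(render(List.of(
                List.of("tv", "television"),
                List.of("laptop", "notebook", "portable computer"))));
    }
}
```

The resulting string is what gets written to ZooKeeper as synonyms.txt.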



Lately the platform has grown and the user has several dozen Collections, most of them with 200k or more documents of non-trivial size.



The problem is that when the user changes the synonyms, the custom code in the system automatically triggers a sequential reload of all the Collections affected by the change of synonyms. This now (Solr 5.2) always causes problems, to the point where the platform becomes unstable and may need a restart of Solr, which means we have to access the platform and stabilize it manually. For now, I have had to disable the reloading of the collections to keep Solr from hanging.
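The sequential reload looks roughly like the sketch below. It is a simplified illustration, not the production code: the class and method names are hypothetical, the base URL and collection names are placeholders, and it assumes Java 11+ for `java.net.http.HttpClient`. The only Solr-specific part is the Collections API call `action=RELOAD&name=<collection>`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch of a sequential reload through the Collections API.
// Names, host and collections are placeholders.
final class SequentialReload {

    private SequentialReload() {}

    // Builds the RELOAD request URL for one collection.
    static String reloadUrl(String solrBaseUrl, String collection) {
        return solrBaseUrl + "/admin/collections?action=RELOAD&name="
                + URLEncoder.encode(collection, StandardCharsets.UTF_8);
    }

    // Reloads the collections one at a time, waiting for each response
    // before the next request is sent.
    static void reloadAll(HttpClient client, String solrBaseUrl, List<String> collections)
            throws Exception {
        for (String collection : collections) {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create(reloadUrl(solrBaseUrl, collection)))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException(
                        "RELOAD of " + collection + " failed: HTTP " + response.statusCode());
            }
        }
    }
}
```

A call such as `SequentialReload.reloadAll(HttpClient.newHttpClient(), "http://localhost:8983/solr", List.of("collection1", "collection2"))` would then walk through the affected collections in order.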



I have upgraded to Solr 7.6 hoping that the changes since 5.2 help with this problem.



The synonyms are only used at query time, so there is no need to reindex anything and it seems like overkill to reload the Collections to have the new synonyms take effect.



Knowing that I will not use synonyms at index time, I tried creating a QueryTimeSynonymGraphFilterFactory that reloads the synonyms every N seconds. This does not quite work: searches only apply the new synonyms some of the time. My feeling is that there may be several searchers and I was only able to change the dictionary for one of them. On average, it seems that 1 in 4 searches uses the new synonyms.



Is there a way to have a SynonymMap that is shared globally? Is there a way to force the "searchers" to be recreated?
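To make the first question concrete, this is roughly what I mean by "shared globally". It is only a sketch under assumptions: `SharedSynonyms` is a hypothetical class, a plain `Map` stands in for Lucene's `SynonymMap` so the example has no dependencies, and I do not know whether Solr's analysis chain can be made to read from such a holder.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical process-wide holder. A plain Map stands in for Lucene's
// SynonymMap so that the sketch has no dependencies.
final class SharedSynonyms {

    private static final AtomicReference<Map<String, List<String>>> CURRENT =
            new AtomicReference<>(Map.of());

    private SharedSynonyms() {}

    // Called by whatever reloads the synonyms (timer, ZooKeeper watcher, ...).
    static void publish(Map<String, List<String>> freshlyLoaded) {
        // Unmodifiable copy of the map, swapped in atomically.
        CURRENT.set(Map.copyOf(freshlyLoaded));
    }

    // Called by every filter instance when it starts analysing a query,
    // so that no instance keeps a private, stale copy.
    static Map<String, List<String>> snapshot() {
        return CURRENT.get();
    }

    // Expands a term against the snapshot that is current right now;
    // a term without synonyms expands to itself.
    static List<String> expand(String term) {
        return snapshot().getOrDefault(term, List.of(term));
    }
}
```

The point of the design is that there is exactly one reference per JVM and every reader goes through it, instead of each factory or filter instance holding its own map.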



































  • So how do you update synonyms now? Do you use the Managed resources API?

    – MatsLindh
    Dec 30 '18 at 9:56











  • I came up with a way before the managed resources API existed, which consists of creating the synonyms.txt contents and then uploading them to ZooKeeper. Finally I call the RELOAD action on the Collections API. I could just as well switch to the managed resources API, but that also needs a RELOAD, which is what I want to avoid.

    – s1m3n
    Dec 30 '18 at 10:35













  • The Managed Resources documentation mentions the issue with not reloading the collections: you'll get different results depending on which node the query hits while the new data is imported to each node. If you can live with that discrepancy while it lasts, it should be possible to create a clone of the SynonymGraphFilter that looks up its map in local memory. Using something like memcache or redis for the synonyms and retrieving them for each query (depending on your query performance and number of queries) could be a possible solution as well. Or add a ZK listener and retrieve them as soon as they change.

    – MatsLindh
    Dec 30 '18 at 15:10











  • Query performance should not be an issue. I've read the docs and I am quite sure that if I do not explicitly reload the collections, the new synonyms will never be applied, so it does not work for me. The user expects to be able to define a synonym and use it fairly soon. I can certainly warn the user that this is an expensive operation and that it will take some time for the new synonym to be "replicated", but that is about it.

    – s1m3n
    Dec 31 '18 at 9:13
















solr solrcloud






edited Jan 15 at 14:16







s1m3n

















asked Dec 29 '18 at 8:49









s1m3n
