match phrase query not working as expected












From the Elasticsearch documentation:

"The match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other."

I have configured my analyzer to use the edge_ngram token filter with the keyword tokenizer:



    {
      "index": {
        "number_of_shards": 1,
        "analysis": {
          "filter": {
            "autocomplete_filter": {
              "type": "edge_ngram",
              "min_gram": 1,
              "max_gram": 20
            }
          },
          "analyzer": {
            "autocomplete": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ]
            }
          }
        }
      }
    }
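
As a rough illustration of what this analyzer emits, here is a small Python sketch of the chain (keyword tokenizer, then lowercase, then the edge_ngram filter). It is not Elasticsearch code: `autocomplete_analyze` is a made-up helper, and it assumes that every gram cut from the single keyword token keeps that token's position, 0.

```python
def autocomplete_analyze(text, min_gram=1, max_gram=20):
    """Imitate: keyword tokenizer -> lowercase -> edge_ngram token filter.

    Returns (term, position) pairs. The keyword tokenizer emits the whole
    input as one token; the sketch assumes every gram cut from that token
    stays at the token's position (0).
    """
    token = text.lower()                  # one token, lowercased
    longest = min(max_gram, len(token))   # grams never exceed the token
    return [(token[:n], 0) for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(autocomplete_analyze("hello world"))
    print(autocomplete_analyze("ho"))
```

Under these assumptions the document yields eleven terms ("h", "he", ... "hello world") and the query yields "h" and "ho", all at the same position.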


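Here is the Java class used for indexing:

    import lombok.Getter;
    import lombok.Setter;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldType;
    import org.springframework.data.elasticsearch.annotations.Setting;

    @Document(indexName = "myindex", type = "program")
    @Getter
    @Setter
    @Setting(settingPath = "/elasticsearch/settings.json")
    public class Program {

        @org.springframework.data.annotation.Id
        private Long instanceId;

        // The same analyzer is used at index time and at search time.
        @Field(analyzer = "autocomplete", searchAnalyzer = "autocomplete", type = FieldType.String)
        private String name;
    }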


If a document contains the phrase "hello world", the following query matches it:

    {
      "match": {
        "name": {
          "query": "ho",
          "type": "phrase"
        }
      }
    }

Result: "hello world"

That is not what I expected, because not all of the search terms are in the document.



My questions:

1. Shouldn't the edge_ngram (autocomplete) analyzer produce two search terms for the query "ho", namely "h" and "ho"?

2. Why does "ho" match "hello world" when, by the definition of a phrase query, not all of the terms match? (The term "ho" should not match anything.)

Update:

In case the question is not clear: the match phrase query should analyze the query string, here "ho", into a list of terms. Because this is an edge_ngram filter with min_gram 1, there are two terms, "h" and "ho". According to the Elasticsearch documentation, the document must contain all of the search terms. However, "hello world" contains only "h" and not "ho", so why did I get a match here?







elasticsearch spring-data-elasticsearch elasticsearch-2.0














asked 2 days ago by Mohammad Karmi (edited yesterday)












  • (1) You haven't added the index mapping, nor have you specified the type of the name field. (2) You haven't provided a sample document, so we don't know what data you are trying to match against. Clarify these points so that people here can help you better.
    – Nishant Saini
    2 days ago

  • @NishantSaini I have updated the question.
    – Mohammad Karmi
    2 days ago






























3 Answers
  1. If you could provide complete, runnable examples of your problem, it would be much easier to help you. For example, something like this:

    PUT test
    {
      "settings": {
        "number_of_shards": 1,
        "analysis": {
          "filter": {
            "autocomplete_filter": {
              "type": "edge_ngram",
              "min_gram": 1,
              "max_gram": 20
            }
          },
          "analyzer": {
            "autocomplete": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ]
            }
          }
        }
      },
      "mappings": {
        "_doc": {
          "properties": {
            "name": {
              "type": "text",
              "analyzer": "autocomplete"
            }
          }
        }
      }
    }

    PUT test/_doc/1
    {
      "name": "Hello world"
    }

    GET test/_search
    {
      "query": {
        "match_phrase": {
          "name": "hello foo"
        }
      }
    }


  2. Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version; you should really upgrade.

  3. I'm not sure that phrase search and edge n-grams make much sense in combination. What are you trying to achieve here?

  4. Why is it matching? Your search query runs through the same analyzer as your stored field. Since you have defined min_gram: 1, your ho is searched as h and ho. The h matches the h from hello. With this analyzer, match and match_phrase make no difference.
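
Point 4 can be checked without a cluster. The following is a plain Python sketch, not an Elasticsearch client call: `edge_ngrams` is a hypothetical helper that imitates the keyword tokenizer, lowercase filter and edge_ngram filter from the question, and it ignores positions entirely, which is all a term-level OR needs.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Edge n-grams of the lowercased input, treated as a single token."""
    token = text.lower()
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


indexed_terms = edge_ngrams("Hello world")   # terms stored for the document
query_terms = edge_ngrams("ho")              # terms produced for the query

# Terms the query and the document have in common.
shared = query_terms & indexed_terms
print(sorted(query_terms))   # ['h', 'ho']
print(sorted(shared))        # ['h']
```

Only "h" is shared, which is enough for a hit whenever the query terms are combined as alternatives rather than all being required.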







answered 2 days ago by xeraa












  • Right, I'm using Elasticsearch 2.x. I'm not trying to achieve anything in particular; I'm trying to understand match_phrase. The answer to my question is in point 4: why don't match and match_phrase make a difference? According to the documentation, all search terms should be contained in the document, but "ho" is not there. Isn't that how match_phrase works?
    – Mohammad Karmi
    yesterday

  • To make it clearer: the query string is analyzed into the terms h and ho. In a match query, h matches the h from hello, no problem there. But in match_phrase all search terms should match, and ho doesn't. Is that right?
    – Mohammad Karmi
    yesterday




















If I understand your question correctly, the tokenizer is the problem: with "tokenizer": "keyword", the whole input is indexed, and searched, as a single exact term.

See "Structured Text Tokenizers" in the Elasticsearch documentation.







answered 2 days ago by M. Mis





































I got the answer from the Elasticsearch forum:

You are using the edge_ngram token filter. Let's see how your analyzer treats your query string "ho". Assuming your index is called my_index:



    GET my_index/_analyze
    {
      "text": "ho",
      "analyzer": "autocomplete"
    }

The response shows you that the output of your analyzer would be two tokens at position 0:

    {
      "tokens": [
        {
          "token": "h",
          "start_offset": 0,
          "end_offset": 2,
          "type": "word",
          "position": 0
        },
        {
          "token": "ho",
          "start_offset": 0,
          "end_offset": 2,
          "type": "word",
          "position": 0
        }
      ]
    }
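
To make the "same position" point easy to spot in longer responses, the tokens can be grouped by position. This is only a sketch in Python using the standard library; `terms_by_position` is a hypothetical helper, and the JSON is the response quoted above.

```python
import json
from collections import defaultdict

# The _analyze response quoted above.
response = json.loads("""
{
  "tokens": [
    {"token": "h",  "start_offset": 0, "end_offset": 2, "type": "word", "position": 0},
    {"token": "ho", "start_offset": 0, "end_offset": 2, "type": "word", "position": 0}
  ]
}
""")


def terms_by_position(analyze_response):
    """Map each position to the list of terms the analyzer emitted there."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


positions = terms_by_position(response)
print(positions)  # {0: ['h', 'ho']}
```

A single key with two terms means the query occupies one position with two alternatives, not two consecutive positions.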


What does Elasticsearch do with a query for two tokens at the same position? It treats the query as an "OR", even if you use type "phrase". You can see that from the output of the validate API (which shows you the Lucene query that your query was rewritten into):

    GET my_index/_validate/query?rewrite=true
    {
      "query": {
        "match": {
          "name": {
            "query": "ho",
            "type": "phrase"
          }
        }
      }
    }

Because both your query and your document have an h at position 0, the document is going to be a hit.
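
The effect can be reproduced with a simplified model of position-aware matching. The Python below is a sketch, not Lucene's implementation: `phrase_matches` is a made-up function that treats terms sharing a query position as alternatives and requires consecutive query positions to line up with consecutive document positions.

```python
def phrase_matches(query_positions, doc_tokens):
    """Simplified model of position-aware phrase matching.

    query_positions: list of sets; entry i holds the alternative terms the
        analyzer produced at relative position i of the query.
    doc_tokens: list of (term, position) pairs for the indexed field.
    A document matches if, for some start position, every query position
    finds at least one of its alternatives at the corresponding offset.
    """
    index = {}
    for term, pos in doc_tokens:
        index.setdefault(term, set()).add(pos)
    doc_positions = {pos for _, pos in doc_tokens}
    for start in doc_positions:
        if all(
            any(start + i in index.get(term, ()) for term in alternatives)
            for i, alternatives in enumerate(query_positions)
        ):
            return True
    return False


# Indexed field "hello world" under the token-filter analyzer: every
# edge n-gram of the lowercased string sits at position 0.
doc = [("hello world"[:n], 0) for n in range(1, len("hello world") + 1)]

# Query "ho" under the same analyzer: "h" and "ho" share position 0,
# so they are alternatives rather than consecutive phrase terms.
print(phrase_matches([{"h", "ho"}], doc))   # True: "h" alone satisfies it
print(phrase_matches([{"ho"}], doc))        # False: "ho" was never indexed
```

With only one query position there is nothing for the phrase constraint to order, so the check collapses to "is any alternative present".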



              Now, how to solve this? Instead of the edge_ngram token filter, you could use the edge_ngram tokenizer. This tokenizer increments the position of every token it outputs.



              So, if you create your index like this instead:



              PUT my_index
              {
                "settings": {
                  "number_of_shards": 1,
                  "analysis": {
                    "tokenizer": {
                      "autocomplete_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20
                      }
                    },
                    "analyzer": {
                      "autocomplete": {
                        "type": "custom",
                        "tokenizer": "autocomplete_tokenizer",
                        "filter": [
                          "lowercase"
                        ]
                      }
                    }
                  }
                },
                "mappings": {
                  "doc": {
                    "properties": {
                      "name": {
                        "type": "string",
                        "analyzer": "autocomplete"
                      }
                    }
                  }
                }
              }


              You will see that this query is no longer a hit:



              GET my_index/_search
              {
                "query": {
                  "match": {
                    "name": {
                      "query": "ho",
                      "type": "phrase"
                    }
                  }
                }
              }


              But for example this one is:



              GET my_index/_search
              {
                "query": {
                  "match": {
                    "name": {
                      "query": "he",
                      "type": "phrase"
                    }
                  }
                }
              }
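
              To see why "ho" stops matching while "he" still matches once positions are incremented, here is a small Python model of the edge n-gram tokenizer and of positional phrase matching. It is a sketch under the assumption that the whole field value is treated as one input and that each prefix gets the next position; `edge_ngram_tokenizer` and `phrase_match` are names invented for this example, not Elasticsearch APIs.

```python
def edge_ngram_tokenizer(text, min_gram=1, max_gram=20):
    """Simplified model: every prefix gets its own, increasing position."""
    token = text.lower()
    upper = min(max_gram, len(token))
    return [(token[:n], pos)
            for pos, n in enumerate(range(min_gram, upper + 1))]


def phrase_match(query, document):
    """True if the query terms occur in the document at the same
    positions relative to each other."""
    doc_positions = {}
    for term, pos in edge_ngram_tokenizer(document):
        doc_positions.setdefault(term, set()).add(pos)

    query_terms = edge_ngram_tokenizer(query)
    if not query_terms:
        return False

    first_term, first_pos = query_terms[0]
    # Try every place where the first query term occurs in the document.
    for start in doc_positions.get(first_term, ()):
        if all(start + (pos - first_pos) in doc_positions.get(term, ())
               for term, pos in query_terms):
            return True
    return False


print(phrase_match("ho", "hello world"))  # "ho" is not a prefix of the document
print(phrase_match("he", "hello world"))  # "h" then "he" at positions 0 and 1
```

              In this model the query "ho" becomes "h" at position 0 followed by "ho" at position 1, and the document has "he" at position 1 instead, so the phrase no longer matches.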













































































                  answered yesterday









                  Mohammad Karmi

                  4081517

































