match phrase query not working as expected

Reading from elastic documentation:

the match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other.

I have configured my analyzer to use an edge_ngram token filter with the keyword tokenizer:

{
    "index": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}

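Here is the Java class used for indexing:

import lombok.Getter;
import lombok.Setter;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "myindex", type = "program")
@Getter
@Setter
@Setting(settingPath = "/elasticsearch/settings.json")
public class Program {

    @org.springframework.data.annotation.Id
    private Long instanceId;

    // The same analyzer is applied at index time and at search time.
    @Field(analyzer = "autocomplete", searchAnalyzer = "autocomplete", type = FieldType.String)
    private String name;
}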

If a document contains the phrase "hello world", the following query matches it:

{
  "match" : {
    "name" : {
      "query" : "ho",
      "type" : "phrase"
    }
  }
}

Result: "hello world"

That is not what I expect, because not all of the search terms are in the document.

My questions:

1- Shouldn't there be 2 search terms from the edge_ngram/autocomplete analyzer for the query "ho"? (The terms should be "h" and "ho".)

2- Why does "ho" match "hello world" when, according to the definition of the phrase query, not all of the terms matched? (The "ho" term shouldn't have matched.)

Update:

In case the question is not clear: the match phrase query should analyze the query string, here ho, into a list of terms. Since this is edge_ngram with min_gram 1, there are 2 terms: h and ho. According to Elasticsearch, the document must contain all of the search terms. However, hello world contains h only and not ho, so why did I get a match here?

asked 2 days ago, edited yesterday by Mohammad Karmi

(1) You haven't added the index mapping, nor have you specified the type of the name field. (2) You haven't provided a sample document, so we don't know what data you are trying to match against. Clarify these points so that people here can help you better.
– Nishant Saini
2 days ago

@NishantSaini updated the question
– Mohammad Karmi
2 days ago

elasticsearch spring-data-elasticsearch elasticsearch-2.0


3 Answers

If you could provide complete, runnable examples for your problems it would make it much easier to help you. For example something like this:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "name": "Hello world"
}

GET test/_search
{
  "query": {
    "match_phrase": {
      "name": "hello foo"
    }
  }
}

Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version; you should really upgrade.

I'm not sure that combining phrase search with edge n-grams makes much sense. What are you trying to achieve here?

Why is it matching? Your search query runs through the same analyzer as your stored field. Since you have defined min_gram: 1, your ho will be searched as h and ho. The h matches the h from hello. With this analyzer, match and match_phrase behave the same.
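To make the analysis step concrete, here is a minimal Java sketch that imitates what the analyzer from the question does (keyword tokenizer, then lowercase, then an edge n-gram filter). It is not Elasticsearch code; the class name EdgeNgramFilterSketch and the analyze helper are made up for illustration, and it works on plain Java chars rather than on Lucene token streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Rough imitation of: keyword tokenizer -> lowercase -> edge_ngram filter. Hypothetical helper, not an Elasticsearch API. */
final class EdgeNgramFilterSketch {

    /** Returns the prefixes the analyzer would emit for the given text. */
    static List<String> analyze(String text, int minGram, int maxGram) {
        // keyword tokenizer: the whole input stays one token; lowercase filter: fold case
        String token = text.toLowerCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        // edge_ngram filter: emit every prefix from minGram up to maxGram characters
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            terms.add(token.substring(0, len));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> queryTerms = analyze("ho", 1, 20);
        List<String> docTerms = analyze("Hello world", 1, 20);
        System.out.println("query terms: " + queryTerms); // prints: query terms: [h, ho]
        System.out.println("doc terms:   " + docTerms);

        // Terms that occur on both sides; here only "h" survives.
        List<String> shared = new ArrayList<>(queryTerms);
        shared.retainAll(docTerms);
        System.out.println("shared:      " + shared);
    }
}
```

Under these assumptions the query side yields h and ho, the document side yields every prefix of "hello world", and h is the only term they have in common.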

answered 2 days ago by xeraa

I'm using Elasticsearch 2.x, right. I'm not trying to achieve anything in particular; I'm trying to understand match_phrase. The answer to my question is in 4: why don't match and match_phrase make a difference? According to the docs, all search terms should be contained in the document, but "ho" is not there. Isn't that how match_phrase works?
– Mohammad Karmi
yesterday

To make it clearer: the query analyzes the string into the list of terms h and ho. In the match query, h matches the h from hello, no problem there. But in match_phrase all search terms should match, and ho doesn't. Is that right?
– Mohammad Karmi
yesterday

If I understand your question, the tokenizer is the problem: with "tokenizer": "keyword" the exact phrase is indexed as a single token and searched as one.

Structured Text Tokenizers

answered 2 days ago by M. Mis

I got the answer from the Elasticsearch forum:

You are using the edge_ngram token filter. Let's see how your analyzer treats your query string "ho". Assuming your index is called my_index:

GET my_index/_analyze
{
  "text": "ho",
  "analyzer": "autocomplete"
}

The response shows you that the output of your analyzer would be two tokens at position 0:

{
  "tokens": [
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "ho",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    }
  ]
}

What does Elasticsearch do with a query for two tokens at the same position? It treats the query as an "OR", even if you use type "phrase". You can see that from the output of the validate API (which shows you the Lucene query that your query was rewritten into):

GET my_index/_validate/query?rewrite=true
{
  "query": {
    "match": {
      "name": {
        "query": "ho",
        "type": "phrase"
      }
    }
  }
}

Because both your query and your document have an h at position 0, the document is going to be a hit.
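The "OR" behaviour can be pictured with a small Java sketch. This is only a conceptual model under my own assumptions, not how Lucene implements it: each position is a set of terms, and a query position is satisfied if the document has at least one of its terms at the corresponding position. The names SamePositionPhraseSketch, phraseMatches and slot are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Conceptual model of a phrase query whose positions may hold several terms. Not Lucene code. */
final class SamePositionPhraseSketch {

    /** True if some alignment exists where every query position shares a term with the document position. */
    static boolean phraseMatches(List<Set<String>> doc, List<Set<String>> query) {
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean allPositionsMatch = true;
            for (int i = 0; i < query.size(); i++) {
                // Terms stacked on one position act as alternatives: one hit is enough.
                if (Collections.disjoint(doc.get(start + i), query.get(i))) {
                    allPositionsMatch = false;
                    break;
                }
            }
            if (allPositionsMatch) {
                return true;
            }
        }
        return false;
    }

    /** Builds the set of terms that share one position. */
    static Set<String> slot(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        // "hello world" through the filter-based analyzer: every prefix sits at position 0.
        List<Set<String>> doc = Arrays.asList(slot(
                "h", "he", "hel", "hell", "hello", "hello ", "hello w",
                "hello wo", "hello wor", "hello worl", "hello world"));
        // "ho" through the same analyzer: "h" and "ho", both at position 0.
        List<Set<String>> query = Arrays.asList(slot("h", "ho"));
        System.out.println(phraseMatches(doc, query)); // prints: true
    }
}
```

In this model the missing ho does not matter, because h alone already satisfies the only position the query has.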

Now, how to solve this? Instead of the edge_ngram token filter, you could use the edge_ngram tokenizer. This tokenizer increments the position of every token it outputs.

So, if you create your index like this instead:

PUT my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "autocomplete"
        }
      }
    }
  }
}

You will see that this query is no longer a hit:

GET my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "ho",
        "type": "phrase"
      }
    }
  }
}

But for example this one is:

GET my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "he",
        "type": "phrase"
      }
    }
  }
}
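Why the tokenizer changes the outcome can be sketched in Java as well. This is a simplified model that assumes, as described above, that the edge_ngram tokenizer gives each prefix its own increasing position; the list index plays the role of the position. EdgeNgramTokenizerSketch and its methods are hypothetical names, not part of any Elasticsearch or Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Simplified model of the edge_ngram tokenizer followed by a phrase match. Not Elasticsearch code. */
final class EdgeNgramTokenizerSketch {

    /** Prefixes of the lowercased input; the list index stands in for the token position. */
    static List<String> tokenize(String text, int minGram, int maxGram) {
        String lower = text.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            tokens.add(lower.substring(0, len));
        }
        return tokens;
    }

    /** True if the query tokens occur in the document as one contiguous run in the same order. */
    static boolean phraseMatches(List<String> doc, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc = tokenize("Hello world", 1, 20);
        // "ho" becomes [h, ho]; the document has "he" right after "h", so the phrase fails.
        System.out.println(phraseMatches(doc, tokenize("ho", 1, 20))); // prints: false
        // "he" becomes [h, he], which is exactly how the document's token list starts.
        System.out.println(phraseMatches(doc, tokenize("he", 1, 20))); // prints: true
    }
}
```

Because every prefix now has its own position, the phrase check needs all query tokens in sequence, which is the behaviour the question expected in the first place.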

answered yesterday by Mohammad Karmi


3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

If you could provide complete, runnable examples for your problems it would make it much easier to help you. For example something like this:

PUT test

{

  "settings": {

    "number_of_shards": 1,

    "analysis": {

      "filter": {

        "autocomplete_filter": {

          "type": "edge_ngram",

          "min_gram": 1,

          "max_gram": 20

        }

      },

      "analyzer": {

        "autocomplete": {

          "type": "custom",

          "tokenizer": "keyword",

          "filter": [

            "lowercase",

            "autocomplete_filter"

          ]

        }

      }

    }

  },

  "mappings": {

    "_doc": {

      "properties": {

        "name": {

          "type": "text",

          "analyzer": "autocomplete"

        }

      }

    }

  }

}



PUT test/_doc/1

{

  "name": "Hello world"

}



GET test/_search

{

  "query": {

    "match_phrase": {

      "name": "hello foo"

    }

  }

}

Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version — you should really upgrade.

I'm not sure phrase search on edge grams make much sense in combination. What are you trying to achieve here?

Why is it matching? Your search query is running through the same analyzer as your stored field. Since you have defined min_gram: 1, your ho will be searched as h and ho. The h matches the h from hello. match or match_phrase doesn't make a difference here with this analyzer.

answered 2 days ago

xeraa

6,40932254

I'm using elastic 2.x right. I'm not trying to achieve something here but I'm trying to understand the match_phrase. the answer for my question is in 4, why does match and match_phrase doesn't make difference ? according to the doc all search terms should contain in the document but "ho" is not there . is not that how match_phrase work ?
– Mohammad Karmi
yesterday

1

to make it more clear the query analyze the string to list of terms h and ho , in the match query it will match the h from hello no problem here. but in match_phrase all search terms should match but ho doesn't is that right ?
– Mohammad Karmi
yesterday

add a comment |

If you could provide complete, runnable examples for your problems it would make it much easier to help you. For example something like this:

PUT test

{

  "settings": {

    "number_of_shards": 1,

    "analysis": {

      "filter": {

        "autocomplete_filter": {

          "type": "edge_ngram",

          "min_gram": 1,

          "max_gram": 20

        }

      },

      "analyzer": {

        "autocomplete": {

          "type": "custom",

          "tokenizer": "keyword",

          "filter": [

            "lowercase",

            "autocomplete_filter"

          ]

        }

      }

    }

  },

  "mappings": {

    "_doc": {

      "properties": {

        "name": {

          "type": "text",

          "analyzer": "autocomplete"

        }

      }

    }

  }

}



PUT test/_doc/1

{

  "name": "Hello world"

}



GET test/_search

{

  "query": {

    "match_phrase": {

      "name": "hello foo"

    }

  }

}

Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version — you should really upgrade.

I'm not sure phrase search on edge grams make much sense in combination. What are you trying to achieve here?

Why is it matching? Your search query is running through the same analyzer as your stored field. Since you have defined min_gram: 1, your ho will be searched as h and ho. The h matches the h from hello. match or match_phrase doesn't make a difference here with this analyzer.

answered 2 days ago

xeraa

6,40932254

I'm using elastic 2.x right. I'm not trying to achieve something here but I'm trying to understand the match_phrase. the answer for my question is in 4, why does match and match_phrase doesn't make difference ? according to the doc all search terms should contain in the document but "ho" is not there . is not that how match_phrase work ?
– Mohammad Karmi
yesterday

1

to make it more clear the query analyze the string to list of terms h and ho , in the match query it will match the h from hello no problem here. but in match_phrase all search terms should match but ho doesn't is that right ?
– Mohammad Karmi
yesterday

add a comment |

If you could provide complete, runnable examples for your problems it would make it much easier to help you. For example something like this:

PUT test

{

  "settings": {

    "number_of_shards": 1,

    "analysis": {

      "filter": {

        "autocomplete_filter": {

          "type": "edge_ngram",

          "min_gram": 1,

          "max_gram": 20

        }

      },

      "analyzer": {

        "autocomplete": {

          "type": "custom",

          "tokenizer": "keyword",

          "filter": [

            "lowercase",

            "autocomplete_filter"

          ]

        }

      }

    }

  },

  "mappings": {

    "_doc": {

      "properties": {

        "name": {

          "type": "text",

          "analyzer": "autocomplete"

        }

      }

    }

  }

}



PUT test/_doc/1

{

  "name": "Hello world"

}



GET test/_search

{

  "query": {

    "match_phrase": {

      "name": "hello foo"

    }

  }

}

Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version — you should really upgrade.

I'm not sure phrase search on edge grams make much sense in combination. What are you trying to achieve here?

Why is it matching? Your search query is running through the same analyzer as your stored field. Since you have defined min_gram: 1, your ho will be searched as h and ho. The h matches the h from hello. match or match_phrase doesn't make a difference here with this analyzer.

answered 2 days ago

xeraa

6,40932254

If you could provide complete, runnable examples for your problems it would make it much easier to help you. For example something like this:

PUT test

{

  "settings": {

    "number_of_shards": 1,

    "analysis": {

      "filter": {

        "autocomplete_filter": {

          "type": "edge_ngram",

          "min_gram": 1,

          "max_gram": 20

        }

      },

      "analyzer": {

        "autocomplete": {

          "type": "custom",

          "tokenizer": "keyword",

          "filter": [

            "lowercase",

            "autocomplete_filter"

          ]

        }

      }

    }

  },

  "mappings": {

    "_doc": {

      "properties": {

        "name": {

          "type": "text",

          "analyzer": "autocomplete"

        }

      }

    }

  }

}



PUT test/_doc/1

{

  "name": "Hello world"

}



GET test/_search

{

  "query": {

    "match_phrase": {

      "name": "hello foo"

    }

  }

}

Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version — you should really upgrade.

I'm not sure phrase search on edge grams make much sense in combination. What are you trying to achieve here?

Why is it matching? Your search query is running through the same analyzer as your stored field. Since you have defined min_gram: 1, your ho will be searched as h and ho. The h matches the h from hello. match or match_phrase doesn't make a difference here with this analyzer.

answered 2 days ago

xeraa

6,40932254

answered 2 days ago

xeraa

6,40932254

answered 2 days ago

xeraa

6,40932254

answered 2 days ago

xeraa

6,40932254

I'm using elastic 2.x right. I'm not trying to achieve something here but I'm trying to understand the match_phrase. the answer for my question is in 4, why does match and match_phrase doesn't make difference ? according to the doc all search terms should contain in the document but "ho" is not there . is not that how match_phrase work ?
– Mohammad Karmi
yesterday

1

to make it more clear the query analyze the string to list of terms h and ho , in the match query it will match the h from hello no problem here. but in match_phrase all search terms should match but ho doesn't is that right ?
– Mohammad Karmi
yesterday

add a comment |

I'm using elastic 2.x right. I'm not trying to achieve something here but I'm trying to understand the match_phrase. the answer for my question is in 4, why does match and match_phrase doesn't make difference ? according to the doc all search terms should contain in the document but "ho" is not there . is not that how match_phrase work ?
– Mohammad Karmi
yesterday

1

to make it more clear the query analyze the string to list of terms h and ho , in the match query it will match the h from hello no problem here. but in match_phrase all search terms should match but ho doesn't is that right ?
– Mohammad Karmi
yesterday

I'm using elastic 2.x right. I'm not trying to achieve something here but I'm trying to understand the match_phrase. the answer for my question is in 4, why does match and match_phrase doesn't make difference ? according to the doc all search terms should contain in the document but "ho" is not there . is not that how match_phrase work ?
– Mohammad Karmi
yesterday

to make it more clear the query analyze the string to list of terms h and ho , in the match query it will match the h from hello no problem here. but in match_phrase all search terms should match but ho doesn't is that right ?
– Mohammad Karmi
yesterday

add a comment |

If i understand your questions, tokenizer is the problem, "tokenizer": "keyword", search exact phrase and index like one.

Structured Text Tokenizers

answered 2 days ago

M. Mis

add a comment |

If i understand your questions, tokenizer is the problem, "tokenizer": "keyword", search exact phrase and index like one.

Structured Text Tokenizers

answered 2 days ago

M. Mis

add a comment |

If i understand your questions, tokenizer is the problem, "tokenizer": "keyword", search exact phrase and index like one.

Structured Text Tokenizers

answered 2 days ago

M. Mis

If i understand your questions, tokenizer is the problem, "tokenizer": "keyword", search exact phrase and index like one.

Structured Text Tokenizers

answered 2 days ago

M. Mis

answered 2 days ago

M. Mis

answered 2 days ago

M. Mis

answered 2 days ago

M. Mis

add a comment |

I have got the answer from elasticsearch forum :

You are using the edge_ngram token filter. Let's see how your analyzer treats your query string "ho" . Assuming your index is called my_index :

GET my_index/_analyze

{

  "text": "ho",

  "analyzer": "autocomplete"

}

The response shows you that the output of your analyzer would be two tokens at position 0:

{

  "tokens": [

    {

      "token": "h",

      "start_offset": 0,

      "end_offset": 2,

      "type": "word",

      "position": 0

    },

    {

      "token": "ho",

      "start_offset": 0,

      "end_offset": 2,

      "type": "word",

      "position": 0

    }

  ]

}

GET my_index/_validate/query?rewrite=true

{

  "query": {

    "match": {

      "name": {

        "query": "ho",

        "type": "phrase"

      }

    }

  }

}

Because both your query and your document have an h at position 0, the document is going to be a hit.

Now, how to solve this? Instead of the edge_ngram token filter, you could use the edge_ngram tokenizer. This tokenizer increments the position of every token it outputs.

So, if you create your index like this instead:

PUT my_index

{

  "settings": {

    "number_of_shards": 1,

    "analysis": {

      "tokenizer": {

        "autocomplete_tokenizer": {

          "type": "edge_ngram",

          "min_gram": 1,

          "max_gram": 20

        }

      },

      "analyzer": {

        "autocomplete": {

          "type": "custom",

          "tokenizer": "autocomplete_tokenizer",

          "filter": [

            "lowercase"

          ]

        }

      }

    }

  },

  "mappings": {

    "doc": {

      "properties": {

        "name": {

          "type": "string",

          "analyzer": "autocomplete"

        }

      }

    }

  }

}

You will see that this query is no longer a hit:

GET my_index/_search

{

  "query": {

    "match": {

      "name": {

        "query": "ho",

        "type": "phrase"

      }

    }

  }

}

But for example this one is:

GET my_index/_search

{

  "query": {

    "match": {

      "name": {

        "query": "he",

        "type": "phrase"

      }

    }

  }

}

answered yesterday

Mohammad Karmi

4081517

add a comment |

I have got the answer from elasticsearch forum :

You are using the edge_ngram token filter. Let's see how your analyzer treats your query string "ho" . Assuming your index is called my_index :

GET my_index/_analyze

{

  "text": "ho",

  "analyzer": "autocomplete"

}

The response shows you that the output of your analyzer would be two tokens at position 0:

{

  "tokens": [

    {

      "token": "h",

      "start_offset": 0,

      "end_offset": 2,

      "type": "word",

      "position": 0

    },

    {

      "token": "ho",

      "start_offset": 0,

      "end_offset": 2,

      "type": "word",

      "position": 0

    }

  ]

}

GET my_index/_validate/query?rewrite=true

{

  "query": {

    "match": {

      "name": {

        "query": "ho",

        "type": "phrase"

      }

    }

  }

}

Because both your query and your document have an h at position 0, the document is going to be a hit.

Now, how to solve this? Instead of the edge_ngram token filter, you could use the edge_ngram tokenizer. This tokenizer increments the position of every token it outputs.

So, if you create your index like this instead:

PUT my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "autocomplete"
        }
      }
    }
  }
}

You will see that this query is no longer a hit:

GET my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "ho",
        "type": "phrase"
      }
    }
  }
}

But this one, for example, still is:

GET my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "he",
        "type": "phrase"
      }
    }
  }
}
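The difference between these two requests can be reproduced offline with a small Python sketch. It assumes the edge_ngram tokenizer treats the whole input as one run of characters (no token_chars configured, as in the settings above) and gives each gram the next position; the helper names `tokenizer_analyze` and `phrase_matches` are made up for this illustration.

```python
def tokenizer_analyze(text, min_gram=1, max_gram=20):
    """Mimic: edge_ngram tokenizer -> lowercase.

    Unlike the token filter, the tokenizer increments the position for
    every gram it emits. Returns a list of (token, position) pairs.
    """
    text = text.lower()
    longest = min(len(text), max_gram)
    grams = [text[:n] for n in range(min_gram, longest + 1)]
    return [(gram, pos) for pos, gram in enumerate(grams)]


def phrase_matches(query_tokens, doc_tokens):
    """True if the query tokens occur in the document at the same relative positions."""
    doc_index = set(doc_tokens)
    doc_positions = {pos for _, pos in doc_tokens}
    return any(
        all((tok, start + pos) in doc_index for tok, pos in query_tokens)
        for start in doc_positions
    )


doc = tokenizer_analyze("hello world")
print(tokenizer_analyze("ho"))                       # [('h', 0), ('ho', 1)]
print(phrase_matches(tokenizer_analyze("ho"), doc))  # False: position 1 holds "he", not "ho"
print(phrase_matches(tokenizer_analyze("he"), doc))  # True: "h" at 0 followed by "he" at 1
```

Because each gram now occupies its own position, the phrase query effectively requires the query to be a prefix of the indexed value.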


