match phrase query not working as expected
Reading from the Elastic documentation:
The match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other.
I have configured my analyzer to use the edge_ngram token filter with the keyword tokenizer:
{
"index": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
}
}
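A rough way to see what this configuration produces is the following self-contained Java model of the autocomplete analyzer. It is a simplified sketch, not Lucene code, and it assumes the usual behaviour of these components: the keyword tokenizer emits the whole input as one token, lowercase folds case, and the edge_ngram token filter replaces that token with its prefixes (min_gram 1 to max_gram 20), all at the same position. The class AutocompleteFilterModel and its analyze method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model (not Lucene code) of the "autocomplete" analyzer:
// keyword tokenizer -> lowercase filter -> edge_ngram token filter.
class AutocompleteFilterModel {

    static final int MIN_GRAM = 1;  // "min_gram": 1
    static final int MAX_GRAM = 20; // "max_gram": 20

    // One emitted token: the term text and its position in the token stream.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    static List<Token> analyze(String input) {
        // keyword tokenizer: the whole input is one token at position 0
        String token = input.toLowerCase(Locale.ROOT); // lowercase filter
        List<Token> out = new ArrayList<>();
        int maxLen = Math.min(MAX_GRAM, token.length());
        for (int len = MIN_GRAM; len <= maxLen; len++) {
            // edge_ngram filter: each prefix keeps the position of the original token
            out.add(new Token(token.substring(0, len), 0));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("hello world")); // 11 prefixes, all at position 0
        System.out.println(analyze("ho"));          // [h@0, ho@0]
    }
}
```

Under these assumptions the analyzer does produce two terms for "ho", but both sit at position 0 rather than one after the other, and every prefix of "hello world" (including "h") also sits at position 0.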
Here is the Java class used for indexing:
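import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "myindex", type = "program")
@Getter
@Setter
@Setting(settingPath = "/elasticsearch/settings.json")
public class Program {
    @Id
    private Long instanceId;
    // Same analyzer at index and search time, so query strings are also split into edge n-grams
    @Field(analyzer = "autocomplete", searchAnalyzer = "autocomplete", type = FieldType.String)
    private String name;
}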
If a document contains the phrase "hello world", the following query matches it:
{
"match" : {
"name" : {
"query" : "ho",
"type" : "phrase"
}
}
}
Result: "hello world"
That's not what I expect, because not all of the search terms are in the document.
My questions:
1- Shouldn't the edge_ngram (autocomplete) analyzer produce 2 search terms for the query "ho"? (The terms should be "h" and "ho".)
2- Why does "ho" match "hello world" when, according to the definition of a phrase query, not all of the terms matched? (The "ho" term shouldn't match.)
Update:
In case the question is not clear: the match_phrase query should analyze the string into a list of terms; here the string is "ho". Because this is edge_ngram with min_gram 1, we get 2 terms: "h" and "ho". According to the Elasticsearch documentation, the document must contain all of the search terms. However, "hello world" contains only "h" and does not contain "ho", so why do I get a match here?
elasticsearch spring-data-elasticsearch elasticsearch-2.0
edited yesterday; asked 2 days ago by Mohammad Karmi
(1) You haven't added the index mapping, nor have you specified the type for the name field. (2) You haven't specified any sample document, so we don't know what data you are trying to match against. Clarify these points so that people here can help you better. – Nishant Saini, 2 days ago
@NishantSaini I updated the question. – Mohammad Karmi, 2 days ago
3 Answers
If you could provide complete, runnable examples for your problems, it would be much easier to help you. For example, something like this:
PUT test
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
}
PUT test/_doc/1
{
"name": "Hello world"
}
GET test/_search
{
"query": {
"match_phrase": {
"name": "hello foo"
}
}
}
Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version; you should really upgrade.
- I'm not sure phrase search and edge grams make much sense in combination. What are you trying to achieve here?
- Why is it matching? Your search query runs through the same analyzer as your stored field. Since you have defined min_gram: 1, your "ho" will be searched as "h" and "ho". The "h" matches the "h" from "hello". Whether you use match or match_phrase doesn't make a difference here with this analyzer.
answered 2 days ago by xeraa
Right, I'm using Elasticsearch 2.x. I'm not trying to achieve anything in particular; I'm trying to understand match_phrase. The part that answers my question is point 4, but why doesn't it make a difference whether I use match or match_phrase? According to the docs, all search terms must be contained in the document, but "ho" is not there. Isn't that how match_phrase works? – Mohammad Karmi, yesterday
To make it clearer: the query analyzes the string into the list of terms "h" and "ho". With the match query it will match the "h" from "hello"; no problem there. But with match_phrase all search terms should match, and "ho" doesn't. Is that right? – Mohammad Karmi, yesterday
If I understand your question, the tokenizer is the problem: with "tokenizer": "keyword", the whole input is indexed and searched as one exact token.
See: Structured Text Tokenizers
answered 2 days ago by M. Mis
I got the answer from the Elasticsearch forum:
You are using the edge_ngram token filter. Let's see how your analyzer treats your query string "ho". Assuming your index is called my_index:
GET my_index/_analyze
{
"text": "ho",
"analyzer": "autocomplete"
}
The response shows you that the output of your analyzer would be two tokens at position 0:
{
"tokens": [
{
"token": "h",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "ho",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
}
]
}
What does Elasticsearch do with a query for two tokens at the same position? It treats the query as an "OR", even if you use type "phrase". You can see that in the output of the validate API (which shows you the Lucene query that your query was rewritten into):
GET my_index/_validate/query?rewrite=true
{
"query": {
"match": {
"name": {
"query": "ho",
"type": "phrase"
}
}
}
}
Because both your query and your document have an "h" at position 0, the document is going to be a hit.
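This behaviour can be reproduced with a small, self-contained Java model. It is a sketch under simplifying assumptions, not Lucene's query implementation: it re-declares the edge_ngram-filter analyzer, stores each analyzed string as a map from position to the set of terms at that position, and treats terms sharing a position as alternatives (an OR), as described above. A phrase matches if some alignment exists in which every query position shares at least one term with the corresponding document position. PhraseWithFilterModel, analyze and phraseMatches are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model (not Elasticsearch/Lucene code) of the edge_ngram FILTER variant.
class PhraseWithFilterModel {

    static final int MIN_GRAM = 1;
    static final int MAX_GRAM = 20;

    // keyword tokenizer + lowercase + edge_ngram filter: all grams share position 0,
    // e.g. "ho" -> position 0: {h, ho}.
    static TreeMap<Integer, Set<String>> analyze(String input) {
        String token = input.toLowerCase(Locale.ROOT); // keyword tokenizer keeps the input whole
        Set<String> grams = new HashSet<>();
        int maxLen = Math.min(MAX_GRAM, token.length());
        for (int len = MIN_GRAM; len <= maxLen; len++) {
            grams.add(token.substring(0, len));
        }
        TreeMap<Integer, Set<String>> byPosition = new TreeMap<>();
        if (!grams.isEmpty()) {
            byPosition.put(0, grams);
        }
        return byPosition;
    }

    // Phrase check: some shift of the query positions must line up with the document so that
    // every query position shares at least one term with the matching document position.
    static boolean phraseMatches(TreeMap<Integer, Set<String>> doc,
                                 TreeMap<Integer, Set<String>> query) {
        if (doc.isEmpty() || query.isEmpty()) {
            return false;
        }
        int firstQueryPosition = query.firstKey();
        for (int docStart : doc.keySet()) {
            int shift = docStart - firstQueryPosition;
            boolean allMatch = true;
            for (Map.Entry<Integer, Set<String>> entry : query.entrySet()) {
                Set<String> docTerms =
                        doc.getOrDefault(entry.getKey() + shift, Collections.<String>emptySet());
                if (Collections.disjoint(entry.getValue(), docTerms)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Set<String>> doc = analyze("hello world");
        System.out.println(phraseMatches(doc, analyze("ho"))); // true: both contain h at position 0
        System.out.println(phraseMatches(doc, analyze("wo"))); // false: no w or wo gram at position 0
    }
}
```

With a single query position, the "phrase" degenerates to "any of these terms at that position", which is why match and match_phrase behave the same with this analyzer.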
Now, how to solve this? Instead of the edge_ngram token filter, you could use the edge_ngram tokenizer. This tokenizer increments the position of every token it outputs.
So, if you create your index like this instead:
PUT my_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"autocomplete_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "autocomplete_tokenizer",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete"
}
}
}
}
}
You will see that this query is no longer a hit:
GET my_index/_search
{
"query": {
"match": {
"name": {
"query": "ho",
"type": "phrase"
}
}
}
}
But for example this one is:
GET my_index/_search
{
"query": {
"match": {
"name": {
"query": "he",
"type": "phrase"
}
}
}
}
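For comparison with the token-filter setup, here is a simplified Java model of the edge_ngram tokenizer variant. It is a sketch under stated assumptions rather than Elasticsearch code: each gram gets its own, incrementing position (for "he": h at 0, he at 1), and a phrase matches if some alignment exists in which every query position shares a term with the corresponding document position. Under these assumptions "ho" is rejected against "hello world" while "he" is accepted. PhraseWithTokenizerModel, analyze and phraseMatches are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model (not Elasticsearch/Lucene code) of the edge_ngram TOKENIZER variant.
class PhraseWithTokenizerModel {

    static final int MIN_GRAM = 1;
    static final int MAX_GRAM = 20;

    // edge_ngram tokenizer + lowercase filter: every gram gets its own position,
    // e.g. "he" -> position 0: {h}, position 1: {he}.
    static TreeMap<Integer, Set<String>> analyze(String input) {
        String text = input.toLowerCase(Locale.ROOT);
        TreeMap<Integer, Set<String>> byPosition = new TreeMap<>();
        int maxLen = Math.min(MAX_GRAM, text.length());
        int position = 0;
        for (int len = MIN_GRAM; len <= maxLen; len++) {
            Set<String> terms = new HashSet<>();
            terms.add(text.substring(0, len));
            byPosition.put(position++, terms);
        }
        return byPosition;
    }

    // Phrase check: some shift of the query positions must line up with the document so that
    // every query position shares at least one term with the matching document position.
    static boolean phraseMatches(TreeMap<Integer, Set<String>> doc,
                                 TreeMap<Integer, Set<String>> query) {
        if (doc.isEmpty() || query.isEmpty()) {
            return false;
        }
        int firstQueryPosition = query.firstKey();
        for (int docStart : doc.keySet()) {
            int shift = docStart - firstQueryPosition;
            boolean allMatch = true;
            for (Map.Entry<Integer, Set<String>> entry : query.entrySet()) {
                Set<String> docTerms =
                        doc.getOrDefault(entry.getKey() + shift, Collections.<String>emptySet());
                if (Collections.disjoint(entry.getValue(), docTerms)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Set<String>> doc = analyze("hello world");
        System.out.println(phraseMatches(doc, analyze("ho"))); // false: ho at position 1 never lines up
        System.out.println(phraseMatches(doc, analyze("he"))); // true: h at 0 and he at 1 both line up
    }
}
```

The only change from the filter setup is where the grams sit: once "h" and "ho" occupy consecutive positions, both have to be present in order, so the phrase semantics from the documentation apply again.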
answered yesterday by Mohammad Karmi