SQL - Execution Plan - Why is a Hash Match used for the JOIN operation?

Quick question - why is there a Hash Match used for the following query?

BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?

Wouldn't a merge be faster with a trivial array like data structure?

Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?

enter image description here

asked Dec 27 at 12:35

HansMusterWhatElse

271519

1

Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38

You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45

1

TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49

@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16

add a comment |

Quick question - why is there a Hash Match used for the following query?

BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?

Wouldn't a merge be faster with a trivial array like data structure?

Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?

enter image description here

asked Dec 27 at 12:35

HansMusterWhatElse

271519

1

Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38

You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45

1

TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49

@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16

add a comment |

Quick question - why is there a Hash Match used for the following query?

BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?

Wouldn't a merge be faster with a trivial array like data structure?

Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?

enter image description here

asked Dec 27 at 12:35

HansMusterWhatElse

271519

Quick question - why is there a Hash Match used for the following query?

BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?

Wouldn't a merge be faster with a trivial array like data structure?

Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?

enter image description here

sql sql-server hash inner-join sql-execution-plan

asked Dec 27 at 12:35

HansMusterWhatElse

271519

asked Dec 27 at 12:35

HansMusterWhatElse

271519

asked Dec 27 at 12:35

HansMusterWhatElse

271519

asked Dec 27 at 12:35

HansMusterWhatElse

271519

asked Dec 27 at 12:35

HansMusterWhatElse

271519

1

Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38

You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45

1

TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49

@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16

add a comment |

1

Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38

You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45

1

TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49

@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16

Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38

You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45

TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49

@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16

add a comment |

1 Answer
1

active

oldest

votes

Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.

A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).

A hash join, by contrast, requires "just" hashing the values and comparing them in bins. Under many circumstances, this is O(n). Of course, if your data is really large or if some values repeat many times, then the hash join will have memory issues or hash collisions -- and the complexity of the algorithm increases.

The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.

edited Dec 27 at 12:59

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48

The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17

1

@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19

1

@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20

@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20

|
show 3 more comments

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53945219%2fsql-execution-plan-why-is-a-hash-match-used-for-the-join-operation%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.

A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).

The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.

edited Dec 27 at 12:59

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48

The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17

1

@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19

1

@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20

@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20

|
show 3 more comments

Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.

A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).

The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.

edited Dec 27 at 12:59

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48

The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17

1

@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19

1

@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20

@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20

|
show 3 more comments

Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.

A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).

The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.

edited Dec 27 at 12:59

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.

A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).

The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.

edited Dec 27 at 12:59

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

edited Dec 27 at 12:59

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

answered Dec 27 at 12:40

Gordon Linoff

758k35291399

One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48

The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17

1

@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19

1

@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20

@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20

|
show 3 more comments

One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48

The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17

1

@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19

1

@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20

@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20

One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48

The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17

@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19

@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20

@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20

|
show 3 more comments

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtjtk