SQL - Execution Plan - Why is a Hash Match used for the JOIN operation?












0














Quick question - why is there a Hash Match used for the following query?



BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?



Wouldn't a merge be faster with a trivial array like data structure?



Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?



enter image description here










share|improve this question


















  • 1




    Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
    – Larnu
    Dec 27 at 12:38










  • You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
    – Dan Guzman
    Dec 27 at 12:45






  • 1




    TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
    – Martin Smith
    Dec 27 at 12:49












  • @MartinSmith thanks a lot! I will have a look at those!
    – HansMusterWhatElse
    Dec 27 at 13:16
















0














Quick question - why is there a Hash Match used for the following query?



BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?



Wouldn't a merge be faster with a trivial array like data structure?



Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?



enter image description here










share|improve this question


















  • 1




    Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
    – Larnu
    Dec 27 at 12:38










  • You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
    – Dan Guzman
    Dec 27 at 12:45






  • 1




    TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
    – Martin Smith
    Dec 27 at 12:49












  • @MartinSmith thanks a lot! I will have a look at those!
    – HansMusterWhatElse
    Dec 27 at 13:16














0












0








0







Quick question - why is there a Hash Match used for the following query?



BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?



Wouldn't a merge be faster with a trivial array like data structure?



Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?



enter image description here










share|improve this question













Quick question - why is there a Hash Match used for the following query?



BusinessentityID is of type Int and at this point I currently can't really explain why they need to use hashing at all?



Wouldn't a merge be faster with a trivial array like data structure?



Although I guess it would make sense in case when a different data type e.g a GUID is used in the join clause?



enter image description here







sql sql-server hash inner-join sql-execution-plan






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Dec 27 at 12:35









HansMusterWhatElse

271519




271519








  • 1




    Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
    – Larnu
    Dec 27 at 12:38










  • You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
    – Dan Guzman
    Dec 27 at 12:45






  • 1




    TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
    – Martin Smith
    Dec 27 at 12:49












  • @MartinSmith thanks a lot! I will have a look at those!
    – HansMusterWhatElse
    Dec 27 at 13:16














  • 1




    Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
    – Larnu
    Dec 27 at 12:38










  • You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
    – Dan Guzman
    Dec 27 at 12:45






  • 1




    TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
    – Martin Smith
    Dec 27 at 12:49












  • @MartinSmith thanks a lot! I will have a look at those!
    – HansMusterWhatElse
    Dec 27 at 13:16








1




1




Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38




Please don't post images of code; you have enough reputation to know that code should be supplied as text. As for why, please also include the DDL of your tables including all indexes. Thanks.
– Larnu
Dec 27 at 12:38












You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45




You'll get a MERGE JOIN if you drop the covering IX_Person_LastName_FirstName_MiddleName index from the Person.Person table. The optimizer used the covering index and a hash join due because it was less costly.
– Dan Guzman
Dec 27 at 12:45




1




1




TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49






TBH it doesn't sound like you understand what a hash or merge join are from your question. you might want to read blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join and blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join
– Martin Smith
Dec 27 at 12:49














@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16




@MartinSmith thanks a lot! I will have a look at those!
– HansMusterWhatElse
Dec 27 at 13:16












1 Answer
1






active

oldest

votes


















3














Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.



A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).



A hash join, by contrast, requires "just" hashing the values and comparing them in bins. Under many circumstances, this is O(n). Of course, if your data is really large or if some values repeat many times, then the hash join will have memory issues or hash collisions -- and the complexity of the algorithm increases.



The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.






share|improve this answer























  • One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
    – IMSoP
    Dec 27 at 12:48










  • The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
    – HansMusterWhatElse
    Dec 27 at 13:17








  • 1




    @HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
    – Gordon Linoff
    Dec 27 at 13:19






  • 1




    @HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
    – Damien_The_Unbeliever
    Dec 27 at 13:20












  • @GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
    – HansMusterWhatElse
    Dec 27 at 13:20













Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53945219%2fsql-execution-plan-why-is-a-hash-match-used-for-the-join-operation%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.



A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).



A hash join, by contrast, requires "just" hashing the values and comparing them in bins. Under many circumstances, this is O(n). Of course, if your data is really large or if some values repeat many times, then the hash join will have memory issues or hash collisions -- and the complexity of the algorithm increases.



The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.






share|improve this answer























  • One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
    – IMSoP
    Dec 27 at 12:48










  • The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
    – HansMusterWhatElse
    Dec 27 at 13:17








  • 1




    @HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
    – Gordon Linoff
    Dec 27 at 13:19






  • 1




    @HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
    – Damien_The_Unbeliever
    Dec 27 at 13:20












  • @GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
    – HansMusterWhatElse
    Dec 27 at 13:20


















3














Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.



A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).



A hash join, by contrast, requires "just" hashing the values and comparing them in bins. Under many circumstances, this is O(n). Of course, if your data is really large or if some values repeat many times, then the hash join will have memory issues or hash collisions -- and the complexity of the algorithm increases.



The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.






share|improve this answer























  • One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
    – IMSoP
    Dec 27 at 12:48










  • The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
    – HansMusterWhatElse
    Dec 27 at 13:17








  • 1




    @HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
    – Gordon Linoff
    Dec 27 at 13:19






  • 1




    @HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
    – Damien_The_Unbeliever
    Dec 27 at 13:20












  • @GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
    – HansMusterWhatElse
    Dec 27 at 13:20
















3












3








3






Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.



A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).



A hash join, by contrast, requires "just" hashing the values and comparing them in bins. Under many circumstances, this is O(n). Of course, if your data is really large or if some values repeat many times, then the hash join will have memory issues or hash collisions -- and the complexity of the algorithm increases.



The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.






share|improve this answer














Generally, the fastest type of join is going to use indexes. So, if you care about performance, add indexes on businessentityid.



A merge join is used when the data is already sorted. If it is not sorted, then the sort is rather expensive -- O(n log(n)).



A hash join, by contrast, requires "just" hashing the values and comparing them in bins. Under many circumstances, this is O(n). Of course, if your data is really large or if some values repeat many times, then the hash join will have memory issues or hash collisions -- and the complexity of the algorithm increases.



The key point, though, is that the optimizer chooses the algorithm that it considers best suited to the data. If performance is your primary consideration, then an index is highly recommended.







share|improve this answer














share|improve this answer



share|improve this answer








edited Dec 27 at 12:59

























answered Dec 27 at 12:40









Gordon Linoff

758k35291399




758k35291399












  • One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
    – IMSoP
    Dec 27 at 12:48










  • The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
    – HansMusterWhatElse
    Dec 27 at 13:17








  • 1




    @HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
    – Gordon Linoff
    Dec 27 at 13:19






  • 1




    @HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
    – Damien_The_Unbeliever
    Dec 27 at 13:20












  • @GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
    – HansMusterWhatElse
    Dec 27 at 13:20




















  • One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
    – IMSoP
    Dec 27 at 12:48










  • The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
    – HansMusterWhatElse
    Dec 27 at 13:17








  • 1




    @HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
    – Gordon Linoff
    Dec 27 at 13:19






  • 1




    @HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
    – Damien_The_Unbeliever
    Dec 27 at 13:20












  • @GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
    – HansMusterWhatElse
    Dec 27 at 13:20


















One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48




One quick caveat that often gets overlooked: indexes are not free, particularly with a large or frequently changing table, so you should look at how much gain you actually get in realistic queries from adding it.
– IMSoP
Dec 27 at 12:48












The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17






The reason why I ask is I currently don't undertand why you would need to hash integers? From my very basic understanding of data structures this doesn't add up for me. Nevertheless @MartinSmith pointed out some interesting articles and I hope I will find my answer there.
– HansMusterWhatElse
Dec 27 at 13:17






1




1




@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19




@HansMusterWhatElse . . . Because they can extend up to about 2 or 4 billion. That is a lot of bins. And, if that is your question, then the question is unclear.
– Gordon Linoff
Dec 27 at 13:19




1




1




@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20






@HansMusterWhatElse - why hash integers? Because most parts of the database engine don't feature specialised operators dedicated to individual data types. The engine may use the hash join operator even if (I don't know, haven't checked) the implementation of hashing for integers happens to be the identity function.
– Damien_The_Unbeliever
Dec 27 at 13:20














@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20






@GordonLinoff: Thank you very much! That was what I was looking for. In this case it was only for a small dataset so I guess they don't differentiate between that.
– HansMusterWhatElse
Dec 27 at 13:20




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53945219%2fsql-execution-plan-why-is-a-hash-match-used-for-the-join-operation%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Mossoró

Error while reading .h5 file using the rhdf5 package in R

Pushsharp Apns notification error: 'InvalidToken'