Optimising a SQL query with a huge where clause
I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.
I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.
This filter values are translated like this:
SELECT filed1,
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )
Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.
One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN
but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?
I would love to hear any other suggestions that may help me solve this issue other then temp table method.
mysql sql query-performance
add a comment |
I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.
I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.
This filter values are translated like this:
SELECT filed1,
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )
Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.
One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN
but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?
I would love to hear any other suggestions that may help me solve this issue other then temp table method.
mysql sql query-performance
2
Have you checked theEXPLAIN
plan? I ask this, because the subquery in theWHERE
is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.
– Tim Biegeleisen
Dec 30 '18 at 14:34
1
Indeed, that the subquery in theNOT IN
expression is not correlated seems very suspicious to me, especially in light of the associated comment.
– John Bollinger
Dec 30 '18 at 14:39
you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?
– Magicianred
Dec 30 '18 at 14:41
add a comment |
I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.
I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.
This filter values are translated like this:
SELECT filed1,
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )
Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.
One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN
but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?
I would love to hear any other suggestions that may help me solve this issue other then temp table method.
mysql sql query-performance
I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.
I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.
This filter values are translated like this:
SELECT filed1,
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )
Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.
One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN
but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?
I would love to hear any other suggestions that may help me solve this issue other then temp table method.
mysql sql query-performance
mysql sql query-performance
edited Dec 30 '18 at 14:38
Gordon Linoff
769k35300402
769k35300402
asked Dec 30 '18 at 14:32
Jerry LiJerry Li
195
195
2
Have you checked theEXPLAIN
plan? I ask this, because the subquery in theWHERE
is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.
– Tim Biegeleisen
Dec 30 '18 at 14:34
1
Indeed, that the subquery in theNOT IN
expression is not correlated seems very suspicious to me, especially in light of the associated comment.
– John Bollinger
Dec 30 '18 at 14:39
you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?
– Magicianred
Dec 30 '18 at 14:41
add a comment |
2
Have you checked theEXPLAIN
plan? I ask this, because the subquery in theWHERE
is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.
– Tim Biegeleisen
Dec 30 '18 at 14:34
1
Indeed, that the subquery in theNOT IN
expression is not correlated seems very suspicious to me, especially in light of the associated comment.
– John Bollinger
Dec 30 '18 at 14:39
you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?
– Magicianred
Dec 30 '18 at 14:41
2
2
Have you checked the
EXPLAIN
plan? I ask this, because the subquery in the WHERE
is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.– Tim Biegeleisen
Dec 30 '18 at 14:34
Have you checked the
EXPLAIN
plan? I ask this, because the subquery in the WHERE
is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.– Tim Biegeleisen
Dec 30 '18 at 14:34
1
1
Indeed, that the subquery in the
NOT IN
expression is not correlated seems very suspicious to me, especially in light of the associated comment.– John Bollinger
Dec 30 '18 at 14:39
Indeed, that the subquery in the
NOT IN
expression is not correlated seems very suspicious to me, especially in light of the associated comment.– John Bollinger
Dec 30 '18 at 14:39
you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?
– Magicianred
Dec 30 '18 at 14:41
you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?
– Magicianred
Dec 30 '18 at 14:41
add a comment |
1 Answer
1
active
oldest
votes
I would suggest writing the query like this:
SELECT ?.filed1, ?.field2, ?.field6 -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid
WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;
Then I would recommend indexes. Most likely:
table2(filed_a, cid)
table1(id)
(may not be necessary ifid
is already the primary key)data_access(data_id, data_user)
You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978471%2foptimising-a-sql-query-with-a-huge-where-clause%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I would suggest writing the query like this:
SELECT ?.filed1, ?.field2, ?.field6 -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid
WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;
Then I would recommend indexes. Most likely:
table2(filed_a, cid)
table1(id)
(may not be necessary ifid
is already the primary key)data_access(data_id, data_user)
You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).
add a comment |
I would suggest writing the query like this:
SELECT ?.filed1, ?.field2, ?.field6 -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid
WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;
Then I would recommend indexes. Most likely:
table2(filed_a, cid)
table1(id)
(may not be necessary ifid
is already the primary key)data_access(data_id, data_user)
You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).
add a comment |
I would suggest writing the query like this:
SELECT ?.filed1, ?.field2, ?.field6 -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid
WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;
Then I would recommend indexes. Most likely:
table2(filed_a, cid)
table1(id)
(may not be necessary ifid
is already the primary key)data_access(data_id, data_user)
You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).
I would suggest writing the query like this:
SELECT ?.filed1, ?.field2, ?.field6 -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid
WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;
Then I would recommend indexes. Most likely:
table2(filed_a, cid)
table1(id)
(may not be necessary ifid
is already the primary key)data_access(data_id, data_user)
You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).
answered Dec 30 '18 at 14:38
Gordon LinoffGordon Linoff
769k35300402
769k35300402
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978471%2foptimising-a-sql-query-with-a-huge-where-clause%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Have you checked the
EXPLAIN
plan? I ask this, because the subquery in theWHERE
is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.– Tim Biegeleisen
Dec 30 '18 at 14:34
1
Indeed, that the subquery in the
NOT IN
expression is not correlated seems very suspicious to me, especially in light of the associated comment.– John Bollinger
Dec 30 '18 at 14:39
you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?
– Magicianred
Dec 30 '18 at 14:41