Optimising a SQL query with a huge where clause












1















I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.



I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.



This filter values are translated like this:



SELECT filed1, 
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )


Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.



One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?



I would love to hear any other suggestions that may help me solve this issue other then temp table method.










share|improve this question




















  • 2





    Have you checked the EXPLAIN plan? I ask this, because the subquery in the WHERE is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.

    – Tim Biegeleisen
    Dec 30 '18 at 14:34






  • 1





    Indeed, that the subquery in the NOT IN expression is not correlated seems very suspicious to me, especially in light of the associated comment.

    – John Bollinger
    Dec 30 '18 at 14:39













  • you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?

    – Magicianred
    Dec 30 '18 at 14:41


















1















I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.



I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.



This filter values are translated like this:



SELECT filed1, 
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )


Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.



One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?



I would love to hear any other suggestions that may help me solve this issue other then temp table method.










share|improve this question




















  • 2





    Have you checked the EXPLAIN plan? I ask this, because the subquery in the WHERE is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.

    – Tim Biegeleisen
    Dec 30 '18 at 14:34






  • 1





    Indeed, that the subquery in the NOT IN expression is not correlated seems very suspicious to me, especially in light of the associated comment.

    – John Bollinger
    Dec 30 '18 at 14:39













  • you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?

    – Magicianred
    Dec 30 '18 at 14:41
















1












1








1








I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.



I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.



This filter values are translated like this:



SELECT filed1, 
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )


Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.



One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?



I would love to hear any other suggestions that may help me solve this issue other then temp table method.










share|improve this question
















I am working on a system (with Laravel) where users can fill a few filters to get the data they need.
Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.



I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.



This filter values are translated like this:



SELECT filed1, 
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )


Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.



One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?



I would love to hear any other suggestions that may help me solve this issue other then temp table method.







mysql sql query-performance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 30 '18 at 14:38









Gordon Linoff

769k35300402




769k35300402










asked Dec 30 '18 at 14:32









Jerry LiJerry Li

195




195








  • 2





    Have you checked the EXPLAIN plan? I ask this, because the subquery in the WHERE is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.

    – Tim Biegeleisen
    Dec 30 '18 at 14:34






  • 1





    Indeed, that the subquery in the NOT IN expression is not correlated seems very suspicious to me, especially in light of the associated comment.

    – John Bollinger
    Dec 30 '18 at 14:39













  • you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?

    – Magicianred
    Dec 30 '18 at 14:41
















  • 2





    Have you checked the EXPLAIN plan? I ask this, because the subquery in the WHERE is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.

    – Tim Biegeleisen
    Dec 30 '18 at 14:34






  • 1





    Indeed, that the subquery in the NOT IN expression is not correlated seems very suspicious to me, especially in light of the associated comment.

    – John Bollinger
    Dec 30 '18 at 14:39













  • you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?

    – Magicianred
    Dec 30 '18 at 14:41










2




2





Have you checked the EXPLAIN plan? I ask this, because the subquery in the WHERE is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.

– Tim Biegeleisen
Dec 30 '18 at 14:34





Have you checked the EXPLAIN plan? I ask this, because the subquery in the WHERE is not correlated with the outer query, meaning that MySQL can probably run it once, and cache the results. So, I'm not sure that would be your biggest bottleneck here.

– Tim Biegeleisen
Dec 30 '18 at 14:34




1




1





Indeed, that the subquery in the NOT IN expression is not correlated seems very suspicious to me, especially in light of the associated comment.

– John Bollinger
Dec 30 '18 at 14:39







Indeed, that the subquery in the NOT IN expression is not correlated seems very suspicious to me, especially in light of the associated comment.

– John Bollinger
Dec 30 '18 at 14:39















you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?

– Magicianred
Dec 30 '18 at 14:41







you already try to change WHERE table.id NOT IN (sub query) with a LEFT JOIN data_access ON data_access.data_id = table.id WHERE data_access.data_id IS NULL ?

– Magicianred
Dec 30 '18 at 14:41














1 Answer
1






active

oldest

votes


















2














I would suggest writing the query like this:



SELECT ?.filed1, ?.field2, ?.field6  -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid

WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;


Then I would recommend indexes. Most likely:




  • table2(filed_a, cid)


  • table1(id) (may not be necessary if id is already the primary key)

  • data_access(data_id, data_user)


You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978471%2foptimising-a-sql-query-with-a-huge-where-clause%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    I would suggest writing the query like this:



    SELECT ?.filed1, ?.field2, ?.field6  -- qualify column names (but no effect on performance)
    FROM table t JOIN
    table2 t2
    ON t.id = t2.cid

    WHERE NOT EXISTS (SELECT 1
    FROM data_access da
    WHERE t.id = da.data_id AND da.data_user = 26
    ) AND
    t2.filed_a IN ('text a', 'text b', 'text c') ;


    Then I would recommend indexes. Most likely:




    • table2(filed_a, cid)


    • table1(id) (may not be necessary if id is already the primary key)

    • data_access(data_id, data_user)


    You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).






    share|improve this answer




























      2














      I would suggest writing the query like this:



      SELECT ?.filed1, ?.field2, ?.field6  -- qualify column names (but no effect on performance)
      FROM table t JOIN
      table2 t2
      ON t.id = t2.cid

      WHERE NOT EXISTS (SELECT 1
      FROM data_access da
      WHERE t.id = da.data_id AND da.data_user = 26
      ) AND
      t2.filed_a IN ('text a', 'text b', 'text c') ;


      Then I would recommend indexes. Most likely:




      • table2(filed_a, cid)


      • table1(id) (may not be necessary if id is already the primary key)

      • data_access(data_id, data_user)


      You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).






      share|improve this answer


























        2












        2








        2







        I would suggest writing the query like this:



        SELECT ?.filed1, ?.field2, ?.field6  -- qualify column names (but no effect on performance)
        FROM table t JOIN
        table2 t2
        ON t.id = t2.cid

        WHERE NOT EXISTS (SELECT 1
        FROM data_access da
        WHERE t.id = da.data_id AND da.data_user = 26
        ) AND
        t2.filed_a IN ('text a', 'text b', 'text c') ;


        Then I would recommend indexes. Most likely:




        • table2(filed_a, cid)


        • table1(id) (may not be necessary if id is already the primary key)

        • data_access(data_id, data_user)


        You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).






        share|improve this answer













        I would suggest writing the query like this:



        SELECT ?.filed1, ?.field2, ?.field6  -- qualify column names (but no effect on performance)
        FROM table t JOIN
        table2 t2
        ON t.id = t2.cid

        WHERE NOT EXISTS (SELECT 1
        FROM data_access da
        WHERE t.id = da.data_id AND da.data_user = 26
        ) AND
        t2.filed_a IN ('text a', 'text b', 'text c') ;


        Then I would recommend indexes. Most likely:




        • table2(filed_a, cid)


        • table1(id) (may not be necessary if id is already the primary key)

        • data_access(data_id, data_user)


        You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 30 '18 at 14:38









        Gordon LinoffGordon Linoff

        769k35300402




        769k35300402






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978471%2foptimising-a-sql-query-with-a-huge-where-clause%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Monofisismo

            Angular Downloading a file using contenturl with Basic Authentication

            Olmecas