How to set updating row's field with value of closest to it by date another field?












0














I have a huge table with 2m+ rows.
The structure is like that:



ThingName (STRING),
Date (DATE),
Value (INT64)


Sometimes Value is null and I need to fix it by setting it with NOT NULL Value of closest to it by Date row corresponding to ThingName...



And I am totally not SQL guy.



I tried to describe my task with this query (and simplified it a lot by using only previous dates (but actually I need to check future dates too)):



update my_tbl as SDP
set SDP.Value = (select SDPI.Value
from my_tbl as SDPI
where SDPI.Date < SDP.Date
and SDP.ThingName = SDPI.ThingName
and SDPI.Value is not null
order by SDPI.Date desc limit 1)
where SDP.Value is null;


There I try to set updating row Value with one that I select from same table for same ThingName and with limit 1 I leave only single result.



But query editor tell me this:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.



Actually, I am not sure at all that my task can be solved just with query.



So, can anyone help me? If this is impossible, then tell me this, if it possible, tell me what SQL constructions may help me.










share|improve this question





























    0














    I have a huge table with 2m+ rows.
    The structure is like that:



    ThingName (STRING),
    Date (DATE),
    Value (INT64)


    Sometimes Value is null and I need to fix it by setting it with NOT NULL Value of closest to it by Date row corresponding to ThingName...



    And I am totally not SQL guy.



    I tried to describe my task with this query (and simplified it a lot by using only previous dates (but actually I need to check future dates too)):



    update my_tbl as SDP
    set SDP.Value = (select SDPI.Value
    from my_tbl as SDPI
    where SDPI.Date < SDP.Date
    and SDP.ThingName = SDPI.ThingName
    and SDPI.Value is not null
    order by SDPI.Date desc limit 1)
    where SDP.Value is null;


    There I try to set updating row Value with one that I select from same table for same ThingName and with limit 1 I leave only single result.



    But query editor tell me this:
    Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.



    Actually, I am not sure at all that my task can be solved just with query.



    So, can anyone help me? If this is impossible, then tell me this, if it possible, tell me what SQL constructions may help me.










    share|improve this question



























      0












      0








      0







      I have a huge table with 2m+ rows.
      The structure is like that:



      ThingName (STRING),
      Date (DATE),
      Value (INT64)


      Sometimes Value is null and I need to fix it by setting it with NOT NULL Value of closest to it by Date row corresponding to ThingName...



      And I am totally not SQL guy.



      I tried to describe my task with this query (and simplified it a lot by using only previous dates (but actually I need to check future dates too)):



      update my_tbl as SDP
      set SDP.Value = (select SDPI.Value
      from my_tbl as SDPI
      where SDPI.Date < SDP.Date
      and SDP.ThingName = SDPI.ThingName
      and SDPI.Value is not null
      order by SDPI.Date desc limit 1)
      where SDP.Value is null;


      There I try to set updating row Value with one that I select from same table for same ThingName and with limit 1 I leave only single result.



      But query editor tell me this:
      Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.



      Actually, I am not sure at all that my task can be solved just with query.



      So, can anyone help me? If this is impossible, then tell me this, if it possible, tell me what SQL constructions may help me.










      share|improve this question















      I have a huge table with 2m+ rows.
      The structure is like that:



      ThingName (STRING),
      Date (DATE),
      Value (INT64)


      Sometimes Value is null and I need to fix it by setting it with NOT NULL Value of closest to it by Date row corresponding to ThingName...



      And I am totally not SQL guy.



      I tried to describe my task with this query (and simplified it a lot by using only previous dates (but actually I need to check future dates too)):



      update my_tbl as SDP
      set SDP.Value = (select SDPI.Value
      from my_tbl as SDPI
      where SDPI.Date < SDP.Date
      and SDP.ThingName = SDPI.ThingName
      and SDPI.Value is not null
      order by SDPI.Date desc limit 1)
      where SDP.Value is null;


      There I try to set updating row Value with one that I select from same table for same ThingName and with limit 1 I leave only single result.



      But query editor tell me this:
      Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.



      Actually, I am not sure at all that my task can be solved just with query.



      So, can anyone help me? If this is impossible, then tell me this, if it possible, tell me what SQL constructions may help me.







      sql google-bigquery






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 27 at 15:25

























      asked Dec 27 at 14:29









      Kosmos

      1,69732657




      1,69732657
























          2 Answers
          2






          active

          oldest

          votes


















          3














          Below is for BigQuery Standard SQL



          In many (if not most) cases you don't want to update your table (as it incur extra cost and limitations associated with DML statements) but rather can adjust 'missing' values in-query - like in below example:





          #standardSQL
          SELECT
          ThingName,
          date,
          IFNULL(value,
          LAST_VALUE(value IGNORE NULLS)
          OVER(PARTITION BY thingname ORDER BY date)
          ) AS value
          FROM `project.dataset.my_tbl`


          If for some reason you actually need to update the table - above statement will not help as DML's UPDATE does not allow use of analytic functions, so you need to use another approach. For example as below one



          #standardSQL
          SELECT
          t1.ThingName, t1.date,
          ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
          FROM `project.dataset.my_tbl` AS t1
          LEFT JOIN `project.dataset.my_tbl` AS t2
          ON t2.ThingName = t1.ThingName
          AND t2.date <= t1.date
          GROUP BY t1.ThingName, t1.date, t1.value


          and now you can use it to update your table as in example below



          #standardSQL
          UPDATE `project.dataset.my_tbl` t
          SET value = new_value
          FROM (
          SELECT TO_JSON_STRING(t1) AS id,
          ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
          FROM `project.dataset.my_tbl` AS t1
          LEFT JOIN `project.dataset.my_tbl` AS t2
          ON t2.ThingName = t1.ThingName
          AND t2.date <= t1.date
          GROUP BY id
          )
          WHERE TO_JSON_STRING(t) = id





          share|improve this answer































            3














            In BigQuery, updates are rather rare. The logic you seem to want is:



            select t.*,
            coalesce(value,
            lag(value ignore nulls) over (partition by thingname order by date)
            ) as value
            from my_tbl;


            I don't really see a reason to save this back in the table.






            share|improve this answer





















            • Thank you. that's something too smart for me. trying to understand how this works O_o
              – Kosmos
              Dec 27 at 15:12












            • @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
              – Gordon Linoff
              Dec 27 at 15:46











            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53946643%2fhow-to-set-updating-rows-field-with-value-of-closest-to-it-by-date-another-fiel%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            3














            Below is for BigQuery Standard SQL



            In many (if not most) cases you don't want to update your table (as it incur extra cost and limitations associated with DML statements) but rather can adjust 'missing' values in-query - like in below example:





            #standardSQL
            SELECT
            ThingName,
            date,
            IFNULL(value,
            LAST_VALUE(value IGNORE NULLS)
            OVER(PARTITION BY thingname ORDER BY date)
            ) AS value
            FROM `project.dataset.my_tbl`


            If for some reason you actually need to update the table - above statement will not help as DML's UPDATE does not allow use of analytic functions, so you need to use another approach. For example as below one



            #standardSQL
            SELECT
            t1.ThingName, t1.date,
            ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
            FROM `project.dataset.my_tbl` AS t1
            LEFT JOIN `project.dataset.my_tbl` AS t2
            ON t2.ThingName = t1.ThingName
            AND t2.date <= t1.date
            GROUP BY t1.ThingName, t1.date, t1.value


            and now you can use it to update your table as in example below



            #standardSQL
            UPDATE `project.dataset.my_tbl` t
            SET value = new_value
            FROM (
            SELECT TO_JSON_STRING(t1) AS id,
            ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
            FROM `project.dataset.my_tbl` AS t1
            LEFT JOIN `project.dataset.my_tbl` AS t2
            ON t2.ThingName = t1.ThingName
            AND t2.date <= t1.date
            GROUP BY id
            )
            WHERE TO_JSON_STRING(t) = id





            share|improve this answer




























              3














              Below is for BigQuery Standard SQL



              In many (if not most) cases you don't want to update your table (as it incur extra cost and limitations associated with DML statements) but rather can adjust 'missing' values in-query - like in below example:





              #standardSQL
              SELECT
              ThingName,
              date,
              IFNULL(value,
              LAST_VALUE(value IGNORE NULLS)
              OVER(PARTITION BY thingname ORDER BY date)
              ) AS value
              FROM `project.dataset.my_tbl`


              If for some reason you actually need to update the table - above statement will not help as DML's UPDATE does not allow use of analytic functions, so you need to use another approach. For example as below one



              #standardSQL
              SELECT
              t1.ThingName, t1.date,
              ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
              FROM `project.dataset.my_tbl` AS t1
              LEFT JOIN `project.dataset.my_tbl` AS t2
              ON t2.ThingName = t1.ThingName
              AND t2.date <= t1.date
              GROUP BY t1.ThingName, t1.date, t1.value


              and now you can use it to update your table as in example below



              #standardSQL
              UPDATE `project.dataset.my_tbl` t
              SET value = new_value
              FROM (
              SELECT TO_JSON_STRING(t1) AS id,
              ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
              FROM `project.dataset.my_tbl` AS t1
              LEFT JOIN `project.dataset.my_tbl` AS t2
              ON t2.ThingName = t1.ThingName
              AND t2.date <= t1.date
              GROUP BY id
              )
              WHERE TO_JSON_STRING(t) = id





              share|improve this answer


























                3












                3








                3






                Below is for BigQuery Standard SQL



                In many (if not most) cases you don't want to update your table (as it incur extra cost and limitations associated with DML statements) but rather can adjust 'missing' values in-query - like in below example:





                #standardSQL
                SELECT
                ThingName,
                date,
                IFNULL(value,
                LAST_VALUE(value IGNORE NULLS)
                OVER(PARTITION BY thingname ORDER BY date)
                ) AS value
                FROM `project.dataset.my_tbl`


                If for some reason you actually need to update the table - above statement will not help as DML's UPDATE does not allow use of analytic functions, so you need to use another approach. For example as below one



                #standardSQL
                SELECT
                t1.ThingName, t1.date,
                ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
                FROM `project.dataset.my_tbl` AS t1
                LEFT JOIN `project.dataset.my_tbl` AS t2
                ON t2.ThingName = t1.ThingName
                AND t2.date <= t1.date
                GROUP BY t1.ThingName, t1.date, t1.value


                and now you can use it to update your table as in example below



                #standardSQL
                UPDATE `project.dataset.my_tbl` t
                SET value = new_value
                FROM (
                SELECT TO_JSON_STRING(t1) AS id,
                ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
                FROM `project.dataset.my_tbl` AS t1
                LEFT JOIN `project.dataset.my_tbl` AS t2
                ON t2.ThingName = t1.ThingName
                AND t2.date <= t1.date
                GROUP BY id
                )
                WHERE TO_JSON_STRING(t) = id





                share|improve this answer














                Below is for BigQuery Standard SQL



                In many (if not most) cases you don't want to update your table (as it incur extra cost and limitations associated with DML statements) but rather can adjust 'missing' values in-query - like in below example:





                #standardSQL
                SELECT
                ThingName,
                date,
                IFNULL(value,
                LAST_VALUE(value IGNORE NULLS)
                OVER(PARTITION BY thingname ORDER BY date)
                ) AS value
                FROM `project.dataset.my_tbl`


                If for some reason you actually need to update the table - above statement will not help as DML's UPDATE does not allow use of analytic functions, so you need to use another approach. For example as below one



                #standardSQL
                SELECT
                t1.ThingName, t1.date,
                ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
                FROM `project.dataset.my_tbl` AS t1
                LEFT JOIN `project.dataset.my_tbl` AS t2
                ON t2.ThingName = t1.ThingName
                AND t2.date <= t1.date
                GROUP BY t1.ThingName, t1.date, t1.value


                and now you can use it to update your table as in example below



                #standardSQL
                UPDATE `project.dataset.my_tbl` t
                SET value = new_value
                FROM (
                SELECT TO_JSON_STRING(t1) AS id,
                ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
                FROM `project.dataset.my_tbl` AS t1
                LEFT JOIN `project.dataset.my_tbl` AS t2
                ON t2.ThingName = t1.ThingName
                AND t2.date <= t1.date
                GROUP BY id
                )
                WHERE TO_JSON_STRING(t) = id






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Dec 27 at 18:29

























                answered Dec 27 at 18:14









                Mikhail Berlyant

                55.5k43368




                55.5k43368

























                    3














                    In BigQuery, updates are rather rare. The logic you seem to want is:



                    select t.*,
                    coalesce(value,
                    lag(value ignore nulls) over (partition by thingname order by date)
                    ) as value
                    from my_tbl;


                    I don't really see a reason to save this back in the table.






                    share|improve this answer





















                    • Thank you. that's something too smart for me. trying to understand how this works O_o
                      – Kosmos
                      Dec 27 at 15:12












                    • @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
                      – Gordon Linoff
                      Dec 27 at 15:46
















                    3














                    In BigQuery, updates are rather rare. The logic you seem to want is:



                    select t.*,
                    coalesce(value,
                    lag(value ignore nulls) over (partition by thingname order by date)
                    ) as value
                    from my_tbl;


                    I don't really see a reason to save this back in the table.






                    share|improve this answer





















                    • Thank you. that's something too smart for me. trying to understand how this works O_o
                      – Kosmos
                      Dec 27 at 15:12












                    • @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
                      – Gordon Linoff
                      Dec 27 at 15:46














                    3












                    3








                    3






                    In BigQuery, updates are rather rare. The logic you seem to want is:



                    select t.*,
                    coalesce(value,
                    lag(value ignore nulls) over (partition by thingname order by date)
                    ) as value
                    from my_tbl;


                    I don't really see a reason to save this back in the table.






                    share|improve this answer












                    In BigQuery, updates are rather rare. The logic you seem to want is:



                    select t.*,
                    coalesce(value,
                    lag(value ignore nulls) over (partition by thingname order by date)
                    ) as value
                    from my_tbl;


                    I don't really see a reason to save this back in the table.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Dec 27 at 15:01









                    Gordon Linoff

                    758k35291399




                    758k35291399












                    • Thank you. that's something too smart for me. trying to understand how this works O_o
                      – Kosmos
                      Dec 27 at 15:12












                    • @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
                      – Gordon Linoff
                      Dec 27 at 15:46


















                    • Thank you. that's something too smart for me. trying to understand how this works O_o
                      – Kosmos
                      Dec 27 at 15:12












                    • @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
                      – Gordon Linoff
                      Dec 27 at 15:46
















                    Thank you. that's something too smart for me. trying to understand how this works O_o
                    – Kosmos
                    Dec 27 at 15:12






                    Thank you. that's something too smart for me. trying to understand how this works O_o
                    – Kosmos
                    Dec 27 at 15:12














                    @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
                    – Gordon Linoff
                    Dec 27 at 15:46




                    @Kosmos . . . I recommend that you learn about lag() and other window functions. They are quite useful. The BQ documentation is a place to start although there are other resources available as well.
                    – Gordon Linoff
                    Dec 27 at 15:46


















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53946643%2fhow-to-set-updating-rows-field-with-value-of-closest-to-it-by-date-another-fiel%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Monofisismo

                    Angular Downloading a file using contenturl with Basic Authentication

                    Olmecas