How to replace a value by NaN in a Spark data frame (problem is parallelization)




















Task:
Let df be a Spark data frame. We want to replace every occurrence of a value n in df with NA.



In R I would simply write



df[df==n] <- NA


Problems / questions:
(as I am new to Spark, any comment is welcome)

  • What is the equivalent of NA in SparkR?
    I found functions like isNull and isNaN, and I am confused about whether they differ.


I was able to do it for one column col1 using ifelse, i.e.

df[[col1]] <- ifelse(df[[col1]] == n, NA, df[[col1]])


but I was not able to "parallelize" it.

I tried:

df <- spark.lapply(colnames(df), function(x) {ifelse(df[[x]] == n, NA, df[[x]])})


but I got the message




Job aborted due to stage failure




which I do not understand.










      r validation na sparkr






      asked Jan 4 at 17:10 by Christian
      edited Jan 5 at 0:19 by AS Mackay
























          1 Answer
          Some links that may help troubleshoot that error:

          • Job aborted due to stage failure: Task from application
          • how-to-handle-null-entries-in-sparkr
          • Add a column full of NAs in Sparkr
          • SparkR API






          • Thank you for your answer. But apart from the first link, none of them deals with the problem/task, i.e. 1) How do I apply a user-defined function in general? spark.lapply is, for example, mentioned in your link to the SparkR documentation, so why does my "code" not work? Where is the gap in my understanding? 2) As someone familiar with R, I thought there might be an easy solution for such a specific problem in SparkR.

            – Christian
            Jan 5 at 9:20











          • I don't see 1 being asked anywhere in the question. Read the SparkR API; you may find some clues there.

            – Marc0
            Jan 6 at 21:17
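
          On the question raised in the comments about why the spark.lapply attempt fails: spark.lapply runs an R function over a local list, with each call executing in an R process on a worker, while df is a handle that lives on the driver. Referencing df inside that function is therefore a likely cause of the stage failure. Column expressions such as ifelse on a SparkDataFrame are already evaluated by Spark in parallel across partitions, so a plain driver-side loop over the column names may be enough. The following is a minimal sketch, not a definitive solution; the local session, the toy data, and the value n = 2 are assumptions made for illustration. It also relies on R's NA being passed to Spark as SQL NULL (missing), which isNull detects, whereas isNaN tests for the floating-point value NaN.

```r
library(SparkR)

# Assumption: a local Spark session and a small toy data frame, for illustration only.
sparkR.session(master = "local[1]")
df <- createDataFrame(data.frame(a = c(1, 2, 3), b = c(2, 5, 2)))
n <- 2

# Build one column expression per column on the driver. Spark evaluates these
# expressions lazily and in parallel across partitions, so spark.lapply is not needed.
for (x in columns(df)) {
  # withColumn with an existing column name replaces that column.
  df <- withColumn(df, x, ifelse(df[[x]] == n, NA, df[[x]]))
}

# Show the result, then only the rows where column a is now NULL.
head(df)
head(filter(df, isNull(df$a)))

sparkR.session.stop()
```

          The loop only builds the query plan; the actual work happens when an action such as head is called.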














          answered Jan 4 at 17:42 by Marc0












