Create a new dataset from an existing dataset by adding a null column between two columns



























I created a dataset in Spark using Java by reading a CSV file. This is my initial dataset:



+---+----------+-----+---+
|_c0|       _c1|  _c2|_c3|
+---+----------+-----+---+
|  1|9090999999|NANDU| 22|
|  2|9999999999| SANU| 21|
|  3|9999909090| MANU| 22|
|  4|9090909090|VEENA| 23|
+---+----------+-----+---+


I want to create a DataFrame as follows (with one column containing only null values):



+---+----+-----+
|_c0| _c1|  _c2|
+---+----+-----+
|  1|null|NANDU|
|  2|null| SANU|
|  3|null| MANU|
|  4|null|VEENA|
+---+----+-----+


Here is my existing code:



Dataset<Row> ds = spark.read().format("csv").option("header", "false").load("/home/nandu/Data.txt");
Column[] selectedColumns = new Column[2];
selectedColumns[0] = new Column("_c0");
selectedColumns[1] = new Column("_c2");
Dataset<Row> ds2 = ds.select(selectedColumns);


which creates the following dataset:



+---+-----+
|_c0|  _c2|
+---+-----+
|  1|NANDU|
|  2| SANU|
|  3| MANU|
|  4|VEENA|
+---+-----+









      java apache-spark apache-spark-sql






      edited Jan 4 at 6:55









      Shaido











      asked Jan 4 at 6:42









Nandu

























3 Answers






To select the two columns you want and add a new column filled with nulls, you can use the following:

import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.types.DataTypes;

ds.select(col("_c0"), lit(null).cast(DataTypes.StringType).as("_c1"), col("_c2"));
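For context, a complete program built around this select could look like the sketch below. The class name and the `local[*]` master are illustrative assumptions, not from the original post; the input path is the one from the question.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

public class NullColumnDemo {
    public static void main(String[] args) {
        // Local session for illustration; on a cluster the master comes from spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("null-column-demo")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> ds = spark.read()
                .format("csv")
                .option("header", "false")
                .load("/home/nandu/Data.txt");

        // _c0, then a string-typed null literal named _c1, then the original _c2.
        Dataset<Row> ds2 = ds.select(
                col("_c0"),
                lit(null).cast(DataTypes.StringType).as("_c1"),
                col("_c2"));

        ds2.show();
        spark.stop();
    }
}
```

Casting the null literal to `StringType` matters: without the cast the new column gets `NullType`, which some downstream operations (and writers) reject.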












                    answered Jan 4 at 6:52









Shaido


























Try the following code:

import org.apache.spark.sql.functions.{lit => flit}
val ds = spark.range(100).withColumn("c2", $"id")
ds.withColumn("new_col", flit(null: String)).selectExpr("id", "new_col", "c2").show(5)

Hope this helps. Cheers :)












                            answered Jan 4 at 7:16









Harjeet Kumar
























Adding a new column with a null value of string type may solve the problem. The following code is written in Scala, but you'll get the idea:

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType
val ds2 = ds.withColumn("new_col", lit(null).cast(StringType)).selectExpr("_c0", "new_col as _c1", "_c2")
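Since the question uses the Java API, the same withColumn/selectExpr route can be translated to Java. The sketch below is an assumption about how this would look (class name and local-mode session are illustrative), not code from the original answer:

```java
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

public class WithColumnNullDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("with-column-null-demo")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> ds = spark.read()
                .format("csv")
                .option("header", "false")
                .load("/home/nandu/Data.txt");

        // Append a string-typed null column, then rename and reorder it with selectExpr.
        Dataset<Row> ds2 = ds
                .withColumn("new_col", lit(null).cast(DataTypes.StringType))
                .selectExpr("_c0", "new_col as _c1", "_c2");

        ds2.show();
        spark.stop();
    }
}
```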












                                    answered Jan 4 at 8:35









Md Shihab Uddin






























