Scala: How to take any generic sequence as input to this method


Scala noob here. Still trying to learn the syntax.

I am trying to reduce the code I have to write to convert my test data into DataFrames. Here is what I have right now:



def makeDf[T](seq: Seq[(Int, Int)], colNames: String*): Dataset[Row] = {
  val context = session.sqlContext
  import context.implicits._
  seq.toDF(colNames: _*)
}


The problem is that the above method only accepts a sequence of the shape Seq[(Int, Int)]. How do I make it take any sequence as input? I can change the parameter type to Seq[AnyRef], but then the compiler no longer recognizes toDF as a valid symbol.



I am not able to figure out how to make this work. Any ideas? Thanks!










scala apache-spark dataframe apache-spark-sql






asked Jan 1 at 9:10 by Niyaz













  • As far as I know, Spark doesn't support AnyRef in udf()s.

    – stack0114106
    Jan 1 at 10:42











  • As far as I can see, you declared the generic type T but never used it; toDF is called on the Seq, so make the parameter of type Seq[T] and it should work fine.

    – Raman Mishra
    Jan 1 at 11:24



















2 Answers






Short answer:



import scala.reflect.runtime.universe.TypeTag

def makeDf[T <: Product: TypeTag](seq: Seq[T], colNames: String*): DataFrame = ...


Explanation:



When you call seq.toDF you are actually using an implicit defined in SQLImplicits:



implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
  DatasetHolder(_sqlContext.createDataset(s))
}


which in turn requires the generation of an Encoder. The problem is that encoders are defined only for certain types, specifically Product (i.e. tuples, case classes, etc.). You also need to add the TypeTag implicit so that Scala can work around type erasure (at runtime all sequences have the same type regardless of the generic type parameter; TypeTag provides that information).



As a side note, you do not need to extract the sqlContext from the session; you can simply use:



import sparkSession.implicits._
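
Filling in the elided body, here is a minimal sketch of what the full method might look like. It assumes a local SparkSession held in a val named session; the case class Point and the column names are purely hypothetical examples.

import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.reflect.runtime.universe.TypeTag

val session = SparkSession.builder.master("local[*]").getOrCreate()

// T must be a Product (tuple or case class) so Spark can derive an Encoder;
// the TypeTag lets it recover the element type despite erasure.
def makeDf[T <: Product: TypeTag](seq: Seq[T], colNames: String*): DataFrame = {
  import session.implicits._
  seq.toDF(colNames: _*)
}

case class Point(x: Int, y: Double)
val df1 = makeDf(Seq((1, "a"), (2, "b")), "id", "label")
val df2 = makeDf(Seq(Point(1, 2.0), Point(3, 4.5)), "x", "y")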





answered Jan 1 at 11:34 by Assaf Mendelson
  • Super cool. Thank you!

    – Niyaz
    Jan 1 at 15:41

































As @AssafMendelson already explained, the real reason you cannot create a Dataset of Any is that Spark needs an Encoder to transform objects from their JVM representation to its internal representation, and Spark cannot guarantee the generation of such an Encoder for the Any type.



Assaf's answer is correct and will work. However, IMHO, it is too restrictive, as it only works for Products (tuples and case classes); even though that covers most use cases, a few are still excluded.

Since what you really need is an Encoder, you may leave that responsibility to the client, which in most situations only needs to call import spark.implicits._ to bring them into scope. Thus, this is what I believe is the most general solution:



import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}

// Implicit SparkSession to make the calls to the methods below more transparent.
implicit val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

def makeDf[T: Encoder](seq: Seq[T], colNames: String*)
          (implicit spark: SparkSession): DataFrame =
  spark.createDataset(seq).toDF(colNames: _*)

def makeDS[T: Encoder](seq: Seq[T])
          (implicit spark: SparkSession): Dataset[T] =
  spark.createDataset(seq)


Note: this is basically re-inventing functions that Spark already provides.
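
For example, here is a small usage sketch under the assumptions above (an implicit SparkSession in scope and spark.implicits._ imported; the values and column names are made up). It shows that this version also accepts element types that the Product bound rejects:

val dfTuples  = makeDf(Seq((1, "a"), (2, "b")), "id", "label")  // Products still work
val dfInts    = makeDf(Seq(1, 2, 3), "value")                   // plain Int works too
val dsStrings = makeDS(Seq("x", "y", "z"))                      // Dataset[String]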






answered Jan 1 at 15:46 by Luis Miguel Mejía Suárez