Scala: How to take any generic sequence as input to this method
Scala noob here. Still trying to learn the syntax.
I am trying to reduce the code I have to write to convert my test data into DataFrames. Here is what I have right now:
def makeDf[T](seq: Seq[(Int, Int)], colNames: String*): Dataset[Row] = {
  val context = session.sqlContext
  import context.implicits._
  seq.toDF(colNames: _*)
}
The problem is that the above method only takes a sequence of the shape Seq[(Int, Int)] as input. How do I make it take any sequence as input? I can change the input's shape to Seq[AnyRef], but then the code fails to recognize the toDF call as a valid symbol.
I am not able to figure out how to make this work. Any ideas? Thanks!
scala apache-spark dataframe apache-spark-sql
asked Jan 1 at 9:10 by Niyaz
As far as I know, Spark doesn't support AnyRef in udf()s.
– stack0114106
Jan 1 at 10:42
As far as I can see, you declared the generic type T but never used it; toDF is called on the seq, so if you make the parameter of type Seq[T] it should work fine.
– Raman Mishra
Jan 1 at 11:24
2 Answers
Short answer:
import scala.reflect.runtime.universe.TypeTag
def makeDf[T <: Product: TypeTag](seq: Seq[T], colNames: String*): DataFrame = ...
Explanation:
When you call seq.toDF you are actually using an implicit conversion defined in SQLImplicits:
implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
  DatasetHolder(_sqlContext.createDataset(s))
}
which in turn requires the generation of an Encoder. The problem is that encoders are only defined for certain types, specifically subtypes of Product (i.e. tuples, case classes, etc.). You also need the TypeTag implicit so that Scala can work around type erasure (at runtime every sequence is just a Seq, regardless of its generic type parameter; the TypeTag carries that missing type information).
As a side note, you do not need to extract the SQLContext from the session; you can simply use:
import sparkSession.implicits._
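Putting it together, a minimal sketch of the full method might look like this (assuming an existing SparkSession called session, as in the question):
import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.DataFrame

def makeDf[T <: Product: TypeTag](seq: Seq[T], colNames: String*): DataFrame = {
  // the session's implicits provide both localSeqToDatasetHolder and the Product encoders
  import session.implicits._
  seq.toDF(colNames: _*)
}

// works for tuples of any arity as well as for case classes
val tupleDf = makeDf(Seq((1, "a"), (2, "b")), "id", "label")
case class Point(x: Int, y: Int)
val pointDf = makeDf(Seq(Point(1, 2), Point(3, 4)), "x", "y")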
answered Jan 1 at 11:34 by Assaf Mendelson
Super cool. Thank you!
– Niyaz
Jan 1 at 15:41
As @AssafMendelson already explained, the real reason you cannot create a Dataset of Any is that Spark needs an Encoder to transform objects from their JVM representation to its internal representation, and Spark cannot guarantee the generation of such an Encoder for the Any type.
Assaf's answer is correct and will work. However, IMHO, it is too restrictive, as it only works for Products (tuples and case classes); even if that covers most use cases, a few are still excluded.
Since what you really need is an Encoder, you may leave that responsibility to the client, which in most situations will only need to import spark.implicits._ to get the encoders in scope.
Thus, this is what I believe to be the most general solution:
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}
// Implicit SparkSession to make the call to further methods more transparent.
implicit val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._
def makeDf[T: Encoder](seq: Seq[T], colNames: String*)
          (implicit spark: SparkSession): DataFrame =
  spark.createDataset(seq).toDF(colNames: _*)

def makeDS[T: Encoder](seq: Seq[T])
          (implicit spark: SparkSession): Dataset[T] =
  spark.createDataset(seq)
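For example, assuming the implicit spark session defined above is in scope, client code might use these helpers like this (hypothetical test data):
import spark.implicits._  // brings the Encoder instances for common types into scope

case class User(name: String, age: Int)  // hypothetical test type

val pairsDf = makeDf(Seq(("a", 1), ("b", 2)), "letter", "count")  // DataFrame with renamed columns
val usersDs = makeDS(Seq(User("alice", 30), User("bob", 25)))     // Dataset[User]
val intsDf  = makeDf(Seq(1, 2, 3), "value")                       // works for non-Product element types too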
Note: this is basically re-inventing functions that Spark already provides.
answered Jan 1 at 15:46 by Luis Miguel Mejía Suárez