Calculating percentage of records having value > 2
I'm trying to get the percentage of records with value above 2.
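Here is the setup:
import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark
val seq = Seq(0, 1, 2, 3)
val scores = seq.toDF("value")
I can get the result with the following steps: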
val totalCnt = scores.count()
val morethan2 : Long = scores.filter(col("value") > 2).count()
val percent = morethan2.toFloat/totalCnt;
println(" percent is " + percent)
However, what is the best/optimized way to do this in a single statement, possibly using an aggregate function?
scala apache-spark aggregate
edited Jan 4 at 3:46 by Shaido
asked Jan 4 at 2:49 by Karan Alang
What you are doing should already be quite optimal. However, depending on the data size and any transformations before the first count, it could help performance to cache the data. See here for more info: stackoverflow.com/questions/28981359/…
– Shaido
Jan 4 at 6:15
Yes, I agree. Doing this in a single line (as shown in the answer below) seems to be overkill.
– Karan Alang
Jan 6 at 20:49
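The caching suggestion in the first comment might look like the sketch below. It is only an illustration: the object name CachedCounts, the helper countAboveTwo, and the local SparkSession are assumptions made for the example, not part of the original post. Whether caching pays off depends on how costly it is to recompute the DataFrame; for a small in-memory Seq like this one it makes no practical difference.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, named for this example only.
object CachedCounts {
  // Returns (total rows, rows with value > 2). The input is cached so the
  // second count can reuse data materialized by the first one.
  def countAboveTwo(df: DataFrame): (Long, Long) = {
    val cached = df.cache()
    val totalCnt = cached.count()                           // first action populates the cache
    val morethan2 = cached.filter(col("value") > 2).count() // can be served from the cache
    cached.unpersist()                                      // release the cached data when done
    (totalCnt, morethan2)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cached-counts")
      .getOrCreate()
    import spark.implicits._

    val scores = Seq(0, 1, 2, 3).toDF("value")
    val (totalCnt, morethan2) = countAboveTwo(scores)
    println(" percent is " + morethan2.toFloat / totalCnt)
    spark.stop()
  }
}
```

This still runs two Spark jobs (one per count); caching only avoids recomputing whatever produced the DataFrame before the second job.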
1 Answer
You will need to do at least one aggregation. Something like this:
import org.apache.spark.sql.functions._ // col, lit, when, sum
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark
val seq = Seq(0, 1, 2, 3)
val scores = seq.toDF("value")
  .withColumn("all", lit(1)) // 1 for every row
  .withColumn("morethan2", when(col("value") > 2, lit(1)).otherwise(lit(0))) // 1 only where value > 2
  .agg(sum(col("all")).as("count"), sum(col("morethan2")).as("morethan2count"))
  .withColumn("percent", col("morethan2count") / col("count")) // a fraction between 0 and 1
val percent = scores.take(1)(0).getAs[Double]("percent")
I hope it helps.
answered Jan 4 at 3:57 by Apurba Pandey
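As a possible shorter variant of the same idea (not the answerer's code), the two sums can be folded into one avg over a 0/1 indicator, since the mean of an indicator column equals the fraction of rows that match. The sketch below assumes a non-empty DataFrame and a local SparkSession; the object name FractionAbove and the helper fractionAbove are hypothetical names introduced for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, when}

// Hypothetical helper object, named for this example only.
object FractionAbove {
  // Fraction (0.0 to 1.0) of rows whose `column` exceeds `threshold`,
  // computed in a single aggregation: avg of 1/0 = matching rows / all rows.
  // Assumption: df is non-empty; on an empty df avg yields null and
  // getDouble would throw.
  def fractionAbove(df: DataFrame, column: String, threshold: Int): Double =
    df.agg(avg(when(col(column) > threshold, 1).otherwise(0)))
      .first()
      .getDouble(0)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fraction-above")
      .getOrCreate()
    import spark.implicits._

    val scores = Seq(0, 1, 2, 3).toDF("value")
    // Multiply by 100 to report an actual percentage rather than a fraction.
    println("percent is " + fractionAbove(scores, "value", 2) * 100)
    spark.stop()
  }
}
```

For the sample data only one of the four values exceeds 2, so the helper returns 0.25. This runs one Spark job instead of the two that separate count() calls trigger.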