Calculating percentage of records having value > 2





I'm trying to get the percentage of records with value above 2.



Here is the code:



val seq = Seq(0, 1, 2, 3)
val scores = seq.toDF("value")


I'm able to achieve this using the following steps:



import org.apache.spark.sql.functions.col

val totalCnt = scores.count()
val morethan2: Long = scores.filter(col("value") > 2).count()
val percent = morethan2.toFloat / totalCnt  // a fraction between 0 and 1, not multiplied by 100
println(" percent is " + percent)


However, what is the best/most optimized way to get this working in a single statement,
possibly using an aggregate function?










  • What you are doing should already be close to optimal. However, depending on the data size and any transformations before the first count, caching the data could help performance. See here for more info: stackoverflow.com/questions/28981359/…

    – Shaido
    Jan 4 at 6:15
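
A minimal sketch of the caching suggestion in the comment above, assuming the DataFrame has an integer column named "value" as in the question. The names CachedCounts, fractionAbove and threshold are hypothetical and chosen only for illustration. Caching only pays off when the DataFrame is expensive to recompute; for a tiny in-memory Seq it makes no practical difference.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CachedCounts {
  // Hypothetical helper: fraction of rows whose "value" exceeds `threshold`.
  def fractionAbove(scores: DataFrame, threshold: Int): Double = {
    // Mark the DataFrame for caching so the second count can reuse the data
    // materialized by the first one instead of recomputing upstream steps.
    val cached = scores.cache()
    try {
      val total = cached.count()
      val above = cached.filter(col("value") > threshold).count()
      // Guard against an empty DataFrame to avoid dividing by zero.
      if (total == 0) 0.0 else above.toDouble / total
    } finally {
      // Assumption: the caller did not cache `scores` itself; otherwise
      // this would release the caller's cache as well.
      cached.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("cached-counts").getOrCreate()
    import spark.implicits._

    val scores = Seq(0, 1, 2, 3).toDF("value")
    println(" percent is " + fractionAbove(scores, 2))
    spark.stop()
  }
}
```

This still runs two Spark jobs (one per count); the cache only avoids repeating whatever work produced the DataFrame.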













  • Yes, I agree. Doing this in a single line (as shown in the answer below) seems to be overkill.

    – Karan Alang
    Jan 6 at 20:49


















scala apache-spark aggregate






edited Jan 4 at 3:46
Shaido

asked Jan 4 at 2:49
Karan Alang













1 Answer
You will need at least one aggregation. Something like this:



import org.apache.spark.sql.functions.{col, lit, sum, when}

val seq = Seq(0, 1, 2, 3)
val scores = seq.toDF("value")
  .withColumn("all", lit(1))                                                 // 1 for every row
  .withColumn("morethan2", when(col("value") > 2, lit(1)).otherwise(lit(0))) // 1 only when value > 2
  .agg(sum(col("all")).as("count"), sum(col("morethan2")).as("morethan2count"))
  .withColumn("percent", col("morethan2count") / col("count"))               // a fraction between 0 and 1

val percent = scores.take(1)(0).getAs[Double]("percent")


I hope it helps.
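
As a possible shorter variant of the same idea, the two sums can be folded into a single average: the mean of a 0/1 indicator column equals the fraction of rows where the condition holds. This is only a sketch, assuming the same "value" column as above; the names FractionAboveTwo, fractionAbove and threshold are hypothetical. It swaps the sum/sum division for avg and is not the code from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, when}

object FractionAboveTwo {
  // Hypothetical helper: fraction of rows whose "value" exceeds `threshold`,
  // computed in one aggregation (a single pass over the data).
  def fractionAbove(scores: DataFrame, threshold: Int): Double = {
    val row = scores
      .agg(avg(when(col("value") > threshold, 1).otherwise(0)).as("fraction"))
      .first()
    // avg over zero rows yields null, so treat an empty DataFrame as 0.0.
    if (row.isNullAt(0)) 0.0 else row.getDouble(0)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("fraction-above-two").getOrCreate()
    import spark.implicits._

    val scores = Seq(0, 1, 2, 3).toDF("value")
    // Only the value 3 exceeds 2, so one row out of four matches.
    println(" percent is " + fractionAbove(scores, 2))
    spark.stop()
  }
}
```

Multiply the result by 100 if an actual percentage rather than a fraction is wanted.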






answered Jan 4 at 3:57
Apurba Pandey































