Retain the highest absolute value and return an average value from rows with duplicate index












2















I have a set of values with duplicate index in a DataFrame:



         value
CDE 2.318620
CDE -3.097715
LXU -3.791043
LXU 4.818995
SWN 3.059964
SWN -4.349304
OAS -3.336539
LPI -3.037097
LPI -5.701044
LPI -3.519923
CZR -3.270018
CZR -3.056712


The required result is to retain only the highest absolute values and return the average values in a new column:



         value    average
CDE -3.097715 -0.389547
LXU 4.818995 0.513976
SWN -4.349304 -0.644670
OAS -3.336539 -3.336539
LPI -5.701044 -4.086021
CZR -3.270018 -3.163365


I have tried to .apply(lambda) to the duplicate rows but got an "axis" error:



max_absolute = lambda x: max(x.min(), x.max(), key=abs)
df_duplicate_absmax = df.groupby(df.index).apply(max_absolute, axis=1)


ps: Adapting Abhi's solution to work with NaN:



df1 = df.groupby(df.index)['value'].agg([lambda x: max(x[~np.isnan(x)], key=abs), 'mean'])









share|improve this question





























    2















    I have a set of values with duplicate index in a DataFrame:



             value
    CDE 2.318620
    CDE -3.097715
    LXU -3.791043
    LXU 4.818995
    SWN 3.059964
    SWN -4.349304
    OAS -3.336539
    LPI -3.037097
    LPI -5.701044
    LPI -3.519923
    CZR -3.270018
    CZR -3.056712


    The required result is to retain only the highest absolute values and return the average values in a new column:



             value    average
    CDE -3.097715 -0.389547
    LXU 4.818995 0.513976
    SWN -4.349304 -0.644670
    OAS -3.336539 -3.336539
    LPI -5.701044 -4.086021
    CZR -3.270018 -3.163365


    I have tried to .apply(lambda) to the duplicate rows but got an "axis" error:



    max_absolute = lambda x: max(x.min(), x.max(), key=abs)
    df_duplicate_absmax = df.groupby(df.index).apply(max_absolute, axis=1)


    ps: Adapting Abhi's solution to work with NaN:



    df1 = df.groupby(df.index)['value'].agg([lambda x: max(x[~np.isnan(x)], key=abs), 'mean'])









    share|improve this question



























      2












      2








      2


      0






      I have a set of values with duplicate index in a DataFrame:



               value
      CDE 2.318620
      CDE -3.097715
      LXU -3.791043
      LXU 4.818995
      SWN 3.059964
      SWN -4.349304
      OAS -3.336539
      LPI -3.037097
      LPI -5.701044
      LPI -3.519923
      CZR -3.270018
      CZR -3.056712


      The required result is to retain only the highest absolute values and return the average values in a new column:



               value    average
      CDE -3.097715 -0.389547
      LXU 4.818995 0.513976
      SWN -4.349304 -0.644670
      OAS -3.336539 -3.336539
      LPI -5.701044 -4.086021
      CZR -3.270018 -3.163365


      I have tried to .apply(lambda) to the duplicate rows but got an "axis" error:



      max_absolute = lambda x: max(x.min(), x.max(), key=abs)
      df_duplicate_absmax = df.groupby(df.index).apply(max_absolute, axis=1)


      ps: Adapting Abhi's solution to work with NaN:



      df1 = df.groupby(df.index)['value'].agg([lambda x: max(x[~np.isnan(x)], key=abs), 'mean'])









      share|improve this question
















      I have a set of values with duplicate index in a DataFrame:



               value
      CDE 2.318620
      CDE -3.097715
      LXU -3.791043
      LXU 4.818995
      SWN 3.059964
      SWN -4.349304
      OAS -3.336539
      LPI -3.037097
      LPI -5.701044
      LPI -3.519923
      CZR -3.270018
      CZR -3.056712


      The required result is to retain only the highest absolute values and return the average values in a new column:



               value    average
      CDE -3.097715 -0.389547
      LXU 4.818995 0.513976
      SWN -4.349304 -0.644670
      OAS -3.336539 -3.336539
      LPI -5.701044 -4.086021
      CZR -3.270018 -3.163365


      I have tried to .apply(lambda) to the duplicate rows but got an "axis" error:



      max_absolute = lambda x: max(x.min(), x.max(), key=abs)
      df_duplicate_absmax = df.groupby(df.index).apply(max_absolute, axis=1)


      ps: Adapting Abhi's solution to work with NaN:



      df1 = df.groupby(df.index)['value'].agg([lambda x: max(x[~np.isnan(x)], key=abs), 'mean'])






      python python-3.x pandas dataframe pandas-groupby






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 30 '18 at 5:18







      artDeco

















      asked Dec 9 '18 at 1:19









      artDecoartDeco

      185214




      185214
























          3 Answers
          3






          active

          oldest

          votes


















          2














          Use:



          df1 = df.groupby(df.index)['value'].agg([lambda x: max(x,key=abs), 'mean'])

          df1.columns = ['value', 'average']

          print (df1)

          value average
          CDE -3.097715 -0.389547
          CZR -3.270018 -3.163365
          LPI -5.701044 -4.086021
          LXU 4.818995 0.513976
          OAS -3.336539 -3.336539
          SWN -4.349304 -0.644670





          share|improve this answer





















          • 1





            There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

            – jpp
            Dec 9 '18 at 1:56








          • 3





            Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

            – artDeco
            Dec 9 '18 at 2:10






          • 2





            Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

            – Andrew Lavers
            Dec 9 '18 at 12:05






          • 1





            Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

            – artDeco
            Dec 9 '18 at 12:58



















          1














          Here's a solution using groupby + agg with two functions, one to calculate the maximum by absolute value, the other to calculate the mean:



          def max_abs(x):
          return x.iloc[x.abs().values.argmax()]

          res = df.groupby(level=0).agg([max_abs, 'mean'])
          .xs('value', axis=1, drop_level=True)

          print(res)

          max_abs mean
          CDE -3.097715 -0.389547
          CZR -3.270018 -3.163365
          LPI -5.701044 -4.086021
          LXU 4.818995 0.513976
          OAS -3.336539 -3.336539
          SWN -4.349304 -0.644670





          share|improve this answer



















          • 1





            That's great, jpp - very Pythonic!

            – artDeco
            Dec 9 '18 at 2:11






          • 1





            I also like your use of .xs() and the MultiIndex address, jpp :)

            – artDeco
            Dec 10 '18 at 1:00



















          1














          from io import StringIO
          import pandas as pd
          df = pd.read_fwf(StringIO("""
          cod value
          CDE 2.318620
          CDE -3.097715
          LXU -3.791043
          LXU 4.818995
          SWN 3.059964
          SWN -4.349304
          OAS -3.336539
          LPI -3.037097
          LPI -5.701044
          LPI -3.519923
          CZR -3.270018
          CZR -3.056712
          """), header=1, Index=None)

          # Create a new column with absoulte value
          df['abs_value'] = df['value'].abs()

          # Calulate the mean in new data farame, grouped by code using
          # pandas groupped aggregation naming the column average
          df_avg = df.groupby("cod").value.agg([('average', 'mean')])

          # Choose the row within group with largest abs value
          df_abs = df.sort_values("abs_value").groupby("cod").tail(1)[["cod", "value"]]

          # Join the average and the max
          df_abs.join(df_avg, on="cod")


          Result:



              cod     value   average
          1 CDE -3.097715 -0.389547
          10 CZR -3.270018 -3.163365
          6 OAS -3.336539 -3.336539
          5 SWN -4.349304 -0.644670
          3 LXU 4.818995 0.513976
          8 LPI -5.701044 -4.086021





          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53688581%2fretain-the-highest-absolute-value-and-return-an-average-value-from-rows-with-dup%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            2














            Use:



            df1 = df.groupby(df.index)['value'].agg([lambda x: max(x,key=abs), 'mean'])

            df1.columns = ['value', 'average']

            print (df1)

            value average
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670





            share|improve this answer





















            • 1





              There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

              – jpp
              Dec 9 '18 at 1:56








            • 3





              Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

              – artDeco
              Dec 9 '18 at 2:10






            • 2





              Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

              – Andrew Lavers
              Dec 9 '18 at 12:05






            • 1





              Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

              – artDeco
              Dec 9 '18 at 12:58
















            2














            Use:



            df1 = df.groupby(df.index)['value'].agg([lambda x: max(x,key=abs), 'mean'])

            df1.columns = ['value', 'average']

            print (df1)

            value average
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670





            share|improve this answer





















            • 1





              There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

              – jpp
              Dec 9 '18 at 1:56








            • 3





              Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

              – artDeco
              Dec 9 '18 at 2:10






            • 2





              Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

              – Andrew Lavers
              Dec 9 '18 at 12:05






            • 1





              Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

              – artDeco
              Dec 9 '18 at 12:58














            2












            2








            2







            Use:



            df1 = df.groupby(df.index)['value'].agg([lambda x: max(x,key=abs), 'mean'])

            df1.columns = ['value', 'average']

            print (df1)

            value average
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670





            share|improve this answer















            Use:



            df1 = df.groupby(df.index)['value'].agg([lambda x: max(x,key=abs), 'mean'])

            df1.columns = ['value', 'average']

            print (df1)

            value average
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Dec 9 '18 at 2:14

























            answered Dec 9 '18 at 1:27









            AbhiAbhi

            2,480320




            2,480320








            • 1





              There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

              – jpp
              Dec 9 '18 at 1:56








            • 3





              Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

              – artDeco
              Dec 9 '18 at 2:10






            • 2





              Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

              – Andrew Lavers
              Dec 9 '18 at 12:05






            • 1





              Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

              – artDeco
              Dec 9 '18 at 12:58














            • 1





              There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

              – jpp
              Dec 9 '18 at 1:56








            • 3





              Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

              – artDeco
              Dec 9 '18 at 2:10






            • 2





              Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

              – Andrew Lavers
              Dec 9 '18 at 12:05






            • 1





              Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

              – artDeco
              Dec 9 '18 at 12:58








            1




            1





            There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

            – jpp
            Dec 9 '18 at 1:56







            There are two reasons why this solution is inefficient: (1) using pure Python built-ins (max) with Pandas series / NumPy arrays, (2) using groupby twice when a single groupby is sufficient.

            – jpp
            Dec 9 '18 at 1:56






            3




            3





            Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

            – artDeco
            Dec 9 '18 at 2:10





            Good solution, Abhi with thanks :) I appreciate the use of max_absolute lambda function for simplicity!

            – artDeco
            Dec 9 '18 at 2:10




            2




            2





            Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

            – Andrew Lavers
            Dec 9 '18 at 12:05





            Nice!. Note that columns can be named directly in .agg like .agg([('value', lambda x: max(x,key=abs)), ('average', 'mean')])

            – Andrew Lavers
            Dec 9 '18 at 12:05




            1




            1





            Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

            – artDeco
            Dec 9 '18 at 12:58





            Excellent, Andrew Lavers for adding value to a good solution! Thanks :)

            – artDeco
            Dec 9 '18 at 12:58













            1














            Here's a solution using groupby + agg with two functions, one to calculate the maximum by absolute value, the other to calculate the mean:



            def max_abs(x):
            return x.iloc[x.abs().values.argmax()]

            res = df.groupby(level=0).agg([max_abs, 'mean'])
            .xs('value', axis=1, drop_level=True)

            print(res)

            max_abs mean
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670





            share|improve this answer



















            • 1





              That's great, jpp - very Pythonic!

              – artDeco
              Dec 9 '18 at 2:11






            • 1





              I also like your use of .xs() and the MultiIndex address, jpp :)

              – artDeco
              Dec 10 '18 at 1:00
















            1














            Here's a solution using groupby + agg with two functions, one to calculate the maximum by absolute value, the other to calculate the mean:



            def max_abs(x):
            return x.iloc[x.abs().values.argmax()]

            res = df.groupby(level=0).agg([max_abs, 'mean'])
            .xs('value', axis=1, drop_level=True)

            print(res)

            max_abs mean
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670





            share|improve this answer



















            • 1





              That's great, jpp - very Pythonic!

              – artDeco
              Dec 9 '18 at 2:11






            • 1





              I also like your use of .xs() and the MultiIndex address, jpp :)

              – artDeco
              Dec 10 '18 at 1:00














            1












            1








            1







            Here's a solution using groupby + agg with two functions, one to calculate the maximum by absolute value, the other to calculate the mean:



            def max_abs(x):
            return x.iloc[x.abs().values.argmax()]

            res = df.groupby(level=0).agg([max_abs, 'mean'])
            .xs('value', axis=1, drop_level=True)

            print(res)

            max_abs mean
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670





            share|improve this answer













            Here's a solution using groupby + agg with two functions, one to calculate the maximum by absolute value, the other to calculate the mean:



            def max_abs(x):
            return x.iloc[x.abs().values.argmax()]

            res = df.groupby(level=0).agg([max_abs, 'mean'])
            .xs('value', axis=1, drop_level=True)

            print(res)

            max_abs mean
            CDE -3.097715 -0.389547
            CZR -3.270018 -3.163365
            LPI -5.701044 -4.086021
            LXU 4.818995 0.513976
            OAS -3.336539 -3.336539
            SWN -4.349304 -0.644670






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Dec 9 '18 at 1:53









            jppjpp

            97.9k2159109




            97.9k2159109








            • 1





              That's great, jpp - very Pythonic!

              – artDeco
              Dec 9 '18 at 2:11






            • 1





              I also like your use of .xs() and the MultiIndex address, jpp :)

              – artDeco
              Dec 10 '18 at 1:00














            • 1





              That's great, jpp - very Pythonic!

              – artDeco
              Dec 9 '18 at 2:11






            • 1





              I also like your use of .xs() and the MultiIndex address, jpp :)

              – artDeco
              Dec 10 '18 at 1:00








            1




            1





            That's great, jpp - very Pythonic!

            – artDeco
            Dec 9 '18 at 2:11





            That's great, jpp - very Pythonic!

            – artDeco
            Dec 9 '18 at 2:11




            1




            1





            I also like your use of .xs() and the MultiIndex address, jpp :)

            – artDeco
            Dec 10 '18 at 1:00





            I also like your use of .xs() and the MultiIndex address, jpp :)

            – artDeco
            Dec 10 '18 at 1:00











            1














            from io import StringIO
            import pandas as pd
            df = pd.read_fwf(StringIO("""
            cod value
            CDE 2.318620
            CDE -3.097715
            LXU -3.791043
            LXU 4.818995
            SWN 3.059964
            SWN -4.349304
            OAS -3.336539
            LPI -3.037097
            LPI -5.701044
            LPI -3.519923
            CZR -3.270018
            CZR -3.056712
            """), header=1, Index=None)

            # Create a new column with absoulte value
            df['abs_value'] = df['value'].abs()

            # Calulate the mean in new data farame, grouped by code using
            # pandas groupped aggregation naming the column average
            df_avg = df.groupby("cod").value.agg([('average', 'mean')])

            # Choose the row within group with largest abs value
            df_abs = df.sort_values("abs_value").groupby("cod").tail(1)[["cod", "value"]]

            # Join the average and the max
            df_abs.join(df_avg, on="cod")


            Result:



                cod     value   average
            1 CDE -3.097715 -0.389547
            10 CZR -3.270018 -3.163365
            6 OAS -3.336539 -3.336539
            5 SWN -4.349304 -0.644670
            3 LXU 4.818995 0.513976
            8 LPI -5.701044 -4.086021





            share|improve this answer




























              1














              from io import StringIO
              import pandas as pd
              df = pd.read_fwf(StringIO("""
              cod value
              CDE 2.318620
              CDE -3.097715
              LXU -3.791043
              LXU 4.818995
              SWN 3.059964
              SWN -4.349304
              OAS -3.336539
              LPI -3.037097
              LPI -5.701044
              LPI -3.519923
              CZR -3.270018
              CZR -3.056712
              """), header=1, Index=None)

              # Create a new column with absoulte value
              df['abs_value'] = df['value'].abs()

              # Calulate the mean in new data farame, grouped by code using
              # pandas groupped aggregation naming the column average
              df_avg = df.groupby("cod").value.agg([('average', 'mean')])

              # Choose the row within group with largest abs value
              df_abs = df.sort_values("abs_value").groupby("cod").tail(1)[["cod", "value"]]

              # Join the average and the max
              df_abs.join(df_avg, on="cod")


              Result:



                  cod     value   average
              1 CDE -3.097715 -0.389547
              10 CZR -3.270018 -3.163365
              6 OAS -3.336539 -3.336539
              5 SWN -4.349304 -0.644670
              3 LXU 4.818995 0.513976
              8 LPI -5.701044 -4.086021





              share|improve this answer


























                1












                1








                1







                from io import StringIO
                import pandas as pd
                df = pd.read_fwf(StringIO("""
                cod value
                CDE 2.318620
                CDE -3.097715
                LXU -3.791043
                LXU 4.818995
                SWN 3.059964
                SWN -4.349304
                OAS -3.336539
                LPI -3.037097
                LPI -5.701044
                LPI -3.519923
                CZR -3.270018
                CZR -3.056712
                """), header=1, Index=None)

                # Create a new column with absoulte value
                df['abs_value'] = df['value'].abs()

                # Calulate the mean in new data farame, grouped by code using
                # pandas groupped aggregation naming the column average
                df_avg = df.groupby("cod").value.agg([('average', 'mean')])

                # Choose the row within group with largest abs value
                df_abs = df.sort_values("abs_value").groupby("cod").tail(1)[["cod", "value"]]

                # Join the average and the max
                df_abs.join(df_avg, on="cod")


                Result:



                    cod     value   average
                1 CDE -3.097715 -0.389547
                10 CZR -3.270018 -3.163365
                6 OAS -3.336539 -3.336539
                5 SWN -4.349304 -0.644670
                3 LXU 4.818995 0.513976
                8 LPI -5.701044 -4.086021





                share|improve this answer













                from io import StringIO
                import pandas as pd
                df = pd.read_fwf(StringIO("""
                cod value
                CDE 2.318620
                CDE -3.097715
                LXU -3.791043
                LXU 4.818995
                SWN 3.059964
                SWN -4.349304
                OAS -3.336539
                LPI -3.037097
                LPI -5.701044
                LPI -3.519923
                CZR -3.270018
                CZR -3.056712
                """), header=1, Index=None)

                # Create a new column with absoulte value
                df['abs_value'] = df['value'].abs()

                # Calulate the mean in new data farame, grouped by code using
                # pandas groupped aggregation naming the column average
                df_avg = df.groupby("cod").value.agg([('average', 'mean')])

                # Choose the row within group with largest abs value
                df_abs = df.sort_values("abs_value").groupby("cod").tail(1)[["cod", "value"]]

                # Join the average and the max
                df_abs.join(df_avg, on="cod")


                Result:



                    cod     value   average
                1 CDE -3.097715 -0.389547
                10 CZR -3.270018 -3.163365
                6 OAS -3.336539 -3.336539
                5 SWN -4.349304 -0.644670
                3 LXU 4.818995 0.513976
                8 LPI -5.701044 -4.086021






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Dec 9 '18 at 1:59









                Andrew LaversAndrew Lavers

                3,0511713




                3,0511713






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53688581%2fretain-the-highest-absolute-value-and-return-an-average-value-from-rows-with-dup%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Monofisismo

                    Angular Downloading a file using contenturl with Basic Authentication

                    Olmecas