Extract groups of consecutive values having greater than specified size


I am trying to find, within a dataframe, whether there are at least X consecutive operations (I have already added a column "FILTER_OK" that indicates whether a row meets the criteria), and to extract that group of rows.



   TRN             TRN_DATE  FILTER_OK
0  5153  04/04/2017 11:40:00       True
1  7542  04/04/2017 17:18:00       True
2   875  04/04/2017 20:08:00       True
3    74  05/04/2017 20:30:00      False
4  9652  06/04/2017 20:32:00       True
5   965  07/04/2017 12:52:00       True
6   752  10/04/2017 17:40:00       True
7  9541  10/04/2017 19:29:00       True
8  7452  11/04/2017 12:20:00       True
9  9651  12/04/2017 13:57:00      False


For this example, suppose I am looking for at least 4 consecutive operations.

OUTPUT DESIRED:



   TRN             TRN_DATE  FILTER_OK
4  9652  06/04/2017 20:32:00       True
5   965  07/04/2017 12:52:00       True
6   752  10/04/2017 17:40:00       True
7  9541  10/04/2017 19:29:00       True
8  7452  11/04/2017 12:20:00       True


How can I subset the operations I need?










  • If your question was answered, please accept the most helpful answer here. You can accept an answer by clicking the grey check to the left of the answer to toggle it green. TIA.

    – coldspeed
    Jan 12 at 23:04
python pandas dataframe






edited Jan 2 at 17:12









coldspeed

asked Jan 2 at 17:07









MarP

3 Answers






You may do this using cumsum, followed by groupby, and transform:



v = (~df.FILTER_OK).cumsum()
df[v.groupby(v).transform('size').ge(4) & df['FILTER_OK']]

    TRN            TRN_DATE  FILTER_OK
4  9652 2017-06-04 20:32:00       True
5   965 2017-07-04 12:52:00       True
6   752 2017-10-04 17:40:00       True
7  9541 2017-10-04 19:29:00       True
8  7452 2017-11-04 12:20:00       True




Details

First, use cumsum to segregate rows into groups:



v = (~df.FILTER_OK).cumsum()
v

0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 2
Name: FILTER_OK, dtype: int64


Next, find the size of each group, and then figure out what groups have at least X rows (in your case, 4):



v.groupby(v).transform('size')

0 3
1 3
2 3
3 6
4 6
5 6
6 6
7 6
8 6
9 1
Name: FILTER_OK, dtype: int64

v.groupby(v).transform('size').ge(4)

0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 False
Name: FILTER_OK, dtype: bool


AND this mask with "FILTER_OK" to ensure we only take valid rows that fit the criteria.



v.groupby(v).transform('size').ge(4) & df['FILTER_OK']

0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 True
8 True
9 False
Name: FILTER_OK, dtype: bool
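One subtlety worth noting (my own observation, not part of the answer above): each cumsum group also contains the delimiting False row that started it, so `transform('size').ge(4)` will pass a group whose True run is only 3 rows long. If you want a strict "at least n consecutive True" count, you can label runs with shift/ne instead; the sketch below uses a hypothetical helper name `runs_at_least` and rebuilds the question's FILTER_OK pattern:

```python
import pandas as pd

# Toy frame reproducing the question's TRN and FILTER_OK columns
df = pd.DataFrame({
    'TRN': [5153, 7542, 875, 74, 9652, 965, 752, 9541, 7452, 9651],
    'FILTER_OK': [True, True, True, False, True, True,
                  True, True, True, False],
})

def runs_at_least(frame, n, col='FILTER_OK'):
    """Rows belonging to a run of at least n consecutive True values."""
    # A new run starts whenever the value differs from the previous row,
    # so run sizes count only the run's own rows (no delimiter included)
    run_id = (frame[col] != frame[col].shift()).cumsum()
    run_size = frame.groupby(run_id)[col].transform('size')
    return frame[frame[col] & (run_size >= n)]

print(runs_at_least(df, 4).index.tolist())  # [4, 5, 6, 7, 8]
```

With n=4 this selects the same rows 4-8 as the answer; the difference only shows up for runs exactly one row shorter than the threshold.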





answered Jan 2 at 17:12
coldspeed
    Note: because diff/cumsum labels every run of equal values, this will also pick up runs of 4 or more consecutive False values.



    s = df.FILTER_OK.astype(int).diff().ne(0).cumsum()
    df[s.isin(s.value_counts().loc[lambda x: x >= 4].index)]
    Out[784]:
        TRN             TRN_DATE  FILTER_OK
    4  9652  06/04/2017 20:32:00       True
    5   965  07/04/2017 12:52:00       True
    6   752  10/04/2017 17:40:00       True
    7  9541  10/04/2017 19:29:00       True
    8  7452  11/04/2017 12:20:00       True
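To keep only the True runs with this approach, AND the run mask with FILTER_OK itself. A minimal self-contained sketch (the DataFrame construction is mine; only the FILTER_OK column matters for the grouping):

```python
import pandas as pd

df = pd.DataFrame({
    'TRN': [5153, 7542, 875, 74, 9652, 965, 752, 9541, 7452, 9651],
    'FILTER_OK': [True, True, True, False, True, True,
                  True, True, True, False],
})

# Label each run of equal values; the label changes whenever the flag flips
s = df.FILTER_OK.astype(int).diff().ne(0).cumsum()
# Keep runs with at least 4 members, restricted to True rows only
result = df[s.isin(s.value_counts().loc[lambda x: x >= 4].index) & df.FILTER_OK]
print(result.index.tolist())  # [4, 5, 6, 7, 8]
```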





answered Jan 2 at 17:20
Wen-Ben
      One possible option is to use itertools.groupby, called on the source
      df.values.



      An important difference of this method, compared to pandas groupby, is
      that whenever the grouping key changes, a new group is created.



      So you can try the following code:



      import itertools

      import pandas as pd

      # Source DataFrame
      df = pd.DataFrame(data=[
          [5153, '04/04/2017 11:40:00', True], [7542, '04/04/2017 17:18:00', True],
          [875, '04/04/2017 20:08:00', True], [74, '05/04/2017 20:30:00', False],
          [9652, '06/04/2017 20:32:00', True], [965, '07/04/2017 12:52:00', True],
          [752, '10/04/2017 17:40:00', True], [9541, '10/04/2017 19:29:00', True],
          [7452, '11/04/2017 12:20:00', True], [9651, '12/04/2017 13:57:00', False]],
          columns=['TRN', 'TRN_DATE', 'FILTER_OK'])
      # Work list
      xx = []
      # Collect groups for the 'True' key with at least 4 members
      for key, group in itertools.groupby(df.values, lambda x: x[2]):
          lst = list(group)
          if key and len(lst) >= 4:
              xx.extend(lst)
      # Create result DataFrame with the same column names
      df2 = pd.DataFrame(data=xx, columns=df.columns)
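The key-change behaviour described above can be seen on a small list (this toy example is mine, not from the answer): equal values that are not adjacent land in separate groups, which is exactly what makes groupby suitable for counting consecutive runs.

```python
import itertools

flags = [True, True, False, True, True, True]
# groupby starts a new group every time the key changes,
# so the trailing Trues form a group separate from the leading ones
runs = [(key, len(list(group))) for key, group in itertools.groupby(flags)]
print(runs)  # [(True, 2), (False, 1), (True, 3)]
```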





answered Jan 2 at 19:00
Valdi_Bo