Extract groups of consecutive values having greater than specified size


I am trying to find, within a dataframe, whether there are at least X consecutive operations (I have already added a column "FILTER_OK" that indicates whether a row meets the criteria), and to extract that group of rows.



   TRN             TRN_DATE  FILTER_OK
0  5153  04/04/2017 11:40:00       True
1  7542  04/04/2017 17:18:00       True
2   875  04/04/2017 20:08:00       True
3    74  05/04/2017 20:30:00      False
4  9652  06/04/2017 20:32:00       True
5   965  07/04/2017 12:52:00       True
6   752  10/04/2017 17:40:00       True
7  9541  10/04/2017 19:29:00       True
8  7452  11/04/2017 12:20:00       True
9  9651  12/04/2017 13:57:00      False


For this example, suppose I am looking for at least 4 consecutive operations.

OUTPUT DESIRED:



   TRN             TRN_DATE  FILTER_OK
4  9652  06/04/2017 20:32:00       True
5   965  07/04/2017 12:52:00       True
6   752  10/04/2017 17:40:00       True
7  9541  10/04/2017 19:29:00       True
8  7452  11/04/2017 12:20:00       True


How can I subset the operations I need?










  • If your question was answered, please accept the most helpful answer here. You can accept an answer by clicking the grey check to the left of the answer to toggle it green. TIA.

    – coldspeed
    Jan 12 at 23:04
python pandas dataframe






edited Jan 2 at 17:12









coldspeed

asked Jan 2 at 17:07









MarP

3 Answers






You may do this using cumsum, followed by groupby, and transform:



v = (~df.FILTER_OK).cumsum()
df[v.groupby(v).transform('size').ge(4) & df['FILTER_OK']]

    TRN            TRN_DATE  FILTER_OK
4  9652 2017-06-04 20:32:00       True
5   965 2017-07-04 12:52:00       True
6   752 2017-10-04 17:40:00       True
7  9541 2017-10-04 19:29:00       True
8  7452 2017-11-04 12:20:00       True




Details

First, use cumsum to segregate rows into groups:



v = (~df.FILTER_OK).cumsum()
v

0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 2
Name: FILTER_OK, dtype: int64


Next, find the size of each group, and then figure out what groups have at least X rows (in your case, 4):



v.groupby(v).transform('size')

0 3
1 3
2 3
3 6
4 6
5 6
6 6
7 6
8 6
9 1
Name: FILTER_OK, dtype: int64

v.groupby(v).transform('size').ge(4)

0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 False
Name: FILTER_OK, dtype: bool


AND this mask with "FILTER_OK" to ensure we only take valid rows that fit the criteria.



v.groupby(v).transform('size').ge(4) & df['FILTER_OK']

0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 True
8 True
9 False
Name: FILTER_OK, dtype: bool
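One subtlety worth noting (my own observation, not part of the answer above): each cumsum group also contains the delimiting False row that started it, so `transform('size').ge(4)` will pass a group whose True run is only 3 rows long. If you want a strict "at least n consecutive True" count, you can label runs with shift/ne instead; the sketch below uses a hypothetical helper name `runs_at_least` and rebuilds the question's FILTER_OK pattern:

```python
import pandas as pd

# Toy frame reproducing the question's TRN and FILTER_OK columns
df = pd.DataFrame({
    'TRN': [5153, 7542, 875, 74, 9652, 965, 752, 9541, 7452, 9651],
    'FILTER_OK': [True, True, True, False, True, True,
                  True, True, True, False],
})

def runs_at_least(frame, n, col='FILTER_OK'):
    """Rows belonging to a run of at least n consecutive True values."""
    # A new run starts whenever the value differs from the previous row,
    # so run sizes count only the run's own rows (no delimiter included)
    run_id = (frame[col] != frame[col].shift()).cumsum()
    run_size = frame.groupby(run_id)[col].transform('size')
    return frame[frame[col] & (run_size >= n)]

print(runs_at_least(df, 4).index.tolist())  # [4, 5, 6, 7, 8]
```

With n=4 this selects the same rows 4-8 as the answer; the difference only shows up for runs exactly one row shorter than the threshold.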





answered Jan 2 at 17:12
coldspeed
    Note: because diff/cumsum labels every run of equal values, this will also pick up runs of 4 or more consecutive False values.



    s = df.FILTER_OK.astype(int).diff().ne(0).cumsum()
    df[s.isin(s.value_counts().loc[lambda x: x >= 4].index)]
    Out[784]:
        TRN             TRN_DATE  FILTER_OK
    4  9652  06/04/2017 20:32:00       True
    5   965  07/04/2017 12:52:00       True
    6   752  10/04/2017 17:40:00       True
    7  9541  10/04/2017 19:29:00       True
    8  7452  11/04/2017 12:20:00       True
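To keep only the True runs with this approach, AND the run mask with FILTER_OK itself. A minimal self-contained sketch (the DataFrame construction is mine; only the FILTER_OK column matters for the grouping):

```python
import pandas as pd

df = pd.DataFrame({
    'TRN': [5153, 7542, 875, 74, 9652, 965, 752, 9541, 7452, 9651],
    'FILTER_OK': [True, True, True, False, True, True,
                  True, True, True, False],
})

# Label each run of equal values; the label changes whenever the flag flips
s = df.FILTER_OK.astype(int).diff().ne(0).cumsum()
# Keep runs with at least 4 members, restricted to True rows only
result = df[s.isin(s.value_counts().loc[lambda x: x >= 4].index) & df.FILTER_OK]
print(result.index.tolist())  # [4, 5, 6, 7, 8]
```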





answered Jan 2 at 17:20
Wen-Ben
      One possible option is to use itertools.groupby, called on the source
      df.values.



      An important difference of this method, compared to pandas groupby, is
      that whenever the grouping key changes, a new group is created.



      So you can try the following code:



      import itertools

      import pandas as pd

      # Source DataFrame
      df = pd.DataFrame(data=[
          [5153, '04/04/2017 11:40:00', True], [7542, '04/04/2017 17:18:00', True],
          [875, '04/04/2017 20:08:00', True], [74, '05/04/2017 20:30:00', False],
          [9652, '06/04/2017 20:32:00', True], [965, '07/04/2017 12:52:00', True],
          [752, '10/04/2017 17:40:00', True], [9541, '10/04/2017 19:29:00', True],
          [7452, '11/04/2017 12:20:00', True], [9651, '12/04/2017 13:57:00', False]],
          columns=['TRN', 'TRN_DATE', 'FILTER_OK'])
      # Work list
      xx = []
      # Collect groups for the 'True' key with at least 4 members
      for key, group in itertools.groupby(df.values, lambda x: x[2]):
          lst = list(group)
          if key and len(lst) >= 4:
              xx.extend(lst)
      # Create result DataFrame with the same column names
      df2 = pd.DataFrame(data=xx, columns=df.columns)
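The key-change behaviour described above can be seen on a small list (this toy example is mine, not from the answer): equal values that are not adjacent land in separate groups, which is exactly what makes groupby suitable for counting consecutive runs.

```python
import itertools

flags = [True, True, False, True, True, True]
# groupby starts a new group every time the key changes,
# so the trailing Trues form a group separate from the leading ones
runs = [(key, len(list(group))) for key, group in itertools.groupby(flags)]
print(runs)  # [(True, 2), (False, 1), (True, 3)]
```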





answered Jan 2 at 19:00
Valdi_Bo