More efficient solution to find longest series based on boolean in NumPy ndArray

I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?

I've already wrote my own solution with numpy.nonzero, but there is probably better one.

import numpy as np

arr = np.array([[[1,2,3,4,5],

                [6,7,8,9,10],

                [11,12,13,14,15],

                [16,17,18,19,20],

                [21,22,23,24,25]],

                [[True,True,True,False,True],

                [True,True,True,True,False],

                [True,True,False,True,True],

                [True,True,True,False,True],

                [True,True,True,False,True]]])



def getIndices(arr):

    arr_to_search = np.nonzero(arr)

    arrs = 

    prev_el0 = 0

    prev_el1 = -1

    activ_long = 

    for i in range(len(arr_to_search[0])):

        if arr_to_search[0][i] == prev_el0:

            if arr_to_search[1][i] != prev_el1 + 1:

                arrs.append(activ_long)

                activ_long = 

        else:

            arrs.append(activ_long)

            activ_long = 

        activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))

        prev_el0 = arr_to_search[0][i]

        prev_el1 = arr_to_search[1][i]



    max_len = len(max(arrs,key=len))

    longest_arr_list = [a for a in arrs if len(a) == max_len]

    return longest_arr_list



print(getIndices(arr[1,:,:]))

print(getIndices(arr[1,:,:].T))





[[(1, 0), (1, 1), (1, 2), (1, 3)]]

[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]

asked Dec 30 '18 at 15:05

grudzien

If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13

What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14

By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25

add a comment |

I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?

I've already wrote my own solution with numpy.nonzero, but there is probably better one.

import numpy as np

arr = np.array([[[1,2,3,4,5],

                [6,7,8,9,10],

                [11,12,13,14,15],

                [16,17,18,19,20],

                [21,22,23,24,25]],

                [[True,True,True,False,True],

                [True,True,True,True,False],

                [True,True,False,True,True],

                [True,True,True,False,True],

                [True,True,True,False,True]]])



def getIndices(arr):

    arr_to_search = np.nonzero(arr)

    arrs = 

    prev_el0 = 0

    prev_el1 = -1

    activ_long = 

    for i in range(len(arr_to_search[0])):

        if arr_to_search[0][i] == prev_el0:

            if arr_to_search[1][i] != prev_el1 + 1:

                arrs.append(activ_long)

                activ_long = 

        else:

            arrs.append(activ_long)

            activ_long = 

        activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))

        prev_el0 = arr_to_search[0][i]

        prev_el1 = arr_to_search[1][i]



    max_len = len(max(arrs,key=len))

    longest_arr_list = [a for a in arrs if len(a) == max_len]

    return longest_arr_list



print(getIndices(arr[1,:,:]))

print(getIndices(arr[1,:,:].T))





[[(1, 0), (1, 1), (1, 2), (1, 3)]]

[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]

asked Dec 30 '18 at 15:05

grudzien

If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13

What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14

By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25

add a comment |

I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?

I've already wrote my own solution with numpy.nonzero, but there is probably better one.

import numpy as np

arr = np.array([[[1,2,3,4,5],

                [6,7,8,9,10],

                [11,12,13,14,15],

                [16,17,18,19,20],

                [21,22,23,24,25]],

                [[True,True,True,False,True],

                [True,True,True,True,False],

                [True,True,False,True,True],

                [True,True,True,False,True],

                [True,True,True,False,True]]])



def getIndices(arr):

    arr_to_search = np.nonzero(arr)

    arrs = 

    prev_el0 = 0

    prev_el1 = -1

    activ_long = 

    for i in range(len(arr_to_search[0])):

        if arr_to_search[0][i] == prev_el0:

            if arr_to_search[1][i] != prev_el1 + 1:

                arrs.append(activ_long)

                activ_long = 

        else:

            arrs.append(activ_long)

            activ_long = 

        activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))

        prev_el0 = arr_to_search[0][i]

        prev_el1 = arr_to_search[1][i]



    max_len = len(max(arrs,key=len))

    longest_arr_list = [a for a in arrs if len(a) == max_len]

    return longest_arr_list



print(getIndices(arr[1,:,:]))

print(getIndices(arr[1,:,:].T))





[[(1, 0), (1, 1), (1, 2), (1, 3)]]

[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]

asked Dec 30 '18 at 15:05

grudzien

I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?

I've already wrote my own solution with numpy.nonzero, but there is probably better one.

import numpy as np

arr = np.array([[[1,2,3,4,5],

                [6,7,8,9,10],

                [11,12,13,14,15],

                [16,17,18,19,20],

                [21,22,23,24,25]],

                [[True,True,True,False,True],

                [True,True,True,True,False],

                [True,True,False,True,True],

                [True,True,True,False,True],

                [True,True,True,False,True]]])



def getIndices(arr):

    arr_to_search = np.nonzero(arr)

    arrs = 

    prev_el0 = 0

    prev_el1 = -1

    activ_long = 

    for i in range(len(arr_to_search[0])):

        if arr_to_search[0][i] == prev_el0:

            if arr_to_search[1][i] != prev_el1 + 1:

                arrs.append(activ_long)

                activ_long = 

        else:

            arrs.append(activ_long)

            activ_long = 

        activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))

        prev_el0 = arr_to_search[0][i]

        prev_el1 = arr_to_search[1][i]



    max_len = len(max(arrs,key=len))

    longest_arr_list = [a for a in arrs if len(a) == max_len]

    return longest_arr_list



print(getIndices(arr[1,:,:]))

print(getIndices(arr[1,:,:].T))





[[(1, 0), (1, 1), (1, 2), (1, 3)]]

[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]

python numpy multidimensional-array

asked Dec 30 '18 at 15:05

grudzien

asked Dec 30 '18 at 15:05

grudzien

asked Dec 30 '18 at 15:05

grudzien

asked Dec 30 '18 at 15:05

grudzien

asked Dec 30 '18 at 15:05

grudzien

If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13

What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14

By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25

add a comment |

If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13

What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14

By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25

If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13

What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14

By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25

add a comment |

1 Answer
1

active

oldest

votes

Here is a numpy solution which avoids explicit loops based on this previous question.

I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.

For convenience I process a and a.T at the same time, but you could do them separately if you want.

m,n = a.shape

A = np.zeros((2*m,n+2))

A[:m,1:-1] = a

A[m:,1:-1] = a.T



dA = np.diff(A)



start = np.where(dA>0)

end = np.where(dA<0)



argmax_run = np.argmax(end[1]-start[1])



row = start[0][argmax_run]

col_start = start[1][argmax_run]

col_end= end[1][argmax_run]-1



max_len = col_end - col_start + 1



print('max run of length {}'.format(max_len))

print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))

To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.

nz = np.nonzero(dA)

start = (nz[0][::2], nz[1][::2])

end = (nz[0][1::2], nz[1][1::2])

Note that you can then remove the variables start and end entirely as they aren't really needed.

answered Dec 30 '18 at 16:21

tch

42525

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978707%2fmore-efficient-solution-to-find-longest-series-based-on-boolean-in-numpy-ndarray%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Here is a numpy solution which avoids explicit loops based on this previous question.

For convenience I process a and a.T at the same time, but you could do them separately if you want.

m,n = a.shape

A = np.zeros((2*m,n+2))

A[:m,1:-1] = a

A[m:,1:-1] = a.T



dA = np.diff(A)



start = np.where(dA>0)

end = np.where(dA<0)



argmax_run = np.argmax(end[1]-start[1])



row = start[0][argmax_run]

col_start = start[1][argmax_run]

col_end= end[1][argmax_run]-1



max_len = col_end - col_start + 1



print('max run of length {}'.format(max_len))

print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))

To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.

nz = np.nonzero(dA)

start = (nz[0][::2], nz[1][::2])

end = (nz[0][1::2], nz[1][1::2])

Note that you can then remove the variables start and end entirely as they aren't really needed.

answered Dec 30 '18 at 16:21

tch

42525

add a comment |

Here is a numpy solution which avoids explicit loops based on this previous question.

For convenience I process a and a.T at the same time, but you could do them separately if you want.

m,n = a.shape

A = np.zeros((2*m,n+2))

A[:m,1:-1] = a

A[m:,1:-1] = a.T



dA = np.diff(A)



start = np.where(dA>0)

end = np.where(dA<0)



argmax_run = np.argmax(end[1]-start[1])



row = start[0][argmax_run]

col_start = start[1][argmax_run]

col_end= end[1][argmax_run]-1



max_len = col_end - col_start + 1



print('max run of length {}'.format(max_len))

print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))

To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.

nz = np.nonzero(dA)

start = (nz[0][::2], nz[1][::2])

end = (nz[0][1::2], nz[1][1::2])

Note that you can then remove the variables start and end entirely as they aren't really needed.

answered Dec 30 '18 at 16:21

tch

42525

add a comment |

Here is a numpy solution which avoids explicit loops based on this previous question.

For convenience I process a and a.T at the same time, but you could do them separately if you want.

m,n = a.shape

A = np.zeros((2*m,n+2))

A[:m,1:-1] = a

A[m:,1:-1] = a.T



dA = np.diff(A)



start = np.where(dA>0)

end = np.where(dA<0)



argmax_run = np.argmax(end[1]-start[1])



row = start[0][argmax_run]

col_start = start[1][argmax_run]

col_end= end[1][argmax_run]-1



max_len = col_end - col_start + 1



print('max run of length {}'.format(max_len))

print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))

To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.

nz = np.nonzero(dA)

start = (nz[0][::2], nz[1][::2])

end = (nz[0][1::2], nz[1][1::2])

Note that you can then remove the variables start and end entirely as they aren't really needed.

answered Dec 30 '18 at 16:21

tch

42525

Here is a numpy solution which avoids explicit loops based on this previous question.

For convenience I process a and a.T at the same time, but you could do them separately if you want.

m,n = a.shape

A = np.zeros((2*m,n+2))

A[:m,1:-1] = a

A[m:,1:-1] = a.T



dA = np.diff(A)



start = np.where(dA>0)

end = np.where(dA<0)



argmax_run = np.argmax(end[1]-start[1])



row = start[0][argmax_run]

col_start = start[1][argmax_run]

col_end= end[1][argmax_run]-1



max_len = col_end - col_start + 1



print('max run of length {}'.format(max_len))

print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))

To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.

nz = np.nonzero(dA)

start = (nz[0][::2], nz[1][::2])

end = (nz[0][1::2], nz[1][1::2])

Note that you can then remove the variables start and end entirely as they aren't really needed.

answered Dec 30 '18 at 16:21

tch

42525

answered Dec 30 '18 at 16:21

tch

42525

answered Dec 30 '18 at 16:21

tch

42525

answered Dec 30 '18 at 16:21

tch

42525

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtjtk