More efficient solution to find longest series based on boolean in NumPy ndArray












1















I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?



I've already wrote my own solution with numpy.nonzero, but there is probably better one.



import numpy as np
arr = np.array([[[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15],
[16,17,18,19,20],
[21,22,23,24,25]],
[[True,True,True,False,True],
[True,True,True,True,False],
[True,True,False,True,True],
[True,True,True,False,True],
[True,True,True,False,True]]])

def getIndices(arr):
arr_to_search = np.nonzero(arr)
arrs =
prev_el0 = 0
prev_el1 = -1
activ_long =
for i in range(len(arr_to_search[0])):
if arr_to_search[0][i] == prev_el0:
if arr_to_search[1][i] != prev_el1 + 1:
arrs.append(activ_long)
activ_long =
else:
arrs.append(activ_long)
activ_long =
activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))
prev_el0 = arr_to_search[0][i]
prev_el1 = arr_to_search[1][i]

max_len = len(max(arrs,key=len))
longest_arr_list = [a for a in arrs if len(a) == max_len]
return longest_arr_list

print(getIndices(arr[1,:,:]))
print(getIndices(arr[1,:,:].T))


[[(1, 0), (1, 1), (1, 2), (1, 3)]]
[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]









share|improve this question























  • If your code works, might be more appropriate for code review

    – user3483203
    Dec 30 '18 at 15:13











  • What do you mean with find longest series based on True values ? What is the expected output?

    – yatu
    Dec 30 '18 at 15:14













  • By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

    – grudzien
    Dec 30 '18 at 15:25


















1















I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?



I've already wrote my own solution with numpy.nonzero, but there is probably better one.



import numpy as np
arr = np.array([[[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15],
[16,17,18,19,20],
[21,22,23,24,25]],
[[True,True,True,False,True],
[True,True,True,True,False],
[True,True,False,True,True],
[True,True,True,False,True],
[True,True,True,False,True]]])

def getIndices(arr):
arr_to_search = np.nonzero(arr)
arrs =
prev_el0 = 0
prev_el1 = -1
activ_long =
for i in range(len(arr_to_search[0])):
if arr_to_search[0][i] == prev_el0:
if arr_to_search[1][i] != prev_el1 + 1:
arrs.append(activ_long)
activ_long =
else:
arrs.append(activ_long)
activ_long =
activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))
prev_el0 = arr_to_search[0][i]
prev_el1 = arr_to_search[1][i]

max_len = len(max(arrs,key=len))
longest_arr_list = [a for a in arrs if len(a) == max_len]
return longest_arr_list

print(getIndices(arr[1,:,:]))
print(getIndices(arr[1,:,:].T))


[[(1, 0), (1, 1), (1, 2), (1, 3)]]
[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]









share|improve this question























  • If your code works, might be more appropriate for code review

    – user3483203
    Dec 30 '18 at 15:13











  • What do you mean with find longest series based on True values ? What is the expected output?

    – yatu
    Dec 30 '18 at 15:14













  • By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

    – grudzien
    Dec 30 '18 at 15:25
















1












1








1








I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?



I've already wrote my own solution with numpy.nonzero, but there is probably better one.



import numpy as np
arr = np.array([[[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15],
[16,17,18,19,20],
[21,22,23,24,25]],
[[True,True,True,False,True],
[True,True,True,True,False],
[True,True,False,True,True],
[True,True,True,False,True],
[True,True,True,False,True]]])

def getIndices(arr):
arr_to_search = np.nonzero(arr)
arrs =
prev_el0 = 0
prev_el1 = -1
activ_long =
for i in range(len(arr_to_search[0])):
if arr_to_search[0][i] == prev_el0:
if arr_to_search[1][i] != prev_el1 + 1:
arrs.append(activ_long)
activ_long =
else:
arrs.append(activ_long)
activ_long =
activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))
prev_el0 = arr_to_search[0][i]
prev_el1 = arr_to_search[1][i]

max_len = len(max(arrs,key=len))
longest_arr_list = [a for a in arrs if len(a) == max_len]
return longest_arr_list

print(getIndices(arr[1,:,:]))
print(getIndices(arr[1,:,:].T))


[[(1, 0), (1, 1), (1, 2), (1, 3)]]
[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]









share|improve this question














I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?



I've already wrote my own solution with numpy.nonzero, but there is probably better one.



import numpy as np
arr = np.array([[[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15],
[16,17,18,19,20],
[21,22,23,24,25]],
[[True,True,True,False,True],
[True,True,True,True,False],
[True,True,False,True,True],
[True,True,True,False,True],
[True,True,True,False,True]]])

def getIndices(arr):
arr_to_search = np.nonzero(arr)
arrs =
prev_el0 = 0
prev_el1 = -1
activ_long =
for i in range(len(arr_to_search[0])):
if arr_to_search[0][i] == prev_el0:
if arr_to_search[1][i] != prev_el1 + 1:
arrs.append(activ_long)
activ_long =
else:
arrs.append(activ_long)
activ_long =
activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))
prev_el0 = arr_to_search[0][i]
prev_el1 = arr_to_search[1][i]

max_len = len(max(arrs,key=len))
longest_arr_list = [a for a in arrs if len(a) == max_len]
return longest_arr_list

print(getIndices(arr[1,:,:]))
print(getIndices(arr[1,:,:].T))


[[(1, 0), (1, 1), (1, 2), (1, 3)]]
[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]






python numpy multidimensional-array






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Dec 30 '18 at 15:05









grudziengrudzien

61




61













  • If your code works, might be more appropriate for code review

    – user3483203
    Dec 30 '18 at 15:13











  • What do you mean with find longest series based on True values ? What is the expected output?

    – yatu
    Dec 30 '18 at 15:14













  • By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

    – grudzien
    Dec 30 '18 at 15:25





















  • If your code works, might be more appropriate for code review

    – user3483203
    Dec 30 '18 at 15:13











  • What do you mean with find longest series based on True values ? What is the expected output?

    – yatu
    Dec 30 '18 at 15:14













  • By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

    – grudzien
    Dec 30 '18 at 15:25



















If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13





If your code works, might be more appropriate for code review

– user3483203
Dec 30 '18 at 15:13













What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14







What do you mean with find longest series based on True values ? What is the expected output?

– yatu
Dec 30 '18 at 15:14















By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25







By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series

– grudzien
Dec 30 '18 at 15:25














1 Answer
1






active

oldest

votes


















0














Here is a numpy solution which avoids explicit loops based on this previous question.



I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.



For convenience I process a and a.T at the same time, but you could do them separately if you want.



m,n = a.shape
A = np.zeros((2*m,n+2))
A[:m,1:-1] = a
A[m:,1:-1] = a.T

dA = np.diff(A)

start = np.where(dA>0)
end = np.where(dA<0)

argmax_run = np.argmax(end[1]-start[1])

row = start[0][argmax_run]
col_start = start[1][argmax_run]
col_end= end[1][argmax_run]-1

max_len = col_end - col_start + 1

print('max run of length {}'.format(max_len))
print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))


To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.



nz = np.nonzero(dA)
start = (nz[0][::2], nz[1][::2])
end = (nz[0][1::2], nz[1][1::2])


Note that you can then remove the variables start and end entirely as they aren't really needed.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978707%2fmore-efficient-solution-to-find-longest-series-based-on-boolean-in-numpy-ndarray%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    Here is a numpy solution which avoids explicit loops based on this previous question.



    I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.



    For convenience I process a and a.T at the same time, but you could do them separately if you want.



    m,n = a.shape
    A = np.zeros((2*m,n+2))
    A[:m,1:-1] = a
    A[m:,1:-1] = a.T

    dA = np.diff(A)

    start = np.where(dA>0)
    end = np.where(dA<0)

    argmax_run = np.argmax(end[1]-start[1])

    row = start[0][argmax_run]
    col_start = start[1][argmax_run]
    col_end= end[1][argmax_run]-1

    max_len = col_end - col_start + 1

    print('max run of length {}'.format(max_len))
    print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))


    To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.



    nz = np.nonzero(dA)
    start = (nz[0][::2], nz[1][::2])
    end = (nz[0][1::2], nz[1][1::2])


    Note that you can then remove the variables start and end entirely as they aren't really needed.






    share|improve this answer




























      0














      Here is a numpy solution which avoids explicit loops based on this previous question.



      I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.



      For convenience I process a and a.T at the same time, but you could do them separately if you want.



      m,n = a.shape
      A = np.zeros((2*m,n+2))
      A[:m,1:-1] = a
      A[m:,1:-1] = a.T

      dA = np.diff(A)

      start = np.where(dA>0)
      end = np.where(dA<0)

      argmax_run = np.argmax(end[1]-start[1])

      row = start[0][argmax_run]
      col_start = start[1][argmax_run]
      col_end= end[1][argmax_run]-1

      max_len = col_end - col_start + 1

      print('max run of length {}'.format(max_len))
      print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))


      To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.



      nz = np.nonzero(dA)
      start = (nz[0][::2], nz[1][::2])
      end = (nz[0][1::2], nz[1][1::2])


      Note that you can then remove the variables start and end entirely as they aren't really needed.






      share|improve this answer


























        0












        0








        0







        Here is a numpy solution which avoids explicit loops based on this previous question.



        I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.



        For convenience I process a and a.T at the same time, but you could do them separately if you want.



        m,n = a.shape
        A = np.zeros((2*m,n+2))
        A[:m,1:-1] = a
        A[m:,1:-1] = a.T

        dA = np.diff(A)

        start = np.where(dA>0)
        end = np.where(dA<0)

        argmax_run = np.argmax(end[1]-start[1])

        row = start[0][argmax_run]
        col_start = start[1][argmax_run]
        col_end= end[1][argmax_run]-1

        max_len = col_end - col_start + 1

        print('max run of length {}'.format(max_len))
        print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))


        To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.



        nz = np.nonzero(dA)
        start = (nz[0][::2], nz[1][::2])
        end = (nz[0][1::2], nz[1][1::2])


        Note that you can then remove the variables start and end entirely as they aren't really needed.






        share|improve this answer













        Here is a numpy solution which avoids explicit loops based on this previous question.



        I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.



        For convenience I process a and a.T at the same time, but you could do them separately if you want.



        m,n = a.shape
        A = np.zeros((2*m,n+2))
        A[:m,1:-1] = a
        A[m:,1:-1] = a.T

        dA = np.diff(A)

        start = np.where(dA>0)
        end = np.where(dA<0)

        argmax_run = np.argmax(end[1]-start[1])

        row = start[0][argmax_run]
        col_start = start[1][argmax_run]
        col_end= end[1][argmax_run]-1

        max_len = col_end - col_start + 1

        print('max run of length {}'.format(max_len))
        print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))


        To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.



        nz = np.nonzero(dA)
        start = (nz[0][::2], nz[1][::2])
        end = (nz[0][1::2], nz[1][1::2])


        Note that you can then remove the variables start and end entirely as they aren't really needed.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 30 '18 at 16:21









        tchtch

        42525




        42525






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53978707%2fmore-efficient-solution-to-find-longest-series-based-on-boolean-in-numpy-ndarray%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Monofisismo

            Angular Downloading a file using contenturl with Basic Authentication

            Olmecas