Efficiently combine (AND) groups of columns in a logical matrix

I am looking for an efficient way to combine selected columns of a logical matrix by ANDing them together, ending up with a new matrix. An example of what I am looking for:



matrixData <- rep(c(TRUE, TRUE, FALSE), 8)
exampleMatrix <- matrix(matrixData, nrow=6, ncol=4, byrow=TRUE)
exampleMatrix
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE  TRUE FALSE  TRUE
[2,]  TRUE FALSE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE FALSE
[4,]  TRUE  TRUE FALSE  TRUE
[5,]  TRUE FALSE  TRUE  TRUE
[6,] FALSE  TRUE  TRUE FALSE


The columns to be ANDed together are specified in a numeric vector of length ncol(exampleMatrix): columns that should be grouped and ANDed together share the same value (a value from 1 to n, where n <= ncol(exampleMatrix) and every value in 1:n is used at least once). The columns of the resulting matrix should be in order 1:n. For example, if the vector that specifies the column groups is



colGroups <- c(3, 2, 2, 1)


Then the resulting matrix would be



      [,1]  [,2]  [,3]
[1,]  TRUE FALSE  TRUE
[2,]  TRUE FALSE  TRUE
[3,] FALSE  TRUE FALSE
[4,]  TRUE FALSE  TRUE
[5,]  TRUE FALSE  TRUE
[6,] FALSE  TRUE FALSE


Where in the resulting matrix



[,1] = exampleMatrix[,4] 
[,2] = exampleMatrix[,2] & exampleMatrix[,3]
[,3] = exampleMatrix[,1]


My current way of doing this looks basically like this:



finalMatrix <- matrix(TRUE, nrow=nrow(exampleMatrix), ncol=3)
for (i in 1:3) {
  selectedColumns <- exampleMatrix[, colGroups == i, drop=FALSE]
  finalMatrix[, i] <- rowSums(selectedColumns) == ncol(selectedColumns)
}


Where rowSums(selectedColumns)==ncol(selectedColumns) is an efficient way to AND all of the columns of a matrix together.
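As a quick sanity check of that identity on the example above (this assumes a logical matrix with no NAs):

m <- exampleMatrix[, 2:3]
identical(rowSums(m) == ncol(m), m[, 1] & m[, 2])  # TRUE here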



My problem is that I am doing this on very big matrices (millions of rows) and I am looking for any way to make this quicker. My first instinct would be to use apply in some way, but I can't see how that would improve efficiency: the operation is not repeated many times across the loop; rather, it is the operation inside the loop that is slow.
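For illustration only, here is the same computation written with vapply (andGroups is a made-up helper name); each iteration still does the full rowSums pass, so no speed-up should be expected from this form:

# Same work as the for loop above, just expressed functionally
andGroups <- function(m, groups) {
  vapply(seq_len(max(groups)), function(i) {
    sel <- m[, groups == i, drop = FALSE]
    rowSums(sel) == ncol(sel)
  }, logical(nrow(m)))
}
# identical(andGroups(exampleMatrix, colGroups), finalMatrix) should be TRUE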



In addition, any tips to reduce memory allocation would be very useful, as I currently have to run gc() within the loop frequently to avoid running out of memory completely, and it is a very expensive operation that significantly slows everything down as well. Thanks!



For a more representative example, this is a much larger exampleMatrix:



matrixData <- rep(c(TRUE, TRUE, FALSE), 8e7)
exampleMatrix <- matrix(matrixData, nrow=6e7, ncol=4, byrow=TRUE)
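To put the memory pressure in perspective (R stores logical values as 4-byte integers), this matrix alone is a bit under a gigabyte, so every full copy of it costs roughly the same again:

# 6e7 rows * 4 columns * 4 bytes per logical value
6e7 * 4 * 4 / 2^30                                  # ~0.89 GiB
# print(object.size(exampleMatrix), units = "Gb")   # once the object exists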









Tags: r, matrix






asked Jan 2 at 22:52 by Walker in the City
edited Jan 3 at 0:54 by Henrik








@Henrik the data I have been using to benchmark the current answers as well as my original solution is now at the end of the question.
– Walker in the City, Jan 3 at 0:19

2 Answers
From your example, I understand that there are very few columns and very many rows. In this case, it'll be efficient to just do a simple loop over colGroups (30% improvement over your suggestion):



for (jj in seq_along(colGroups))
  finalMatrix[, colGroups[jj]] =
    finalMatrix[, colGroups[jj]] & exampleMatrix[, jj]


I think it will be hard to beat this without parallelizing. The loop is parallelizable if there are more columns, though the parallelization would have to be done carefully (in batches).
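As a rough sketch of that parallel idea (not benchmarked; it parallelizes over the output groups rather than the input columns, which keeps two workers from writing to the same column and so avoids the batching concern; mclapply forks, so this is Unix-only):

library(parallel)

# Build each output column in its own forked worker, then bind them together
groupCols <- mclapply(seq_len(max(colGroups)), function(i) {
  sel <- exampleMatrix[, colGroups == i, drop = FALSE]
  rowSums(sel) == ncol(sel)   # AND of all columns in group i
}, mc.cores = 2)

finalMatrix <- do.call(cbind, groupCols)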






answered Jan 3 at 1:41 by MichaelChirico
As far as I can tell, this is an aggregation across columns using the all function. So if you transpose to rows, then use colGroups as the grouping factor to apply all, then transpose back to columns, you should get the intended result:

t(aggregate(t(exampleMatrix), list(colGroups), FUN=all)[-1])

#    [,1]  [,2]  [,3]
#V1  TRUE FALSE  TRUE
#V2  TRUE FALSE  TRUE
#V3 FALSE  TRUE FALSE
#V4  TRUE FALSE  TRUE
#V5  TRUE FALSE  TRUE
#V6 FALSE  TRUE FALSE

The [-1] just drops the group-identifier variable which you don't require in the final output.

If you're working with stupid big data, the by-group aggregation could be done in data.table as well:

library(data.table)
t(as.data.table(t(exampleMatrix))[, lapply(.SD, all), by=colGroups][, -1])
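If the two transposes turn out to be the memory bottleneck (see the comments below), a transpose-free variant of the same by-group idea might look like this sketch (not benchmarked; DT and groupedAnd are illustrative names):

library(data.table)

DT <- as.data.table(exampleMatrix)           # one copy, but no transpose
groupedAnd <- sapply(
  split(seq_along(colGroups), colGroups),    # column indices for each group id
  function(idx) Reduce(`&`, DT[, ..idx])     # AND the columns within each group
)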





answered Jan 2 at 23:27 by thelatemail (edited Jan 2 at 23:43)
This is a super straightforward and elegant way to do this, but I am not seeing any time savings. With nrow=6e7, both solutions ran out of memory on my system, whereas the solution from @michaelchirico took 66 seconds and mine took 5 seconds.
– Walker in the City, Jan 3 at 0:00

@WalkerintheCity - the issue would be the t for transposing the matrix, I believe, which means you have the original plus a copy of the whole object. Are you able to save the transposed copy (and remove the original) first and then work with it instead? Storing as rows or columns shouldn't make any real difference to storage size.
– thelatemail, Jan 3 at 0:07

I am still running into a memory allocation issue after transposing exampleMatrix and removing the original. I am going to increase the memory limit and try it on a machine with more RAM.
– Walker in the City, Jan 3 at 0:25

@WalkerintheCity - the more I think about this, the for loop is actually an excellent way to do this, as you only have to copy the needed chunks rather than work on (and potentially copy) a massive object.
– thelatemail, Jan 3 at 0:32

I have been running this with more memory for more than 10 minutes (not yet finished) and I think my conclusion is that it is very slow. I am interested as to why that is and what is holding it up. I do suspect that my solution is already pretty speedy, as I have spent a fair bit of time trying to optimize this chunk and it is quite the bottleneck.
– Walker in the City, Jan 3 at 0:55










