Efficiently combine (AND) groups of columns in a logical matrix

I am looking for an efficient way to combine selected columns of a logical matrix by ANDing them together, ending up with a new matrix. An example of what I am looking for:



matrixData <- rep(c(TRUE, TRUE, FALSE), 8)
exampleMatrix <- matrix(matrixData, nrow=6, ncol=4, byrow=TRUE)
exampleMatrix
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE  TRUE FALSE  TRUE
[2,]  TRUE FALSE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE FALSE
[4,]  TRUE  TRUE FALSE  TRUE
[5,]  TRUE FALSE  TRUE  TRUE
[6,] FALSE  TRUE  TRUE FALSE


The columns to be ANDed together are specified in a numeric vector of length ncol(exampleMatrix): columns that should be grouped and ANDed together share the same value (a value from 1 to n, where n <= ncol(exampleMatrix) and every value in 1:n is used at least once). The columns of the resulting matrix should be in order 1:n. For example, if the vector that specifies the column groups is



colGroups <- c(3, 2, 2, 1)


Then the resulting matrix would be



      [,1]  [,2]  [,3]
[1,]  TRUE FALSE  TRUE
[2,]  TRUE FALSE  TRUE
[3,] FALSE  TRUE FALSE
[4,]  TRUE FALSE  TRUE
[5,]  TRUE FALSE  TRUE
[6,] FALSE  TRUE FALSE


Where in the resulting matrix



[,1] = exampleMatrix[,4] 
[,2] = exampleMatrix[,2] & exampleMatrix[,3]
[,3] = exampleMatrix[,1]


My current way of doing this looks basically like this:



finalMatrix <- matrix(TRUE, nrow=nrow(exampleMatrix), ncol=3)
for (i in 1:3) {
  selectedColumns <- exampleMatrix[, colGroups == i, drop=FALSE]
  finalMatrix[, i] <- rowSums(selectedColumns) == ncol(selectedColumns)
}


Where rowSums(selectedColumns)==ncol(selectedColumns) is an efficient way to AND all of the columns of a matrix together.
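As a quick sanity check of that identity on the example above (this assumes a logical matrix with no NAs):

m <- exampleMatrix[, 2:3]
identical(rowSums(m) == ncol(m), m[, 1] & m[, 2])  # TRUE here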



My problem is that I am doing this on very big matrices (millions of rows) and I am looking for any way to make this quicker. My first instinct would be to use apply in some way, but I can't see how that would improve efficiency: the operation is not repeated many times across the loop; rather, it is the operation inside the loop that is slow.
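For illustration only, here is the same computation written with vapply (andGroups is a made-up helper name); each iteration still does the full rowSums pass, so no speed-up should be expected from this form:

# Same work as the for loop above, just expressed functionally
andGroups <- function(m, groups) {
  vapply(seq_len(max(groups)), function(i) {
    sel <- m[, groups == i, drop = FALSE]
    rowSums(sel) == ncol(sel)
  }, logical(nrow(m)))
}
# identical(andGroups(exampleMatrix, colGroups), finalMatrix) should be TRUE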



In addition, any tips to reduce memory allocation would be very useful, as I currently have to run gc() within the loop frequently to avoid running out of memory completely, and it is a very expensive operation that significantly slows everything down as well. Thanks!



For a more representative example, this is a much larger exampleMatrix:



matrixData <- rep(c(TRUE, TRUE, FALSE), 8e7)
exampleMatrix <- matrix(matrixData, nrow=6e7, ncol=4, byrow=TRUE)
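To put the memory pressure in perspective (R stores logical values as 4-byte integers), this matrix alone is a bit under a gigabyte, so every full copy of it costs roughly the same again:

# 6e7 rows * 4 columns * 4 bytes per logical value
6e7 * 4 * 4 / 2^30                                  # ~0.89 GiB
# print(object.size(exampleMatrix), units = "Gb")   # once the object exists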









Tags: r, matrix






asked Jan 2 at 22:52 by Walker in the City
edited Jan 3 at 0:54 by Henrik








@Henrik the data I have been using to benchmark the current answers as well as my original solution is now at the end of the question.
– Walker in the City, Jan 3 at 0:19

2 Answers
From your example, I understand that there are very few columns and very many rows. In this case, it'll be efficient to just do a simple loop over colGroups (30% improvement over your suggestion):



for (jj in seq_along(colGroups))
  finalMatrix[, colGroups[jj]] =
    finalMatrix[, colGroups[jj]] & exampleMatrix[, jj]


I think it will be hard to beat this without parallelizing. The loop is parallelizable if there are more columns, though the parallelization would have to be done carefully (in batches).
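As a rough sketch of that parallel idea (not benchmarked; it parallelizes over the output groups rather than the input columns, which keeps two workers from writing to the same column and so avoids the batching concern; mclapply forks, so this is Unix-only):

library(parallel)

# Build each output column in its own forked worker, then bind them together
groupCols <- mclapply(seq_len(max(colGroups)), function(i) {
  sel <- exampleMatrix[, colGroups == i, drop = FALSE]
  rowSums(sel) == ncol(sel)   # AND of all columns in group i
}, mc.cores = 2)

finalMatrix <- do.call(cbind, groupCols)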






answered Jan 3 at 1:41 by MichaelChirico
As far as I can tell, this is an aggregation across columns using the all function. So if you transpose to rows, then use colGroups as the grouping factor to apply all, then transpose back to columns, you should get the intended result:

t(aggregate(t(exampleMatrix), list(colGroups), FUN=all)[-1])

#    [,1]  [,2]  [,3]
#V1  TRUE FALSE  TRUE
#V2  TRUE FALSE  TRUE
#V3 FALSE  TRUE FALSE
#V4  TRUE FALSE  TRUE
#V5  TRUE FALSE  TRUE
#V6 FALSE  TRUE FALSE

The [-1] just drops the group-identifier variable which you don't require in the final output.

If you're working with stupid big data, the by-group aggregation could be done in data.table as well:

library(data.table)
t(as.data.table(t(exampleMatrix))[, lapply(.SD, all), by=colGroups][, -1])
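If the two transposes turn out to be the memory bottleneck (see the comments below), a transpose-free variant of the same by-group idea might look like this sketch (not benchmarked; DT and groupedAnd are illustrative names):

library(data.table)

DT <- as.data.table(exampleMatrix)           # one copy, but no transpose
groupedAnd <- sapply(
  split(seq_along(colGroups), colGroups),    # column indices for each group id
  function(idx) Reduce(`&`, DT[, ..idx])     # AND the columns within each group
)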





answered Jan 2 at 23:27 by thelatemail (edited Jan 2 at 23:43)
This is a super straightforward and elegant way to do this, but I am not seeing any time savings. With nrow=6e7, both solutions ran out of memory on my system, whereas the solution from @michaelchirico took 66 seconds and mine took 5 seconds.
– Walker in the City, Jan 3 at 0:00

@WalkerintheCity - the issue would be the t for transposing the matrix, I believe, which means you have the original plus a copy of the whole object. Are you able to save the transposed copy (and remove the original) first and then work with it instead? Storing as rows or columns shouldn't make any real difference to storage size.
– thelatemail, Jan 3 at 0:07

I am still running into a memory allocation issue after transposing exampleMatrix and removing the original. I am going to increase the memory limit and try it on a machine with more RAM.
– Walker in the City, Jan 3 at 0:25

@WalkerintheCity - the more I think about this, the for loop is actually an excellent way to do this, as you only have to copy the needed chunks rather than work on (and potentially copy) a massive object.
– thelatemail, Jan 3 at 0:32

I have been running this with more memory for more than 10 minutes (not yet finished) and I think my conclusion is that it is very slow. I am interested as to why that is and what is holding it up. I do suspect that my solution is already pretty speedy, as I have spent a fair bit of time trying to optimize this chunk and it is quite the bottleneck.
– Walker in the City, Jan 3 at 0:55










