Efficiently combine (AND) groups of columns in a logical matrix
I am looking for an efficient way to combine selected columns in a logical matrix by "ANDing" them together, ending up with a new matrix. An example of what I am looking for:
    matrixData <- rep(c(TRUE, TRUE, FALSE), 8)
    exampleMatrix <- matrix(matrixData, nrow=6, ncol=4, byrow=TRUE)
    exampleMatrix
          [,1]  [,2]  [,3]  [,4]
    [1,]  TRUE  TRUE FALSE  TRUE
    [2,]  TRUE FALSE  TRUE  TRUE
    [3,] FALSE  TRUE  TRUE FALSE
    [4,]  TRUE  TRUE FALSE  TRUE
    [5,]  TRUE FALSE  TRUE  TRUE
    [6,] FALSE  TRUE  TRUE FALSE
The columns to be ANDed with each other are specified in a numeric vector of length ncol(exampleMatrix), where columns that belong to the same group share the same value (a value from 1 to n, where n <= ncol(exampleMatrix) and every value in 1:n is used at least once). The columns of the resulting matrix should be in group order, 1:n. For example, if the vector that specifies the column groups is
    colGroups <- c(3, 2, 2, 1)
Then the resulting matrix would be
          [,1]  [,2]  [,3]
    [1,]  TRUE FALSE  TRUE
    [2,]  TRUE FALSE  TRUE
    [3,] FALSE  TRUE FALSE
    [4,]  TRUE FALSE  TRUE
    [5,]  TRUE FALSE  TRUE
    [6,] FALSE  TRUE FALSE
Where in the resulting matrix
    [,1] = exampleMatrix[,4]
    [,2] = exampleMatrix[,2] & exampleMatrix[,3]
    [,3] = exampleMatrix[,1]
My current way of doing this looks basically like this:
    finalMatrix <- matrix(TRUE, nrow=nrow(exampleMatrix), ncol=3)
    for (i in 1:3){
      selectedColumns <- exampleMatrix[,colGroups==i, drop=FALSE]
      finalMatrix[,i] <- rowSums(selectedColumns)==ncol(selectedColumns)
    }
Here rowSums(selectedColumns)==ncol(selectedColumns) is an efficient way to AND all of the columns of a matrix together.
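The loop above can be wrapped in a function that derives the number of groups from colGroups instead of hard-coding 3. This is only a sketch: andColumnGroups is a hypothetical name introduced here, and it assumes the matrix contains no NA values (rowSums would propagate them).

```r
# Hypothetical helper: AND together the columns of `m` that share a group id.
andColumnGroups <- function(m, groups) {
  stopifnot(is.logical(m), length(groups) == ncol(m))
  n <- max(groups)
  out <- matrix(TRUE, nrow = nrow(m), ncol = n)
  for (i in seq_len(n)) {
    selected <- m[, groups == i, drop = FALSE]
    # A row is all TRUE exactly when its count of TRUEs equals the column count
    out[, i] <- rowSums(selected) == ncol(selected)
  }
  out
}

# Small example from above
exampleMatrix <- matrix(rep(c(TRUE, TRUE, FALSE), 8), nrow = 6, ncol = 4, byrow = TRUE)
colGroups <- c(3, 2, 2, 1)
finalMatrix <- andColumnGroups(exampleMatrix, colGroups)
```

Having the logic in one function also makes it easier to time alternative implementations against the same inputs.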
My problem is that I am doing this on very big matrices (millions of rows), and I am looking for any way to make it quicker. My first instinct would be to use apply in some way, but I can't see how that would improve efficiency: the for loop only runs a handful of times, and it is the operation inside the loop that is slow.
In addition, any tips to reduce memory allocation would be very useful, as I currently have to run gc() within the loop frequently to avoid running out of memory completely, and gc() is itself a very expensive operation that significantly slows everything down. Thanks!
For a more representative example, this is a much larger exampleMatrix:
    matrixData <- rep(c(TRUE, TRUE, FALSE), 8e7)
    exampleMatrix <- matrix(matrixData, nrow=6e7, ncol=4, byrow=TRUE)
r matrix
edited Jan 3 at 0:54 by Henrik
asked Jan 2 at 22:52 by Walker in the City
@Henrik the data I have been using to benchmark the current answers as well as my original solution is now at the end of the question
– Walker in the City
Jan 3 at 0:19
2 Answers
From your example, I understand that there are very few columns and very many rows. In this case, it'll be efficient to just do a simple loop over colGroups (a 30% improvement over your suggestion):
    # finalMatrix must start as all TRUE, as in the question, for & to accumulate correctly
    for (jj in seq_along(colGroups)) {
      finalMatrix[, colGroups[jj]] <-
        finalMatrix[, colGroups[jj]] & exampleMatrix[, jj]
    }
I think it will be hard to beat this without parallelizing. The loop is parallelizable if there are more columns, though the parallelization would have to be done carefully, in batches.
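One way to picture the batching: columns that share a group all update the same output column, so a natural batch is one group. The sketch below is an assumption about what such batching could look like, not the benchmarked code from this answer; batches and andBatch are names introduced here. It runs sequentially with lapply(), and whether a parallel map would pay off once the cost of moving columns between processes is included is untested.

```r
# Small example from the question
exampleMatrix <- matrix(rep(c(TRUE, TRUE, FALSE), 8), nrow = 6, ncol = 4, byrow = TRUE)
colGroups <- c(3, 2, 2, 1)

# One batch per group: the column indices that share a group value.
# split() orders the batches by group value, i.e. 1:n.
batches <- split(seq_along(colGroups), colGroups)

# AND the columns of one batch together, one column at a time
andBatch <- function(cols) {
  Reduce(`&`, lapply(cols, function(j) exampleMatrix[, j]))
}

# The batches are independent of each other, so this lapply() is the step
# that a parallel map could replace (assumption, not benchmarked)
finalMatrix <- do.call(cbind, unname(lapply(batches, andBatch)))
```

Because each batch writes to its own output column, no two batches touch the same data, which is what makes the grouping safe to distribute.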
answered Jan 3 at 1:41 by MichaelChirico
As far as I can tell, this is an aggregation across columns using the all function. So if you transpose (with t) to rows, then use colGroups as the grouping factor to apply all, then transpose back to columns, you should get the intended result:
    t(aggregate(t(exampleMatrix), list(colGroups), FUN=all)[-1])
    #    [,1]  [,2]  [,3]
    #V1  TRUE FALSE  TRUE
    #V2  TRUE FALSE  TRUE
    #V3 FALSE  TRUE FALSE
    #V4  TRUE FALSE  TRUE
    #V5  TRUE FALSE  TRUE
    #V6 FALSE  TRUE FALSE
The [-1] just drops the group-identifier variable, which you don't require in the final output.
If you're working with stupid big data, the by-group aggregation could be done in data.table as well:
    library(data.table)
    # keyby (rather than by) sorts the groups, so the output columns come out in 1:n order
    t(as.data.table(t(exampleMatrix))[, lapply(.SD,all), keyby=colGroups][,-1])
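As a quick sanity check, the aggregate() call can be compared against the columns the question spells out. This is only a sketch on the 6-row example; res and expected are names introduced here for illustration, and unname() is used just to strip the V1 to V6 row names before comparing.

```r
exampleMatrix <- matrix(rep(c(TRUE, TRUE, FALSE), 8), nrow = 6, ncol = 4, byrow = TRUE)
colGroups <- c(3, 2, 2, 1)

# Transpose, aggregate rows by group with all(), drop the group column, transpose back
res <- t(aggregate(t(exampleMatrix), list(colGroups), FUN = all)[-1])

# Strip the V1..V6 row names so only the values are compared
res <- unname(res)

# Expected columns, as listed in the question
expected <- cbind(exampleMatrix[, 4],
                  exampleMatrix[, 2] & exampleMatrix[, 3],
                  exampleMatrix[, 1])

# TRUE when every cell of the aggregated result matches
all(res == expected)
```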
edited Jan 2 at 23:43
answered Jan 2 at 23:27 by thelatemail
This is a super straightforward and elegant way to do this, but I am not seeing any time savings. With nrow=6e7, both solutions ran out of memory on my system, whereas the solution from @michaelchirico took 66 seconds and mine took 5 seconds.
– Walker in the City
Jan 3 at 0:00
@WalkerintheCity - the issue would be the t for transposing the matrix, I believe, which means you have the original plus a copy of the whole object. Are you able to save the transposed copy (and remove the original) first and then work with it instead? Storing as rows or columns shouldn't make any real difference to storage size.
– thelatemail
Jan 3 at 0:07
I am still running into a memory allocation issue after transposing exampleMatrix and removing the original. I am going to increase the memory limit and try it on a machine with more RAM.
– Walker in the City
Jan 3 at 0:25
@WalkerintheCity - the more I think about this, the for loop is actually an excellent way to do this, as you only have to copy the needed chunks rather than work on (and potentially copy) a massive object.
– thelatemail
Jan 3 at 0:32
I have been running this with more memory for more than 10 minutes (not yet finished), and my conclusion is that it is very slow. I am interested as to why that is and what is holding it up. I do suspect that my solution is already pretty speedy, as I have spent a fair bit of time trying to optimize this chunk and it is quite the bottleneck.
– Walker in the City
Jan 3 at 0:55
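Some back-of-the-envelope arithmetic may help explain the memory pressure discussed in these comments. It relies on two facts about R: a logical vector uses 4 bytes per element and a double uses 8. The variable names are introduced here for illustration, and the figures ignore any further temporaries. The last lines illustrate a plausible, but unmeasured, reason for the slowness of the transposed approaches: converting the transposed matrix to a data frame yields one column per original row, so the aggregation would run over 6e7 columns.

```r
nRows <- 6e7
nCols <- 4

# Logical elements take 4 bytes each in R
matrixBytes <- nRows * nCols * 4    # 9.6e8 bytes for exampleMatrix
# t() builds a second object with the same number of elements
transposeBytes <- matrixBytes
# rowSums() returns one double (8 bytes) per row
rowSumsBytes <- nRows * 8           # 4.8e8 bytes per call

# Small-scale check of the "one column per original row" point
m <- matrix(TRUE, nrow = 1000, ncol = 4)
ncol(as.data.frame(t(m)))           # equals nrow(m)
is.double(rowSums(m))               # rowSums() does not return logical or integer
```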