How to efficiently return all the column names across 1m records when certain conditions met



























Updated with dummy data and dummy code. Apologies, I assumed my question was simple and that you could advise on the best way without a reproducible example.



dummy <- data.frame(prodA = c(0,0,0,1,1,0,0,1),
                    prodB = c(0,0,1,1,0,1,1,0),
                    prodC = c(1,1,1,0,0,0,0,1))

dummy[, 4:6] <- dummy[, 1:3]

for (j in 1:nrow(dummy)) {
  for (i in 4:6) {
    dummy[j, i] <- ifelse(dummy[j, i] == 1, colnames(dummy[i]), "")
  }
}
dummy2 <- dummy[, 4:6]
dummy$NewProds <- apply(dummy2, 1, paste, collapse = "")
dummy$NewProds <- gsub(".1", "//", dummy$NewProds)


My second attempt is as follows:



prods <- dummy[, 1:3]
prods[, 4:6] <- dummy[, 1:3]
for (i in 4:6) {
  prods[, i] <- colnames(prods[i - 3])
}

prods[, 7:9] <- prods[, 4:6]
# works, but I will need multiple ifs for this to work, suggesting this
# won't be very efficient
prods[, 10] <- ifelse(prods[, 1] == 1, prods[, 4], "")


Original Post Follows:
I am playing with the Santander Product Recommendation dataset from Kaggle. I have identified which products have been purchased from one month to another. This gives me 23 columns of 1s (when a new product is added) and 0s (when not).
I created the following code to return the column name when a product has been purchased. It works great on a sample of 6 rows, but it runs forever when I try it on the 48k customers who changed, let alone the million in the dataset.



Is there another way to do this?



df2[, 99:122] <- df2[, 72:95]

for (j in 1:nrow(df2)) {
  for (i in 99:122) {
    df2[j, i] <- ifelse(df2[j, i] == 1, colnames(df2[i]), "")
  }
}
df22 <- df2[, 99:122]
df2$NewProds <- apply(df22, 1, paste, collapse = "")
df2$NewProds <- gsub("change.1", "//", df2$NewProds)


I figured the problem was that I am visiting every cell, so I started on another approach: keep a couple of copies of the data, fill one copy with the column names, and then take the name wherever the original value is 1. However, I couldn't get this to work, and I think I would run into the same issue.



#copy a bunch of 1's and 0's
prods<-df2[,72:95]
#repeat and overwrite with colnames
prods[,25:48]<-df2[,72:95]
for (i in 25:48){
prods[,i]<-colnames(prods[i-24])
}
prods[,49:72]<-prods[,25:48]
#attempt to only populate colnames if it was originally a 1 - doesn't work
prod[,49]<-ifelse(prod[,1]==1,prod[,25],"")


I haven't provided any data, but I hope you can see what I am trying to do and can advise on efficient ways of doing this.
Thanks in advance,
J






























  • 5





    So you actually note that you haven't provided any data, but why would you not just include some and make it a reproducible example? If you're not going to take the time to write a good question, why would we take the time to write a good answer?

    – Conor Neilson
    Dec 29 '18 at 18:28











  • Can you post sample data? Please edit the question with the output of dput(df2). Or, if it is too big, with the output of dput(df2[1:20, 72:95]).

    – Rui Barradas
    Dec 29 '18 at 18:36






  • 1





    I don't understand the output you want. The column names of the columns with at least one 1?

    – Rui Barradas
    Dec 29 '18 at 18:54











  • I apologise. I thought my question was simple and that this would not need dummy data. I have provided it now and the working example. The point here is that this works, but for the mass of data it takes far too long. I am looking for someone who can give me a more effective way of doing this. Thank you in advance.

    – James Oliver
    Dec 29 '18 at 21:20
















r loops














edited Dec 29 '18 at 21:27







James Oliver

















asked Dec 29 '18 at 18:00









James Oliver




















2 Answers
































Using apply as @AndersEllernBilgrau illustrated is one obvious way to do it, but it will be slow for data sets with many rows.



dummy[["NewProds"]] <- do.call(
  paste,
  c(mapply(ifelse,
           dummy,
           names(dummy),
           MoreArgs = list(no = ""),
           SIMPLIFY = FALSE),
    sep = "//"))


is a bit harder to follow, but it will be much faster:



library(microbenchmark)

n <- 10000
dummy <- data.frame(prodA = rep(c(0,0,0,1,1,0,0,1), n),
                    prodB = rep(c(0,0,1,1,0,1,1,0), n),
                    prodC = rep(c(1,1,1,0,0,0,0,1), n))

microbenchmark(
  do.call = do.call(
    paste,
    c(mapply(ifelse,
             dummy,
             names(dummy),
             MoreArgs = list(no = ""),
             SIMPLIFY = FALSE),
      sep = "//")),
  apply = apply(
    dummy == 1,
    1,
    function(x) paste0(names(which(x)), collapse = "//")
  ))
## Unit: milliseconds
##     expr       min        lq      mean   median       uq      max neval cld
##  do.call  63.92695  65.44777  72.07261  67.8667  73.3850 184.5151   100  a
##    apply 296.81323 364.31947 404.71894 397.0927 443.7223 683.3892   100   b
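To make the do.call/mapply call above easier to follow, here is a minimal sketch on the 8-row dummy data from the question, split into steps. It also shows one thing worth knowing: because every column contributes a slot (either its name or ""), the pasted string keeps a separator for the empty slots as well. The tidy-up with gsub at the end is my own addition for illustration, not part of the answer's code, and the names parts, raw and clean are made up for this sketch.

```r
dummy <- data.frame(prodA = c(0,0,0,1,1,0,0,1),
                    prodB = c(0,0,1,1,0,1,1,0),
                    prodC = c(1,1,1,0,0,0,0,1))

# One vectorized ifelse() call per column (3 calls here, not one per cell):
# the column name where the value is 1, "" otherwise
parts <- mapply(ifelse, dummy, names(dummy),
                MoreArgs = list(no = ""), SIMPLIFY = FALSE)

# Paste the per-column character vectors together row-wise
raw <- do.call(paste, c(parts, sep = "//"))
raw[1]
# "////prodC"   (two empty slots, then prodC)

# Optional tidy-up (an assumption about the desired format):
clean <- gsub("(//)+", "//", raw)    # collapse runs of separators
clean <- gsub("^//|//$", "", clean)  # drop a leading or trailing separator
clean
# "prodC" "prodC" "prodB//prodC" "prodA//prodB" "prodA" "prodB" "prodB" "prodA//prodC"
```

The speed-up comes from the loop running over the handful of columns rather than over the rows, so the number of R-level function calls no longer grows with the number of customers.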





























  • Wow! I cannot believe how quick that was. Thank you. I need to get closer to these functions.

    – James Oliver
    Jan 5 at 13:17

































Without data, I have a hard time understanding precisely what you want to do.
A couple of things are (almost) certain however:




  • You probably do not need for loops.

  • You should use R's vectorized functions; the dataset is not that big.
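As a rough illustration of what "vectorized" can mean for the question's dummy data, the sketch below loops over the columns only (3 here, 23 in the real data) and updates all matching rows at once. The variable name new_prods is made up for this example, and the trailing "//" after each name is an assumption that mimics the format the question's own code produces.

```r
dummy <- data.frame(prodA = c(0,0,0,1,1,0,0,1),
                    prodB = c(0,0,1,1,0,1,1,0),
                    prodC = c(1,1,1,0,0,0,0,1))

new_prods <- character(nrow(dummy))   # starts as "" for every row

for (nm in names(dummy)) {
  hit <- which(dummy[[nm]] == 1)      # rows where this product was added
  # append the column name to every matching row in one vectorized step
  new_prods[hit] <- paste0(new_prods[hit], nm, "//")
}

new_prods
# "prodC//" "prodC//" "prodB//prodC//" "prodA//prodB//" "prodA//" "prodB//" "prodB//" "prodA//prodC//"
```

The for loop is still there, but it runs once per column instead of once per cell, which is usually what matters for speed in R.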


Using some toy data, does the following do what you want?



d <- 23
n <- 46e3

# Simulate some toy data (no seed is set, so your draws will differ)
df <- data.frame(matrix(rbinom(d*n, 1, 0.1), n, d),
                 row.names = paste0("row", 1:n))
head(df)
#      X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23
# row1  0  0  0  0  0  0  0  1  0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
# row2  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
# row3  0  0  0  0  0  0  0  1  0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
# row4  0  0  0  1  0  0  0  0  0   0   1   0   0   0   0   0   0   0   0   1   0   0   0
# row5  0  0  0  0  0  0  1  0  0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
# row6  0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0   0   1   0   0   1   0   0



# Paste together the names of the columns that equal 1 in each row
res <- apply(df == 1, 1, function(x) paste0(names(which(x)), collapse = "-"))
head(res)
#         row1         row2         row3         row4         row5         row6
#     "X8-X16"         "X1"     "X8-X20" "X4-X11-X20"     "X7-X15" "X4-X18-X21"


That is, res is a character vector of length n in which each element holds the names of the columns that equal 1 in that row, pasted together with the separator -. This is at least what your code appears to be doing conceptually.
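For comparison with the question's own example, here is the same apply idea run on the 8-row dummy data, using "//" as the separator. The extra all-zero row is an assumption added purely to show what happens when a customer bought nothing new: the result for that row is an empty string.

```r
dummy <- data.frame(prodA = c(0,0,0,1,1,0,0,1),
                    prodB = c(0,0,1,1,0,1,1,0),
                    prodC = c(1,1,1,0,0,0,0,1))

# Hypothetical extra customer with no new products
dummy <- rbind(dummy, data.frame(prodA = 0, prodB = 0, prodC = 0))

# dummy == 1 is a logical matrix; for each row, which() keeps the TRUE
# positions and names() recovers the corresponding column names
res <- unname(apply(dummy == 1, 1,
                    function(x) paste0(names(which(x)), collapse = "//")))
res
# "prodC" "prodC" "prodB//prodC" "prodA//prodB" "prodA" "prodB" "prodB" "prodA//prodC" ""
```

Note that apply still calls the anonymous function once per row, so the run time grows with the number of rows.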



























  • 1





    The OP wants colnames.

    – Rui Barradas
    Dec 29 '18 at 18:41











  • @RuiBarradas Arh, doh

    – Anders Ellern Bilgrau
    Dec 29 '18 at 18:42











  • Thank you for trying. I have updated the question with dummy data and my amended code so that it works with the dummy code.

    – James Oliver
    Dec 29 '18 at 21:24














































answered Dec 30 '18 at 1:21









Ista


































edited Dec 29 '18 at 18:56

























answered Dec 29 '18 at 18:40









Anders Ellern Bilgrau

























