Reshaping into long form when there are no column names for certain repetition
I have a following sample dataset, and I am trying to reshape to long form using data.table package (no dplyr solution please).
test_data<-structure(list(id = c("a", "b", "c", "d"), demo_d1_alt = c(2,
3, 4, 5), demo_d1_pre = c(5, 4, 5, 5), demo_d2_alt = c(4, 5,
6, 7), demo_d3_alt = c(8, 9, 10, 11), demo_d3_pre = c(5, 5, 5,
5)), .Names = c("id", "demo_d1_alt", "demo_d1_pre", "demo_d2_alt",
"demo_d3_alt", "demo_d3_pre"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
id demo_d1_alt demo_d1_pre demo_d2_alt demo_d3_alt demo_d3_pre
a 2 5 4 8 5
b 3 4 5 9 5
c 4 5 6 10 5
d 5 5 7 11 5
Following is my code and the output:
library(data.table)
data_long <- melt.data.table(
setDT(test_data), id.vars = "id",
measure.vars = patterns(alt = "alt",
prec= "pre"))
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 5
6: b 2 5 5
7: c 2 6 5
8: d 2 7 5
9: a 3 8 NA
10: b 3 9 NA
11: c 3 10 NA
12: d 3 11 NA
My expected output is (since only for demo_d2 there is no prec)
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 NA
6: b 2 5 NA
7: c 2 6 NA
8: d 2 7 NA
9: a 3 8 5
10: b 3 9 5
11: c 3 10 5
12: d 3 11 5
Please suggest.
r data.table reshape
add a comment |
I have a following sample dataset, and I am trying to reshape to long form using data.table package (no dplyr solution please).
test_data<-structure(list(id = c("a", "b", "c", "d"), demo_d1_alt = c(2,
3, 4, 5), demo_d1_pre = c(5, 4, 5, 5), demo_d2_alt = c(4, 5,
6, 7), demo_d3_alt = c(8, 9, 10, 11), demo_d3_pre = c(5, 5, 5,
5)), .Names = c("id", "demo_d1_alt", "demo_d1_pre", "demo_d2_alt",
"demo_d3_alt", "demo_d3_pre"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
id demo_d1_alt demo_d1_pre demo_d2_alt demo_d3_alt demo_d3_pre
a 2 5 4 8 5
b 3 4 5 9 5
c 4 5 6 10 5
d 5 5 7 11 5
Following is my code and the output:
library(data.table)
data_long <- melt.data.table(
setDT(test_data), id.vars = "id",
measure.vars = patterns(alt = "alt",
prec= "pre"))
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 5
6: b 2 5 5
7: c 2 6 5
8: d 2 7 5
9: a 3 8 NA
10: b 3 9 NA
11: c 3 10 NA
12: d 3 11 NA
My expected output is (since only for demo_d2 there is no prec)
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 NA
6: b 2 5 NA
7: c 2 6 NA
8: d 2 7 NA
9: a 3 8 5
10: b 3 9 5
11: c 3 10 5
12: d 3 11 5
Please suggest.
r data.table reshape
1
See this issue and a related discussion for what led to my draftReshapeLong_()function.
– A5C1D2H2I1M1N2O1R2T1
Mar 3 '18 at 16:45
add a comment |
I have a following sample dataset, and I am trying to reshape to long form using data.table package (no dplyr solution please).
test_data<-structure(list(id = c("a", "b", "c", "d"), demo_d1_alt = c(2,
3, 4, 5), demo_d1_pre = c(5, 4, 5, 5), demo_d2_alt = c(4, 5,
6, 7), demo_d3_alt = c(8, 9, 10, 11), demo_d3_pre = c(5, 5, 5,
5)), .Names = c("id", "demo_d1_alt", "demo_d1_pre", "demo_d2_alt",
"demo_d3_alt", "demo_d3_pre"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
id demo_d1_alt demo_d1_pre demo_d2_alt demo_d3_alt demo_d3_pre
a 2 5 4 8 5
b 3 4 5 9 5
c 4 5 6 10 5
d 5 5 7 11 5
Following is my code and the output:
library(data.table)
data_long <- melt.data.table(
setDT(test_data), id.vars = "id",
measure.vars = patterns(alt = "alt",
prec= "pre"))
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 5
6: b 2 5 5
7: c 2 6 5
8: d 2 7 5
9: a 3 8 NA
10: b 3 9 NA
11: c 3 10 NA
12: d 3 11 NA
My expected output is (since only for demo_d2 there is no prec)
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 NA
6: b 2 5 NA
7: c 2 6 NA
8: d 2 7 NA
9: a 3 8 5
10: b 3 9 5
11: c 3 10 5
12: d 3 11 5
Please suggest.
r data.table reshape
I have a following sample dataset, and I am trying to reshape to long form using data.table package (no dplyr solution please).
test_data<-structure(list(id = c("a", "b", "c", "d"), demo_d1_alt = c(2,
3, 4, 5), demo_d1_pre = c(5, 4, 5, 5), demo_d2_alt = c(4, 5,
6, 7), demo_d3_alt = c(8, 9, 10, 11), demo_d3_pre = c(5, 5, 5,
5)), .Names = c("id", "demo_d1_alt", "demo_d1_pre", "demo_d2_alt",
"demo_d3_alt", "demo_d3_pre"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
id demo_d1_alt demo_d1_pre demo_d2_alt demo_d3_alt demo_d3_pre
a 2 5 4 8 5
b 3 4 5 9 5
c 4 5 6 10 5
d 5 5 7 11 5
Following is my code and the output:
library(data.table)
data_long <- melt.data.table(
setDT(test_data), id.vars = "id",
measure.vars = patterns(alt = "alt",
prec= "pre"))
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 5
6: b 2 5 5
7: c 2 6 5
8: d 2 7 5
9: a 3 8 NA
10: b 3 9 NA
11: c 3 10 NA
12: d 3 11 NA
My expected output is (since only for demo_d2 there is no prec)
data_long
id variable alt prec
1: a 1 2 5
2: b 1 3 4
3: c 1 4 5
4: d 1 5 5
5: a 2 4 NA
6: b 2 5 NA
7: c 2 6 NA
8: d 2 7 NA
9: a 3 8 5
10: b 3 9 5
11: c 3 10 5
12: d 3 11 5
Please suggest.
r data.table reshape
r data.table reshape
asked Feb 27 '18 at 6:56
MetricsMetrics
11.3k43871
11.3k43871
1
See this issue and a related discussion for what led to my draftReshapeLong_()function.
– A5C1D2H2I1M1N2O1R2T1
Mar 3 '18 at 16:45
add a comment |
1
See this issue and a related discussion for what led to my draftReshapeLong_()function.
– A5C1D2H2I1M1N2O1R2T1
Mar 3 '18 at 16:45
1
1
See this issue and a related discussion for what led to my draft
ReshapeLong_() function.– A5C1D2H2I1M1N2O1R2T1
Mar 3 '18 at 16:45
See this issue and a related discussion for what led to my draft
ReshapeLong_() function.– A5C1D2H2I1M1N2O1R2T1
Mar 3 '18 at 16:45
add a comment |
3 Answers
3
active
oldest
votes
One option would be to do a melt/dcast
dcast(melt(setDT(test_data), measure = patterns("d\d+"))[,
c("var1", "var2") := tstrsplit(variable, "_")[-1]],
id + var1 ~ var2, value.var = 'value')[order(var1)]
# id var1 alt pre
# 1: a d1 2 5
# 2: b d1 3 4
# 3: c d1 4 5
# 4: d d1 5 5
# 5: a d2 4 NA
# 6: b d2 5 NA
# 7: c d2 6 NA
# 8: d d2 7 NA
# 9: a d3 8 5
#10: b d3 9 5
#11: c d3 10 5
#12: d d3 11 5
Thank you. So,meltalone can't handle it?.
– Metrics
Feb 27 '18 at 7:31
1
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6])before converting to data.table
– akrun
Feb 27 '18 at 7:38
1
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
add a comment |
Currently, most R approaches don't handle these types of unbalanced datasets very neatly, but instead require you to first convert the data into a "long" format, do some transformation, and then go back into a "wide" format. For this, I had written the merged.stack function as part of my "splitstackshape" package, so named because it would first stack related columns of data into a list and then merge them together.
The approach would be something like:
library(splitstackshape)
merged.stack(test_data, var.stubs = c("_alt", "_pre"), sep = "var.stubs", atStart = FALSE)
# id .time_1 _alt _pre
# 1: a demo_d1 2 5
# 2: a demo_d2 4 NA
# 3: a demo_d3 8 5
# 4: b demo_d1 3 4
# 5: b demo_d2 5 NA
# 6: b demo_d3 9 5
# 7: c demo_d1 4 5
# 8: c demo_d2 6 NA
# 9: c demo_d3 10 5
# 10: d demo_d1 5 5
# 11: d demo_d2 7 NA
# 12: d demo_d3 11 5
However, with the improvements that "data.table" has brought to melt, it made sense to revisit the approach.
The working draft can be found here, and you can load the relevant functions with:
source("https://gist.githubusercontent.com/mrdwab/ceded54b616bdc61cb0f/raw/c5a583267f8682a5afea1b551347eafdab618471/MELT.R")
Usage is straightforward: specify the dataset, the stubs, and whether the stubs are at the start or at the end of the variable names.
library(data.table)
ReshapeLong_(test_data, stubs = c(alt = "_alt", pre = "_pre"), end_stub = TRUE)
# id variable alt pre
# 1: a demo_d1 2 5
# 2: b demo_d1 3 4
# 3: c demo_d1 4 5
# 4: d demo_d1 5 5
# 5: a demo_d2 4 NA
# 6: b demo_d2 5 NA
# 7: c demo_d2 6 NA
# 8: d demo_d2 7 NA
# 9: a demo_d3 8 5
# 10: b demo_d3 9 5
# 11: c demo_d3 10 5
# 12: d demo_d3 11 5
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
add a comment |
Simple hack would be
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', 'demo_d3_alt'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
align the variables by hand and insert '.NA' where no observation.
If there are mulitple missing variables,
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', '.NA'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f49002855%2freshaping-into-long-form-when-there-are-no-column-names-for-certain-repetition%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
One option would be to do a melt/dcast
dcast(melt(setDT(test_data), measure = patterns("d\d+"))[,
c("var1", "var2") := tstrsplit(variable, "_")[-1]],
id + var1 ~ var2, value.var = 'value')[order(var1)]
# id var1 alt pre
# 1: a d1 2 5
# 2: b d1 3 4
# 3: c d1 4 5
# 4: d d1 5 5
# 5: a d2 4 NA
# 6: b d2 5 NA
# 7: c d2 6 NA
# 8: d d2 7 NA
# 9: a d3 8 5
#10: b d3 9 5
#11: c d3 10 5
#12: d d3 11 5
Thank you. So,meltalone can't handle it?.
– Metrics
Feb 27 '18 at 7:31
1
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6])before converting to data.table
– akrun
Feb 27 '18 at 7:38
1
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
add a comment |
One option would be to do a melt/dcast
dcast(melt(setDT(test_data), measure = patterns("d\d+"))[,
c("var1", "var2") := tstrsplit(variable, "_")[-1]],
id + var1 ~ var2, value.var = 'value')[order(var1)]
# id var1 alt pre
# 1: a d1 2 5
# 2: b d1 3 4
# 3: c d1 4 5
# 4: d d1 5 5
# 5: a d2 4 NA
# 6: b d2 5 NA
# 7: c d2 6 NA
# 8: d d2 7 NA
# 9: a d3 8 5
#10: b d3 9 5
#11: c d3 10 5
#12: d d3 11 5
Thank you. So,meltalone can't handle it?.
– Metrics
Feb 27 '18 at 7:31
1
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6])before converting to data.table
– akrun
Feb 27 '18 at 7:38
1
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
add a comment |
One option would be to do a melt/dcast
dcast(melt(setDT(test_data), measure = patterns("d\d+"))[,
c("var1", "var2") := tstrsplit(variable, "_")[-1]],
id + var1 ~ var2, value.var = 'value')[order(var1)]
# id var1 alt pre
# 1: a d1 2 5
# 2: b d1 3 4
# 3: c d1 4 5
# 4: d d1 5 5
# 5: a d2 4 NA
# 6: b d2 5 NA
# 7: c d2 6 NA
# 8: d d2 7 NA
# 9: a d3 8 5
#10: b d3 9 5
#11: c d3 10 5
#12: d d3 11 5
One option would be to do a melt/dcast
dcast(melt(setDT(test_data), measure = patterns("d\d+"))[,
c("var1", "var2") := tstrsplit(variable, "_")[-1]],
id + var1 ~ var2, value.var = 'value')[order(var1)]
# id var1 alt pre
# 1: a d1 2 5
# 2: b d1 3 4
# 3: c d1 4 5
# 4: d d1 5 5
# 5: a d2 4 NA
# 6: b d2 5 NA
# 7: c d2 6 NA
# 8: d d2 7 NA
# 9: a d3 8 5
#10: b d3 9 5
#11: c d3 10 5
#12: d d3 11 5
answered Feb 27 '18 at 7:11
akrunakrun
403k13196269
403k13196269
Thank you. So,meltalone can't handle it?.
– Metrics
Feb 27 '18 at 7:31
1
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6])before converting to data.table
– akrun
Feb 27 '18 at 7:38
1
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
add a comment |
Thank you. So,meltalone can't handle it?.
– Metrics
Feb 27 '18 at 7:31
1
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6])before converting to data.table
– akrun
Feb 27 '18 at 7:38
1
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
Thank you. So,
melt alone can't handle it?.– Metrics
Feb 27 '18 at 7:31
Thank you. So,
melt alone can't handle it?.– Metrics
Feb 27 '18 at 7:31
1
1
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.
cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6]) before converting to data.table– akrun
Feb 27 '18 at 7:38
@Metrics For your code to work, there should be corresponding pairwise column. Here, we have the d3_pre at the end. So, it thinks that is is the pairwise column for the 'd2_alt'. To avoid that create a column of NAs, i.e.
cbind(test_data[1:4], demo_d2_pre = NA_real_, test_data[5:6]) before converting to data.table– akrun
Feb 27 '18 at 7:38
1
1
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
Thanks a lot. That worked since I have only one variable to be created.
– Metrics
Feb 27 '18 at 7:50
add a comment |
Currently, most R approaches don't handle these types of unbalanced datasets very neatly, but instead require you to first convert the data into a "long" format, do some transformation, and then go back into a "wide" format. For this, I had written the merged.stack function as part of my "splitstackshape" package, so named because it would first stack related columns of data into a list and then merge them together.
The approach would be something like:
library(splitstackshape)
merged.stack(test_data, var.stubs = c("_alt", "_pre"), sep = "var.stubs", atStart = FALSE)
# id .time_1 _alt _pre
# 1: a demo_d1 2 5
# 2: a demo_d2 4 NA
# 3: a demo_d3 8 5
# 4: b demo_d1 3 4
# 5: b demo_d2 5 NA
# 6: b demo_d3 9 5
# 7: c demo_d1 4 5
# 8: c demo_d2 6 NA
# 9: c demo_d3 10 5
# 10: d demo_d1 5 5
# 11: d demo_d2 7 NA
# 12: d demo_d3 11 5
However, with the improvements that "data.table" has brought to melt, it made sense to revisit the approach.
The working draft can be found here, and you can load the relevant functions with:
source("https://gist.githubusercontent.com/mrdwab/ceded54b616bdc61cb0f/raw/c5a583267f8682a5afea1b551347eafdab618471/MELT.R")
Usage is straightforward: specify the dataset, the stubs, and whether the stubs are at the start or at the end of the variable names.
library(data.table)
ReshapeLong_(test_data, stubs = c(alt = "_alt", pre = "_pre"), end_stub = TRUE)
# id variable alt pre
# 1: a demo_d1 2 5
# 2: b demo_d1 3 4
# 3: c demo_d1 4 5
# 4: d demo_d1 5 5
# 5: a demo_d2 4 NA
# 6: b demo_d2 5 NA
# 7: c demo_d2 6 NA
# 8: d demo_d2 7 NA
# 9: a demo_d3 8 5
# 10: b demo_d3 9 5
# 11: c demo_d3 10 5
# 12: d demo_d3 11 5
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
add a comment |
Currently, most R approaches don't handle these types of unbalanced datasets very neatly, but instead require you to first convert the data into a "long" format, do some transformation, and then go back into a "wide" format. For this, I had written the merged.stack function as part of my "splitstackshape" package, so named because it would first stack related columns of data into a list and then merge them together.
The approach would be something like:
library(splitstackshape)
merged.stack(test_data, var.stubs = c("_alt", "_pre"), sep = "var.stubs", atStart = FALSE)
# id .time_1 _alt _pre
# 1: a demo_d1 2 5
# 2: a demo_d2 4 NA
# 3: a demo_d3 8 5
# 4: b demo_d1 3 4
# 5: b demo_d2 5 NA
# 6: b demo_d3 9 5
# 7: c demo_d1 4 5
# 8: c demo_d2 6 NA
# 9: c demo_d3 10 5
# 10: d demo_d1 5 5
# 11: d demo_d2 7 NA
# 12: d demo_d3 11 5
However, with the improvements that "data.table" has brought to melt, it made sense to revisit the approach.
The working draft can be found here, and you can load the relevant functions with:
source("https://gist.githubusercontent.com/mrdwab/ceded54b616bdc61cb0f/raw/c5a583267f8682a5afea1b551347eafdab618471/MELT.R")
Usage is straightforward: specify the dataset, the stubs, and whether the stubs are at the start or at the end of the variable names.
library(data.table)
ReshapeLong_(test_data, stubs = c(alt = "_alt", pre = "_pre"), end_stub = TRUE)
# id variable alt pre
# 1: a demo_d1 2 5
# 2: b demo_d1 3 4
# 3: c demo_d1 4 5
# 4: d demo_d1 5 5
# 5: a demo_d2 4 NA
# 6: b demo_d2 5 NA
# 7: c demo_d2 6 NA
# 8: d demo_d2 7 NA
# 9: a demo_d3 8 5
# 10: b demo_d3 9 5
# 11: c demo_d3 10 5
# 12: d demo_d3 11 5
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
add a comment |
Currently, most R approaches don't handle these types of unbalanced datasets very neatly, but instead require you to first convert the data into a "long" format, do some transformation, and then go back into a "wide" format. For this, I had written the merged.stack function as part of my "splitstackshape" package, so named because it would first stack related columns of data into a list and then merge them together.
The approach would be something like:
library(splitstackshape)
merged.stack(test_data, var.stubs = c("_alt", "_pre"), sep = "var.stubs", atStart = FALSE)
# id .time_1 _alt _pre
# 1: a demo_d1 2 5
# 2: a demo_d2 4 NA
# 3: a demo_d3 8 5
# 4: b demo_d1 3 4
# 5: b demo_d2 5 NA
# 6: b demo_d3 9 5
# 7: c demo_d1 4 5
# 8: c demo_d2 6 NA
# 9: c demo_d3 10 5
# 10: d demo_d1 5 5
# 11: d demo_d2 7 NA
# 12: d demo_d3 11 5
However, with the improvements that "data.table" has brought to melt, it made sense to revisit the approach.
The working draft can be found here, and you can load the relevant functions with:
source("https://gist.githubusercontent.com/mrdwab/ceded54b616bdc61cb0f/raw/c5a583267f8682a5afea1b551347eafdab618471/MELT.R")
Usage is straightforward: specify the dataset, the stubs, and whether the stubs are at the start or at the end of the variable names.
library(data.table)
ReshapeLong_(test_data, stubs = c(alt = "_alt", pre = "_pre"), end_stub = TRUE)
# id variable alt pre
# 1: a demo_d1 2 5
# 2: b demo_d1 3 4
# 3: c demo_d1 4 5
# 4: d demo_d1 5 5
# 5: a demo_d2 4 NA
# 6: b demo_d2 5 NA
# 7: c demo_d2 6 NA
# 8: d demo_d2 7 NA
# 9: a demo_d3 8 5
# 10: b demo_d3 9 5
# 11: c demo_d3 10 5
# 12: d demo_d3 11 5
Currently, most R approaches don't handle these types of unbalanced datasets very neatly, but instead require you to first convert the data into a "long" format, do some transformation, and then go back into a "wide" format. For this, I had written the merged.stack function as part of my "splitstackshape" package, so named because it would first stack related columns of data into a list and then merge them together.
The approach would be something like:
library(splitstackshape)
merged.stack(test_data, var.stubs = c("_alt", "_pre"), sep = "var.stubs", atStart = FALSE)
# id .time_1 _alt _pre
# 1: a demo_d1 2 5
# 2: a demo_d2 4 NA
# 3: a demo_d3 8 5
# 4: b demo_d1 3 4
# 5: b demo_d2 5 NA
# 6: b demo_d3 9 5
# 7: c demo_d1 4 5
# 8: c demo_d2 6 NA
# 9: c demo_d3 10 5
# 10: d demo_d1 5 5
# 11: d demo_d2 7 NA
# 12: d demo_d3 11 5
However, with the improvements that "data.table" has brought to melt, it made sense to revisit the approach.
The working draft can be found here, and you can load the relevant functions with:
source("https://gist.githubusercontent.com/mrdwab/ceded54b616bdc61cb0f/raw/c5a583267f8682a5afea1b551347eafdab618471/MELT.R")
Usage is straightforward: specify the dataset, the stubs, and whether the stubs are at the start or at the end of the variable names.
library(data.table)
ReshapeLong_(test_data, stubs = c(alt = "_alt", pre = "_pre"), end_stub = TRUE)
# id variable alt pre
# 1: a demo_d1 2 5
# 2: b demo_d1 3 4
# 3: c demo_d1 4 5
# 4: d demo_d1 5 5
# 5: a demo_d2 4 NA
# 6: b demo_d2 5 NA
# 7: c demo_d2 6 NA
# 8: d demo_d2 7 NA
# 9: a demo_d3 8 5
# 10: b demo_d3 9 5
# 11: c demo_d3 10 5
# 12: d demo_d3 11 5
answered Mar 3 '18 at 16:43
A5C1D2H2I1M1N2O1R2T1A5C1D2H2I1M1N2O1R2T1
153k18284378
153k18284378
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
add a comment |
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
Thank you for the answer.
– Metrics
Mar 5 '18 at 5:55
add a comment |
Simple hack would be
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', 'demo_d3_alt'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
align the variables by hand and insert '.NA' where no observation.
If there are mulitple missing variables,
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', '.NA'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
add a comment |
Simple hack would be
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', 'demo_d3_alt'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
align the variables by hand and insert '.NA' where no observation.
If there are mulitple missing variables,
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', '.NA'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
add a comment |
Simple hack would be
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', 'demo_d3_alt'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
align the variables by hand and insert '.NA' where no observation.
If there are mulitple missing variables,
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', '.NA'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
Simple hack would be
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', 'demo_d3_alt'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
align the variables by hand and insert '.NA' where no observation.
If there are mulitple missing variables,
test_data <- as.data.table(test_data)
test_data[,.NA:=NA]
melt(test_data, id='id',
measure = list(c('demo_d1_alt', 'demo_d2_alt', '.NA'),
c('demo_d1_pre', '.NA', 'demo_d3_pre')))
answered Dec 30 '18 at 4:15
KH KimKH Kim
3681311
3681311
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f49002855%2freshaping-into-long-form-when-there-are-no-column-names-for-certain-repetition%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
See this issue and a related discussion for what led to my draft
ReshapeLong_()function.– A5C1D2H2I1M1N2O1R2T1
Mar 3 '18 at 16:45