For data frames already in 'long' format, how can one make a measured variable an ID variable without...
I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.
My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.
meltedcars<-melt(mtcars, id.vars = c("gear"))
head(meltedcars)
gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1
c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found
I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?
EDIT
I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:
head(mtcarsFromDB)
model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1
But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:
model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1
r dataframe
add a comment |
I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.
My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.
meltedcars<-melt(mtcars, id.vars = c("gear"))
head(meltedcars)
gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1
c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found
I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?
EDIT
I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:
head(mtcarsFromDB)
model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1
But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:
model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1
r dataframe
4
If you check themeltedcars, there is nocylcolumn.id.varscan take multiple columnsmeltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))
– akrun
Dec 31 '18 at 14:42
if you have several columns as ID identifier, maybe simplyinteraction(x,y)for your ID will work. Or usetidyr::uniteto create a new ID column
– Tjebo
Dec 31 '18 at 14:59
@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.
– Sam Kruse
Dec 31 '18 at 17:32
For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.
– Aaron
Dec 31 '18 at 18:13
add a comment |
I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.
My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.
meltedcars<-melt(mtcars, id.vars = c("gear"))
head(meltedcars)
gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1
c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found
I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?
EDIT
I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:
head(mtcarsFromDB)
model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1
But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:
model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1
r dataframe
I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.
My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.
meltedcars<-melt(mtcars, id.vars = c("gear"))
head(meltedcars)
gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1
c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found
I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?
EDIT
I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:
head(mtcarsFromDB)
model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1
But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:
model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1
r dataframe
r dataframe
edited Dec 31 '18 at 18:03
Sam Kruse
asked Dec 31 '18 at 14:41
Sam KruseSam Kruse
32
32
4
If you check themeltedcars, there is nocylcolumn.id.varscan take multiple columnsmeltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))
– akrun
Dec 31 '18 at 14:42
if you have several columns as ID identifier, maybe simplyinteraction(x,y)for your ID will work. Or usetidyr::uniteto create a new ID column
– Tjebo
Dec 31 '18 at 14:59
@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.
– Sam Kruse
Dec 31 '18 at 17:32
For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.
– Aaron
Dec 31 '18 at 18:13
add a comment |
4
If you check themeltedcars, there is nocylcolumn.id.varscan take multiple columnsmeltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))
– akrun
Dec 31 '18 at 14:42
if you have several columns as ID identifier, maybe simplyinteraction(x,y)for your ID will work. Or usetidyr::uniteto create a new ID column
– Tjebo
Dec 31 '18 at 14:59
@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.
– Sam Kruse
Dec 31 '18 at 17:32
For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.
– Aaron
Dec 31 '18 at 18:13
4
4
If you check the
meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))– akrun
Dec 31 '18 at 14:42
If you check the
meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))– akrun
Dec 31 '18 at 14:42
if you have several columns as ID identifier, maybe simply
interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column– Tjebo
Dec 31 '18 at 14:59
if you have several columns as ID identifier, maybe simply
interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column– Tjebo
Dec 31 '18 at 14:59
@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.
– Sam Kruse
Dec 31 '18 at 17:32
@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.
– Sam Kruse
Dec 31 '18 at 17:32
For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.
– Aaron
Dec 31 '18 at 18:13
For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.
– Aaron
Dec 31 '18 at 18:13
add a comment |
2 Answers
2
active
oldest
votes
I'll replicate the problem using your approach of using mtcars:
df <- head(mtcars) # get a small chunk to work with
df <- df %>%
mutate(id = paste0("id", row_number())) # create an id variable
# use tidyr::gather() to "melt" the data:
df_long <- df %>%
gather()
At this point df_long should be in the "problem" format that you indicated.
# Here's the problem area. Focus in on the last '10':
df_long %>% tail(10)
key value
63 carb 1
64 carb 1
65 carb 2
66 carb 1
67 id id1
68 id id2
69 id id3
70 id id4
71 id id5
72 id id6
So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:
# get the id strings, store in `ids`
ids <- df_long %>%
filter(key == "id") %>%
select(value) %>%
.[[1]]
Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:
df_long <- df_long %>%
mutate(newid = rep(ids, length(unique(.$key)))) %>%
filter(key != "id") %>%
select(newid, key, value)
head(df_long)
Result:
newid key value
1 id1 mpg 21
2 id2 mpg 21
3 id3 mpg 22.8
4 id4 mpg 21.4
5 id5 mpg 18.7
6 id6 mpg 18.1
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
add a comment |
I'm creating the same data set as in your edit.
# prepare data
cars <- mtcars
cars$model <- row.names(cars)
row.names(cars) <- seq_along(cars$mpg)
df <- melt(cars, id.vars = c("model", "gear"))
Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.
df1 <- df[df$variable == "cyl", ]
names(df1)[4] <- "cyl"
df1$variable <- NULL
df <- df[df$variable != "cyl", ]
# replicate data frame 9 times
df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))
Simply add new column to data frame from the starting point.
I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988644%2ffor-data-frames-already-in-long-format-how-can-one-make-a-measured-variable-a%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'll replicate the problem using your approach of using mtcars:
df <- head(mtcars) # get a small chunk to work with
df <- df %>%
mutate(id = paste0("id", row_number())) # create an id variable
# use tidyr::gather() to "melt" the data:
df_long <- df %>%
gather()
At this point df_long should be in the "problem" format that you indicated.
# Here's the problem area. Focus in on the last '10':
df_long %>% tail(10)
key value
63 carb 1
64 carb 1
65 carb 2
66 carb 1
67 id id1
68 id id2
69 id id3
70 id id4
71 id id5
72 id id6
So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:
# get the id strings, store in `ids`
ids <- df_long %>%
filter(key == "id") %>%
select(value) %>%
.[[1]]
Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:
df_long <- df_long %>%
mutate(newid = rep(ids, length(unique(.$key)))) %>%
filter(key != "id") %>%
select(newid, key, value)
head(df_long)
Result:
newid key value
1 id1 mpg 21
2 id2 mpg 21
3 id3 mpg 22.8
4 id4 mpg 21.4
5 id5 mpg 18.7
6 id6 mpg 18.1
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
add a comment |
I'll replicate the problem using your approach of using mtcars:
df <- head(mtcars) # get a small chunk to work with
df <- df %>%
mutate(id = paste0("id", row_number())) # create an id variable
# use tidyr::gather() to "melt" the data:
df_long <- df %>%
gather()
At this point df_long should be in the "problem" format that you indicated.
# Here's the problem area. Focus in on the last '10':
df_long %>% tail(10)
key value
63 carb 1
64 carb 1
65 carb 2
66 carb 1
67 id id1
68 id id2
69 id id3
70 id id4
71 id id5
72 id id6
So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:
# get the id strings, store in `ids`
ids <- df_long %>%
filter(key == "id") %>%
select(value) %>%
.[[1]]
Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:
df_long <- df_long %>%
mutate(newid = rep(ids, length(unique(.$key)))) %>%
filter(key != "id") %>%
select(newid, key, value)
head(df_long)
Result:
newid key value
1 id1 mpg 21
2 id2 mpg 21
3 id3 mpg 22.8
4 id4 mpg 21.4
5 id5 mpg 18.7
6 id6 mpg 18.1
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
add a comment |
I'll replicate the problem using your approach of using mtcars:
df <- head(mtcars) # get a small chunk to work with
df <- df %>%
mutate(id = paste0("id", row_number())) # create an id variable
# use tidyr::gather() to "melt" the data:
df_long <- df %>%
gather()
At this point df_long should be in the "problem" format that you indicated.
# Here's the problem area. Focus in on the last '10':
df_long %>% tail(10)
key value
63 carb 1
64 carb 1
65 carb 2
66 carb 1
67 id id1
68 id id2
69 id id3
70 id id4
71 id id5
72 id id6
So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:
# get the id strings, store in `ids`
ids <- df_long %>%
filter(key == "id") %>%
select(value) %>%
.[[1]]
Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:
df_long <- df_long %>%
mutate(newid = rep(ids, length(unique(.$key)))) %>%
filter(key != "id") %>%
select(newid, key, value)
head(df_long)
Result:
newid key value
1 id1 mpg 21
2 id2 mpg 21
3 id3 mpg 22.8
4 id4 mpg 21.4
5 id5 mpg 18.7
6 id6 mpg 18.1
I'll replicate the problem using your approach of using mtcars:
df <- head(mtcars) # get a small chunk to work with
df <- df %>%
mutate(id = paste0("id", row_number())) # create an id variable
# use tidyr::gather() to "melt" the data:
df_long <- df %>%
gather()
At this point df_long should be in the "problem" format that you indicated.
# Here's the problem area. Focus in on the last '10':
df_long %>% tail(10)
key value
63 carb 1
64 carb 1
65 carb 2
66 carb 1
67 id id1
68 id id2
69 id id3
70 id id4
71 id id5
72 id id6
So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:
# get the id strings, store in `ids`
ids <- df_long %>%
filter(key == "id") %>%
select(value) %>%
.[[1]]
Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:
df_long <- df_long %>%
mutate(newid = rep(ids, length(unique(.$key)))) %>%
filter(key != "id") %>%
select(newid, key, value)
head(df_long)
Result:
newid key value
1 id1 mpg 21
2 id2 mpg 21
3 id3 mpg 22.8
4 id4 mpg 21.4
5 id5 mpg 18.7
6 id6 mpg 18.1
answered Dec 31 '18 at 15:58
Marian MinarMarian Minar
34510
34510
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
add a comment |
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.
– Sam Kruse
Dec 31 '18 at 17:34
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.
– Aaron
Dec 31 '18 at 18:12
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse sounds good
– Marian Minar
Dec 31 '18 at 23:29
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
@SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.
– Marian Minar
Dec 31 '18 at 23:36
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!
– Sam Kruse
Jan 5 at 15:35
add a comment |
I'm creating the same data set as in your edit.
# prepare data
cars <- mtcars
cars$model <- row.names(cars)
row.names(cars) <- seq_along(cars$mpg)
df <- melt(cars, id.vars = c("model", "gear"))
Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.
df1 <- df[df$variable == "cyl", ]
names(df1)[4] <- "cyl"
df1$variable <- NULL
df <- df[df$variable != "cyl", ]
# replicate data frame 9 times
df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))
Simply add new column to data frame from the starting point.
I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.
add a comment |
I'm creating the same data set as in your edit.
# prepare data
cars <- mtcars
cars$model <- row.names(cars)
row.names(cars) <- seq_along(cars$mpg)
df <- melt(cars, id.vars = c("model", "gear"))
Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.
df1 <- df[df$variable == "cyl", ]
names(df1)[4] <- "cyl"
df1$variable <- NULL
df <- df[df$variable != "cyl", ]
# replicate data frame 9 times
df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))
Simply add new column to data frame from the starting point.
I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.
add a comment |
I'm creating the same data set as in your edit.
# prepare data
cars <- mtcars
cars$model <- row.names(cars)
row.names(cars) <- seq_along(cars$mpg)
df <- melt(cars, id.vars = c("model", "gear"))
Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.
df1 <- df[df$variable == "cyl", ]
names(df1)[4] <- "cyl"
df1$variable <- NULL
df <- df[df$variable != "cyl", ]
# replicate data frame 9 times
df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))
Simply add new column to data frame from the starting point.
I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.
I'm creating the same data set as in your edit.
# prepare data
cars <- mtcars
cars$model <- row.names(cars)
row.names(cars) <- seq_along(cars$mpg)
df <- melt(cars, id.vars = c("model", "gear"))
Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.
df1 <- df[df$variable == "cyl", ]
names(df1)[4] <- "cyl"
df1$variable <- NULL
df <- df[df$variable != "cyl", ]
# replicate data frame 9 times
df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))
Simply add new column to data frame from the starting point.
I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.
answered Jan 1 at 0:39
bronek_baczkiewiczbronek_baczkiewicz
113
113
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988644%2ffor-data-frames-already-in-long-format-how-can-one-make-a-measured-variable-a%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
4
If you check the
meltedcars, there is nocylcolumn.id.varscan take multiple columnsmeltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))– akrun
Dec 31 '18 at 14:42
if you have several columns as ID identifier, maybe simply
interaction(x,y)for your ID will work. Or usetidyr::uniteto create a new ID column– Tjebo
Dec 31 '18 at 14:59
@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.
– Sam Kruse
Dec 31 '18 at 17:32
For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.
– Aaron
Dec 31 '18 at 18:13