For data frames already in 'long' format, how can one make a measured variable an ID variable without...












0















I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.



My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.



meltedcars<-melt(mtcars, id.vars = c("gear"))

head(meltedcars)

gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1

c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found


I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?



EDIT



I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:



head(mtcarsFromDB)



               model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1


But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:



              model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1









share|improve this question




















  • 4





    If you check the meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))

    – akrun
    Dec 31 '18 at 14:42













  • if you have several columns as ID identifier, maybe simply interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column

    – Tjebo
    Dec 31 '18 at 14:59











  • @akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.

    – Sam Kruse
    Dec 31 '18 at 17:32











  • For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.

    – Aaron
    Dec 31 '18 at 18:13
















0















I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.



My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.



meltedcars<-melt(mtcars, id.vars = c("gear"))

head(meltedcars)

gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1

c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found


I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?



EDIT



I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:



head(mtcarsFromDB)



               model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1


But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:



              model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1









share|improve this question




















  • 4





    If you check the meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))

    – akrun
    Dec 31 '18 at 14:42













  • if you have several columns as ID identifier, maybe simply interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column

    – Tjebo
    Dec 31 '18 at 14:59











  • @akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.

    – Sam Kruse
    Dec 31 '18 at 17:32











  • For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.

    – Aaron
    Dec 31 '18 at 18:13














0












0








0








I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.



My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.



meltedcars<-melt(mtcars, id.vars = c("gear"))

head(meltedcars)

gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1

c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found


I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?



EDIT



I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:



head(mtcarsFromDB)



               model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1


But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:



              model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1









share|improve this question
















I am making calls to a database using a package (brapi) that return the necessary data in long format. The api treats certain ID variables as measured variables which is problematic because my analysis scripts need them as ID variables. I can't modify the call to the database in such a way that it will do it for me and I do not want to just reshape2::cast and then melt the entire thing because there are hundreds of variables which would make it expensive. Ideally, I am looking for a way to do this with reshape2, plyr, or the like.



My problem can be seen using the mtcars built-in data frame. Imagine you get the long data frame, meltedcars, with gear as an ID variable from the database but you want cyl as an ID variable as well. I tried obvious dcast commands (eg shown below) but I knew it wouldn't work because the function won't find cyl as a variable.



meltedcars<-melt(mtcars, id.vars = c("gear"))

head(meltedcars)

gear variable value
1 4 mpg 21.0
2 4 mpg 21.0
3 4 mpg 22.8
4 3 mpg 21.4
5 3 mpg 18.7
6 3 mpg 18.1

c<-dcast(d, gear + cyl ~ variable, value.var= "value")
Error in FUN(X[[i]], ...) : object 'cyl' not found


I have tried a number of iterations of the dcast function above to no avail. I have looked for similar problems for quite sometime but haven't found a fit. I know I could do this with some for loops or by melting and casting all over again, but I want to see if there is a more elegant solution. Thoughts?



EDIT



I should be clearer. For the purposes of this post, ID columns are the variables in long format that are used to identify the object that is being observed rather than what was being measured. Let's say we were measuring mpg and disp on all the cars in the mtcars data frame and we wanted additional identifying information about the car besides the model name in the data frame as its own column. To identify the car you have a column of car names (I have adjusted mtcars so that the row names are now a column within mtcars called model), a column of what gear they are, and one for how many cyl each car has. Then we have a variable column and value column where the kind of observation and the value of it are listed respectively. When I make the call to the database this is what I get:



head(mtcarsFromDB)



               model gear variable value
1 Mazda RX4 4 mpg 21.0
2 Mazda RX4 Wag 4 mpg 21.0
3 Datsun 710 4 mpg 22.8
4 Hornet 4 Drive 3 mpg 21.4
5 Hornet Sportabout 3 mpg 18.7
6 Valiant 3 mpg 18.1


But I want to reformat this df with cyl, which is currently a value of variable, as an ID column like gear without casting and melting the entire thing. It should look like this:



              model gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
4 Hornet 4 Drive 3 6 mpg 21.4
5 Hornet Sportabout 3 8 mpg 18.7
6 Valiant 3 6 mpg 18.1






r dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 31 '18 at 18:03







Sam Kruse

















asked Dec 31 '18 at 14:41









Sam KruseSam Kruse

32




32








  • 4





    If you check the meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))

    – akrun
    Dec 31 '18 at 14:42













  • if you have several columns as ID identifier, maybe simply interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column

    – Tjebo
    Dec 31 '18 at 14:59











  • @akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.

    – Sam Kruse
    Dec 31 '18 at 17:32











  • For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.

    – Aaron
    Dec 31 '18 at 18:13














  • 4





    If you check the meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))

    – akrun
    Dec 31 '18 at 14:42













  • if you have several columns as ID identifier, maybe simply interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column

    – Tjebo
    Dec 31 '18 at 14:59











  • @akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.

    – Sam Kruse
    Dec 31 '18 at 17:32











  • For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.

    – Aaron
    Dec 31 '18 at 18:13








4




4





If you check the meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))

– akrun
Dec 31 '18 at 14:42







If you check the meltedcars, there is no cyl column. id.vars can take multiple columns meltedcars<-melt(mtcars, id.vars = c("gear", "cyl"))

– akrun
Dec 31 '18 at 14:42















if you have several columns as ID identifier, maybe simply interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column

– Tjebo
Dec 31 '18 at 14:59





if you have several columns as ID identifier, maybe simply interaction(x,y) for your ID will work. Or use tidyr::unite to create a new ID column

– Tjebo
Dec 31 '18 at 14:59













@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.

– Sam Kruse
Dec 31 '18 at 17:32





@akrun, the problem is not the melting. I am receiving the data frame in that melted format from the database with gears as an ID column but with "cyl" in the "variable" column much like mpg above. I want to turn the "cyl" into a header (like "gear") post facto. That is after I have received it in that melted format.

– Sam Kruse
Dec 31 '18 at 17:32













For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.

– Aaron
Dec 31 '18 at 18:13





For clarity, will you include the rows with variable=="cyl" in your first example data set? It still looks like there isn't any cyl in what you get.

– Aaron
Dec 31 '18 at 18:13












2 Answers
2






active

oldest

votes


















1














I'll replicate the problem using your approach of using mtcars:



df <- head(mtcars) # get a small chunk to work with

df <- df %>%
mutate(id = paste0("id", row_number())) # create an id variable

# use tidyr::gather() to "melt" the data:
df_long <- df %>%
gather()


At this point df_long should be in the "problem" format that you indicated.



# Here's the problem area. Focus in on the last '10':
df_long %>% tail(10)

key value
63 carb 1
64 carb 1
65 carb 2
66 carb 1
67 id id1
68 id id2
69 id id3
70 id id4
71 id id5
72 id id6


So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:



# get the id strings, store in `ids`
ids <- df_long %>%
filter(key == "id") %>%
select(value) %>%
.[[1]]


Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:



df_long <- df_long %>%
mutate(newid = rep(ids, length(unique(.$key)))) %>%
filter(key != "id") %>%
select(newid, key, value)

head(df_long)


Result:



  newid key value
1 id1 mpg 21
2 id2 mpg 21
3 id3 mpg 22.8
4 id4 mpg 21.4
5 id5 mpg 18.7
6 id6 mpg 18.1





share|improve this answer
























  • Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

    – Sam Kruse
    Dec 31 '18 at 17:34











  • I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

    – Aaron
    Dec 31 '18 at 18:12











  • @SamKruse sounds good

    – Marian Minar
    Dec 31 '18 at 23:29











  • @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

    – Marian Minar
    Dec 31 '18 at 23:36











  • Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

    – Sam Kruse
    Jan 5 at 15:35



















0














I'm creating the same data set as in your edit.



# prepare data
cars <- mtcars
cars$model <- row.names(cars)
row.names(cars) <- seq_along(cars$mpg)

df <- melt(cars, id.vars = c("model", "gear"))


Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.



df1 <- df[df$variable == "cyl", ]
names(df1)[4] <- "cyl"
df1$variable <- NULL

df <- df[df$variable != "cyl", ]

# replicate data frame 9 times
df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))


Simply add new column to data frame from the starting point.



I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988644%2ffor-data-frames-already-in-long-format-how-can-one-make-a-measured-variable-a%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    I'll replicate the problem using your approach of using mtcars:



    df <- head(mtcars) # get a small chunk to work with

    df <- df %>%
    mutate(id = paste0("id", row_number())) # create an id variable

    # use tidyr::gather() to "melt" the data:
    df_long <- df %>%
    gather()


    At this point df_long should be in the "problem" format that you indicated.



    # Here's the problem area. Focus in on the last '10':
    df_long %>% tail(10)

    key value
    63 carb 1
    64 carb 1
    65 carb 2
    66 carb 1
    67 id id1
    68 id id2
    69 id id3
    70 id id4
    71 id id5
    72 id id6


    So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:



    # get the id strings, store in `ids`
    ids <- df_long %>%
    filter(key == "id") %>%
    select(value) %>%
    .[[1]]


    Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:



    df_long <- df_long %>%
    mutate(newid = rep(ids, length(unique(.$key)))) %>%
    filter(key != "id") %>%
    select(newid, key, value)

    head(df_long)


    Result:



      newid key value
    1 id1 mpg 21
    2 id2 mpg 21
    3 id3 mpg 22.8
    4 id4 mpg 21.4
    5 id5 mpg 18.7
    6 id6 mpg 18.1





    share|improve this answer
























    • Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

      – Sam Kruse
      Dec 31 '18 at 17:34











    • I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

      – Aaron
      Dec 31 '18 at 18:12











    • @SamKruse sounds good

      – Marian Minar
      Dec 31 '18 at 23:29











    • @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

      – Marian Minar
      Dec 31 '18 at 23:36











    • Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

      – Sam Kruse
      Jan 5 at 15:35
















    1














    I'll replicate the problem using your approach of using mtcars:



    df <- head(mtcars) # get a small chunk to work with

    df <- df %>%
    mutate(id = paste0("id", row_number())) # create an id variable

    # use tidyr::gather() to "melt" the data:
    df_long <- df %>%
    gather()


    At this point df_long should be in the "problem" format that you indicated.



    # Here's the problem area. Focus in on the last '10':
    df_long %>% tail(10)

    key value
    63 carb 1
    64 carb 1
    65 carb 2
    66 carb 1
    67 id id1
    68 id id2
    69 id id3
    70 id id4
    71 id id5
    72 id id6


    So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:



    # get the id strings, store in `ids`
    ids <- df_long %>%
    filter(key == "id") %>%
    select(value) %>%
    .[[1]]


    Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:



    df_long <- df_long %>%
    mutate(newid = rep(ids, length(unique(.$key)))) %>%
    filter(key != "id") %>%
    select(newid, key, value)

    head(df_long)


    Result:



      newid key value
    1 id1 mpg 21
    2 id2 mpg 21
    3 id3 mpg 22.8
    4 id4 mpg 21.4
    5 id5 mpg 18.7
    6 id6 mpg 18.1





    share|improve this answer
























    • Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

      – Sam Kruse
      Dec 31 '18 at 17:34











    • I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

      – Aaron
      Dec 31 '18 at 18:12











    • @SamKruse sounds good

      – Marian Minar
      Dec 31 '18 at 23:29











    • @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

      – Marian Minar
      Dec 31 '18 at 23:36











    • Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

      – Sam Kruse
      Jan 5 at 15:35














    1












    1








    1







    I'll replicate the problem using your approach of using mtcars:



    df <- head(mtcars) # get a small chunk to work with

    df <- df %>%
    mutate(id = paste0("id", row_number())) # create an id variable

    # use tidyr::gather() to "melt" the data:
    df_long <- df %>%
    gather()


    At this point df_long should be in the "problem" format that you indicated.



    # Here's the problem area. Focus in on the last '10':
    df_long %>% tail(10)

    key value
    63 carb 1
    64 carb 1
    65 carb 2
    66 carb 1
    67 id id1
    68 id id2
    69 id id3
    70 id id4
    71 id id5
    72 id id6


    So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:



    # get the id strings, store in `ids`
    ids <- df_long %>%
    filter(key == "id") %>%
    select(value) %>%
    .[[1]]


    Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:



    df_long <- df_long %>%
    mutate(newid = rep(ids, length(unique(.$key)))) %>%
    filter(key != "id") %>%
    select(newid, key, value)

    head(df_long)


    Result:



      newid key value
    1 id1 mpg 21
    2 id2 mpg 21
    3 id3 mpg 22.8
    4 id4 mpg 21.4
    5 id5 mpg 18.7
    6 id6 mpg 18.1





    share|improve this answer













    I'll replicate the problem using your approach of using mtcars:



    df <- head(mtcars) # get a small chunk to work with

    df <- df %>%
    mutate(id = paste0("id", row_number())) # create an id variable

    # use tidyr::gather() to "melt" the data:
    df_long <- df %>%
    gather()


    At this point df_long should be in the "problem" format that you indicated.



    # Here's the problem area. Focus in on the last '10':
    df_long %>% tail(10)

    key value
    63 carb 1
    64 carb 1
    65 carb 2
    66 carb 1
    67 id id1
    68 id id2
    69 id id3
    70 id id4
    71 id id5
    72 id id6


    So, the ids are mixed in. I'm assuming that this is your issue? The following is not very elegant, but it works:



    # get the id strings, store in `ids`
    ids <- df_long %>%
    filter(key == "id") %>%
    select(value) %>%
    .[[1]]


    Now, repeat the ids across the rows and finally, remove the original id rows in the tail above:



    df_long <- df_long %>%
    mutate(newid = rep(ids, length(unique(.$key)))) %>%
    filter(key != "id") %>%
    select(newid, key, value)

    head(df_long)


    Result:



      newid key value
    1 id1 mpg 21
    2 id2 mpg 21
    3 id3 mpg 22.8
    4 id4 mpg 21.4
    5 id5 mpg 18.7
    6 id6 mpg 18.1






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Dec 31 '18 at 15:58









    Marian MinarMarian Minar

    34510




    34510













    • Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

      – Sam Kruse
      Dec 31 '18 at 17:34











    • I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

      – Aaron
      Dec 31 '18 at 18:12











    • @SamKruse sounds good

      – Marian Minar
      Dec 31 '18 at 23:29











    • @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

      – Marian Minar
      Dec 31 '18 at 23:36











    • Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

      – Sam Kruse
      Jan 5 at 15:35



















    • Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

      – Sam Kruse
      Dec 31 '18 at 17:34











    • I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

      – Aaron
      Dec 31 '18 at 18:12











    • @SamKruse sounds good

      – Marian Minar
      Dec 31 '18 at 23:29











    • @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

      – Marian Minar
      Dec 31 '18 at 23:36











    • Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

      – Sam Kruse
      Jan 5 at 15:35

















    Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

    – Sam Kruse
    Dec 31 '18 at 17:34





    Hey @Marian Minar, that isn't quite it. Could be that I explained it wrong. I'll try to make it clearer with an edit.

    – Sam Kruse
    Dec 31 '18 at 17:34













    I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

    – Aaron
    Dec 31 '18 at 18:12





    I think this does what you want, though; the id is your cyl. The basic idea is separate the data frame into cyl and not-cyl, then merge the cyl values with the not-cyl data. Here Marian reps it, which I think depends on it being sorted; my preference would be a true merge.

    – Aaron
    Dec 31 '18 at 18:12













    @SamKruse sounds good

    – Marian Minar
    Dec 31 '18 at 23:29





    @SamKruse sounds good

    – Marian Minar
    Dec 31 '18 at 23:29













    @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

    – Marian Minar
    Dec 31 '18 at 23:36





    @SamKruse my apologies, but I'm not sure how to help you, despite having read your edit. You seem to be avoiding putting the actual data that you are working with here, but if you can do that, and paste the code that you are using, we wouldn't have to talk past each other with dummy datasets. Best of luck.

    – Marian Minar
    Dec 31 '18 at 23:36













    Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

    – Sam Kruse
    Jan 5 at 15:35





    Hey @MarianMinar, I think I found a way to modify your solution to fit the problem. Thanks for your help!

    – Sam Kruse
    Jan 5 at 15:35













    0














    I'm creating the same data set as in your edit.



    # prepare data
    cars <- mtcars
    cars$model <- row.names(cars)
    row.names(cars) <- seq_along(cars$mpg)

    df <- melt(cars, id.vars = c("model", "gear"))


    Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.



    df1 <- df[df$variable == "cyl", ]
    names(df1)[4] <- "cyl"
    df1$variable <- NULL

    df <- df[df$variable != "cyl", ]

    # replicate data frame 9 times
    df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))


    Simply add new column to data frame from the starting point.



    I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.






    share|improve this answer




























      0














      I'm creating the same data set as in your edit.



      # prepare data
      cars <- mtcars
      cars$model <- row.names(cars)
      row.names(cars) <- seq_along(cars$mpg)

      df <- melt(cars, id.vars = c("model", "gear"))


      Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.



      df1 <- df[df$variable == "cyl", ]
      names(df1)[4] <- "cyl"
      df1$variable <- NULL

      df <- df[df$variable != "cyl", ]

      # replicate data frame 9 times
      df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))


      Simply add new column to data frame from the starting point.



      I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.






      share|improve this answer


























        0












        0








        0







        I'm creating the same data set as in your edit.



        # prepare data
        cars <- mtcars
        cars$model <- row.names(cars)
        row.names(cars) <- seq_along(cars$mpg)

        df <- melt(cars, id.vars = c("model", "gear"))


        Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.



        df1 <- df[df$variable == "cyl", ]
        names(df1)[4] <- "cyl"
        df1$variable <- NULL

        df <- df[df$variable != "cyl", ]

        # replicate data frame 9 times
        df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))


        Simply add new column to data frame from the starting point.



        I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.






        share|improve this answer













        I'm creating the same data set as in your edit.



        # prepare data
        cars <- mtcars
        cars$model <- row.names(cars)
        row.names(cars) <- seq_along(cars$mpg)

        df <- melt(cars, id.vars = c("model", "gear"))


        Now I subset df through "cyl" observations and replicate it 9 times to match melted data frame. What gives me "cyl" column with its values in the long format repeated 9 times what's consistent with the order from melted df.



        df1 <- df[df$variable == "cyl", ]
        names(df1)[4] <- "cyl"
        df1$variable <- NULL

        df <- df[df$variable != "cyl", ]

        # replicate data frame 9 times
        df2 <- do.call("rbind", replicate(9, df1, simplify = FALSE))


        Simply add new column to data frame from the starting point.



        I think the solution is quite context-specific, so I'm not sure whether it'll work in case of your data, but maybe it'll help to think about a problem.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 1 at 0:39









        bronek_baczkiewiczbronek_baczkiewicz

        113




        113






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988644%2ffor-data-frames-already-in-long-format-how-can-one-make-a-measured-variable-a%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Mossoró

            Error while reading .h5 file using the rhdf5 package in R

            Pushsharp Apns notification error: 'InvalidToken'