What is the performance difference between mutate_at and mutate?












I'm spending a bit of time improving the performance of some tidyverse-based dataset analysis code, which works on a dataset of 100 to 500 columns and 250,000 rows (the number of columns needed depends on the task being performed).



The code reads data from a CSV file and then does some early tidying on import, for example converting booleans which are recorded as "Y", "N", "" into logical values rather than factors.
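To make the conversion concrete, here is a minimal base R sketch of the kind of fix meant here. The helper name `yn_to_logical` is hypothetical, and treating anything other than "Y" or "N" (including "") as NA is an assumption for illustration.

```r
# Hypothetical helper: "Y" -> TRUE, "N" -> FALSE, anything else (e.g. "") -> NA
yn_to_logical <- function(x) {
  x <- as.character(x)        # handles factors as well as character vectors
  out <- rep(NA, length(x))   # logical NA by default
  out[x %in% "Y"] <- TRUE     # %in% never yields NA, so indexing is safe
  out[x %in% "N"] <- FALSE
  out
}

yn_to_logical(factor(c("Y", "N", "", "Y")))
# returns TRUE, FALSE, NA, TRUE
```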



Originally we imported all the columns regardless of task - the work I've been doing has moved us to selective column import - skipping a load of work which may not be needed for certain tasks, and resulting in a factor-of-3 performance gain. Of course this makes data tidying a little more complex.



I've resorted to using mutate with a list of expressions to do the data tidying.



However, I'm wondering at what point it becomes beneficial to group identical expressions together and perform a mutate_at (e.g. do all the boolean columns which use "Y", "N", "NA" together), followed by a mutate of the remaining columns, versus having one expression per column in a single mutate. Is there a performance difference between the two that makes it worth complicating the code?
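The two approaches being compared can be sketched as follows. This is an illustration only: the toy data frame, the column names, and the helper `yn_to_logical` are all hypothetical, and `rlang::exprs()` is just one way of holding a list of expressions.

```r
library(dplyr)

# Toy data; column names are hypothetical
df <- data.frame(
  flag_a = c("Y", "N", ""),
  flag_b = c("N", "N", "Y"),
  amount = c("1.5", "2", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Y" -> TRUE, "N" -> FALSE, anything else -> NA
yn_to_logical <- function(x) ifelse(x == "Y", TRUE, ifelse(x == "N", FALSE, NA))

# Approach 1: a single mutate() with one expression per column
tidy_exprs <- rlang::exprs(
  flag_a = yn_to_logical(flag_a),
  flag_b = yn_to_logical(flag_b),
  amount = as.numeric(amount)
)
res_mutate <- df %>% mutate(!!!tidy_exprs)

# Approach 2: group the identical conversions in mutate_at(),
# then mutate() the remaining columns
res_mutate_at <- df %>%
  mutate_at(vars(flag_a, flag_b), yn_to_logical) %>%
  mutate(amount = as.numeric(amount))

# Both approaches should produce the same tidy data
all.equal(res_mutate, res_mutate_at)
```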



I appreciate that there will be use-case variation here; I just wondered whether anyone had a feel for some general rules before I either attempt it or spend time building test examples to check whether it's worthwhile.










    You can check it using microbenchmark or system.time

    – akrun
    Jan 2 at 9:28
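A minimal sketch of such a check with microbenchmark, under stated assumptions: the data is synthetic (10 hypothetical "Y"/"N"/"" columns at the 250,000-row scale mentioned in the question), the helper `yn_to_logical` is hypothetical, and the timings will depend on the machine and the dplyr version, so no result is implied here.

```r
library(dplyr)
library(microbenchmark)

set.seed(1)
n_rows <- 250000
flag_cols <- paste0("flag_", 1:10)   # hypothetical column names

# Synthetic data: character columns holding "Y", "N" or ""
df <- as.data.frame(
  setNames(
    replicate(10, sample(c("Y", "N", ""), n_rows, replace = TRUE), simplify = FALSE),
    flag_cols
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Y" -> TRUE, "N" -> FALSE, anything else -> NA
yn_to_logical <- function(x) ifelse(x == "Y", TRUE, ifelse(x == "N", FALSE, NA))

# One expression per column, built programmatically for splicing into mutate()
per_col_exprs <- setNames(
  lapply(flag_cols, function(col) rlang::expr(yn_to_logical(!!rlang::sym(col)))),
  flag_cols
)

run_mutate    <- function() mutate(df, !!!per_col_exprs)
run_mutate_at <- function() mutate_at(df, flag_cols, yn_to_logical)

# Time both variants on identical input
microbenchmark(
  mutate    = run_mutate(),
  mutate_at = run_mutate_at(),
  times = 5
)
```

Checking that both variants return the same data (for example with `all.equal(run_mutate(), run_mutate_at())`) before comparing timings helps ensure the benchmark compares like with like.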
















r dplyr






asked Jan 2 at 9:28









Andrew Hill







