Efficient way of reading a file into an std::vector<char>?

35


I'd like to avoid unnecessary copies. I'm aiming for something along the lines of:



std::ifstream testFile( "testfile", std::ios::binary );
std::vector<char> fileContents;
int fileSize = getFileSize( testFile );
fileContents.reserve( fileSize );
testFile.read( &fileContents[0], fileSize );


(which doesn't work because reserve doesn't actually insert anything into the vector, so I can't access [0]).



Of course, std::vector<char> fileContents(fileSize) works, but there is an overhead of initializing all elements (fileSize can be rather big). Same for resize().



This question is not so much about how important that overhead would be. Rather, I'm just curious to know if there's another way.
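
For reference, the variant mentioned above that does work, at the cost of value-initializing the buffer, might look like the minimal sketch below. The helper name readWholeFile is hypothetical, and using seekg/tellg to obtain the size is an assumption standing in for getFileSize, which the question does not show.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): reads a whole file with one read() call.
std::vector<char> readWholeFile(const std::string& path) {
    // Open positioned at the end so that tellg() reports the file size.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("cannot open " + path);

    const std::streamoff fileSize = file.tellg();
    if (fileSize < 0) throw std::runtime_error("cannot determine size of " + path);
    file.seekg(0, std::ios::beg);

    // The sized constructor value-initializes every element to '\0';
    // this is the overhead the question would like to avoid.
    std::vector<char> contents(static_cast<std::size_t>(fileSize));
    if (fileSize > 0 &&
        !file.read(contents.data(), static_cast<std::streamsize>(fileSize))) {
        throw std::runtime_error("short read on " + path);
    }
    return contents;
}
```

Calling readWholeFile("testfile") would return the bytes of the file; an empty file yields an empty vector.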











c++ stl vector






edited Jan 21 '11 at 17:15
asked Jan 21 '11 at 16:59
Pedro d'Aquino


  • 1
    If you want to avoid the reallocation cost required by push_back and you want to avoid the cost of zeroing the buffer required by using resize, don't use a std::vector at all: use a boost::scoped_array or something similar.
    – James McNellis, Jan 21 '11 at 17:32















3 Answers
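57

The canonical form is this:

#include <fstream>
#include <iterator>
#include <vector>
// ...

std::ifstream testFile("testfile", std::ios::binary);
// The extra parentheses around the first argument keep this from being
// parsed as a function declaration (the "most vexing parse").
std::vector<char> fileContents((std::istreambuf_iterator<char>(testFile)),
                               std::istreambuf_iterator<char>());

If you are worried about reallocations, then reserve space in the vector first:

#include <fstream>
#include <iterator>
#include <vector>
// ...

std::ifstream testFile("testfile", std::ios::binary);
std::vector<char> fileContents;
// fileSize is the file's size in bytes, determined beforehand
// (for example with the getFileSize() helper from the question).
fileContents.reserve(fileSize);
fileContents.assign(std::istreambuf_iterator<char>(testFile),
                    std::istreambuf_iterator<char>());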
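
The snippets above leave out how fileSize is obtained. One possible way to put the pieces together is sketched below. The function name readFileWithIterators is hypothetical, and taking the size from tellg() on a stream opened at the end is an assumption that holds for binary files on mainstream platforms. On common standard library implementations assign() keeps the capacity that reserve() set up, so the vector should not need to reallocate while it is filled.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reserve first, then fill the vector through stream-buffer iterators.
std::vector<char> readFileWithIterators(const std::string& path) {
    // Open positioned at the end so that tellg() reports the file size.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("cannot open " + path);

    const std::streamoff fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    std::vector<char> contents;
    if (fileSize > 0) {
        // reserve() allocates storage but leaves size() at 0,
        // so no element is initialized here.
        contents.reserve(static_cast<std::size_t>(fileSize));
    }
    // istreambuf_iterator reads raw characters and does not skip whitespace.
    contents.assign(std::istreambuf_iterator<char>(file),
                    std::istreambuf_iterator<char>());
    return contents;
}
```

This avoids zeroing the buffer, but it copies one character at a time, which may well be slower in practice than a single read() into a pre-sized vector; measuring on the target platform is the only reliable guide.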





edited Nov 10 '11 at 11:06
answered Jan 21 '11 at 17:21
wilhelmtell


  • Won't that do reallocations while the vector is growing? (Since the iterators might not support subtraction, the constructor cannot determine the size in advance.)
    – Thomas, Jan 21 '11 at 17:22
  • Yes, it would. If that's really a concern, then reserve and use std::copy(). Updated.
    – wilhelmtell, Jan 21 '11 at 17:26
  • 6
    Yes, it is. As written, the code is incorrect because fileContents.begin() is not dereferenceable (it is equal to fileContents.end()). An STL implementation with debugging support (like the Visual C++ 2010 STL) should raise an assertion when executing this code.
    – James McNellis, Jan 21 '11 at 17:48
  • 1
    Better late than never: simplified the code a little. Removed the <algorithm> dependency by replacing the std::copy() call with std::vector::assign(). Also, for std::ifstream there's no need to pass std::ios::in to the constructor. The constructor knows that.
    – wilhelmtell, Nov 10 '11 at 11:10
  • 3
    @wilhelmtell Is this (the second option) more efficient than simply doing vector<char> fileContents(fileSize); followed by testFile.read(&fileContents[0], fileSize);? Judging from a quick test (150 MB file), using read seems considerably faster.
    – LyK, Oct 24 '15 at 16:12



















4

If you want true zero-copy reading, that is, to eliminate copying from kernel to user space, just map the file into memory. Write your own mapped file wrapper or use one from boost::interprocess.
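
As a rough illustration of the "write your own wrapper" option, here is a minimal read-only sketch. It assumes a POSIX system (open, fstat, mmap); the class name MappedFile is hypothetical, it is not the boost::interprocess API, and Windows would need a different set of system calls.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical read-only wrapper around a POSIX memory mapping.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) : data_(nullptr), size_(0) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1) throw std::runtime_error("cannot open " + path);

        struct stat st;
        if (::fstat(fd, &st) == -1) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("cannot map " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    // Non-copyable: the object owns the mapping.
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_;
    std::size_t size_;
};
```

The bytes are then available as the range [data(), data() + size()) without ever being copied into a std::vector; the trade-off is that the result is not a vector, so code that requires one would still need a copy.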






answered Jan 21 '11 at 18:21
Maxim Egorushkin


0

If I understand you correctly, you want to read each element but don't want to load it all into fileContents, correct?
I personally don't think this makes unnecessary copies, because opening the file multiple times would hurt performance more. Reading the file once into the fileContents vector is a reasonable solution in this case.






answered Jan 21 '11 at 17:21
Chan


  • I did not mean to vote this down, but it is locked in. If you edit the answer I can / will remove the down vote.
    – ditkin, Jan 21 '15 at 17:25


















