Efficient way of reading a file into an std::vector?
I'd like to avoid unnecessary copies. I'm aiming for something along the lines of:
std::ifstream testFile( "testfile", std::ios::binary );
std::vector<char> fileContents;
int fileSize = getFileSize( testFile );
fileContents.reserve( fileSize );
testFile.read( &fileContents[0], fileSize );
(which doesn't work because reserve doesn't actually insert anything into the vector, so I can't access [0]).
Of course, std::vector<char> fileContents(fileSize) works, but there is an overhead of initializing all elements (fileSize can be rather big). Same for resize().
This question is not so much about how important that overhead would be. Rather, I'm just curious to know if there's another way.
c++ stl vector
edited Jan 21 '11 at 17:15 by Pedro d'Aquino
asked Jan 21 '11 at 16:59 by Pedro d'Aquino
If you want to avoid the reallocation cost required by push_back and you want to avoid the cost of zeroing the buffer required by using resize, don't use a std::vector at all: use a boost::scoped_array or something similar. – James McNellis, Jan 21 '11 at 17:32
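The comment above suggests a plain owning array instead of a vector. A minimal sketch of that idea follows, assuming C++11 and using std::unique_ptr<char[]> as a standard-library stand-in for boost::scoped_array. The names FileBuffer and readIntoRawBuffer are made up for illustration, and the sketch assumes that tellg() at the end of a binary stream gives the size in bytes, which holds on common platforms but is not strictly guaranteed.

```cpp
#include <cstddef>
#include <fstream>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper type: owns a raw byte buffer plus its length.
struct FileBuffer {
    std::unique_ptr<char[]> data;
    std::size_t size = 0;
};

// Reads the whole file into a buffer allocated with new char[n].
// new char[n] default-initializes its elements, so no zero-fill takes place.
// Returns an empty FileBuffer if the file cannot be opened or fully read.
FileBuffer readIntoRawBuffer(const std::string& path) {
    FileBuffer result;
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (!in) return result;

    const std::streamoff end = in.tellg();  // assumed to equal the size in bytes
    if (end <= 0) return result;
    in.seekg(0, std::ios::beg);

    const std::size_t n = static_cast<std::size_t>(end);
    std::unique_ptr<char[]> buf(new char[n]);  // uninitialized storage
    if (!in.read(buf.get(), static_cast<std::streamsize>(n))) return result;

    result.data = std::move(buf);
    result.size = n;
    return result;
}
```

The trade-off is that the caller gets a pointer and a length rather than a std::vector<char>, so anything that needs an actual vector would still have to copy.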
3 Answers
The canonical form is this:
#include <fstream>
#include <iterator>
#include <vector>
// ...
std::ifstream testFile("testfile", std::ios::binary);
std::vector<char> fileContents((std::istreambuf_iterator<char>(testFile)),
                               std::istreambuf_iterator<char>());
If you are worried about reallocations, then reserve space in the vector first:
#include <fstream>
#include <iterator>
#include <vector>
// ...
std::ifstream testFile("testfile", std::ios::binary);
std::vector<char> fileContents;
// fileSize is the size of the file in bytes, obtained beforehand
fileContents.reserve(fileSize);
fileContents.assign(std::istreambuf_iterator<char>(testFile),
                    std::istreambuf_iterator<char>());
edited Nov 10 '11 at 11:06
answered Jan 21 '11 at 17:21 by wilhelmtell
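The reserve variant in this answer uses fileSize without showing where it comes from. One possible way to fill that gap is sketched below, as an assumption rather than the answerer's own code: seek to the end, take tellg() as the size, seek back, then reserve and assign. The function name readWithReserve is hypothetical. The sketch relies on assign() keeping the reserved capacity, which common implementations do.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reserve first, then fill through stream-buffer iterators.
// Returns an empty vector if the file cannot be opened.
std::vector<char> readWithReserve(const std::string& path) {
    std::vector<char> contents;
    std::ifstream in(path, std::ios::binary);
    if (!in) return contents;

    // Assumption: in binary mode the end position equals the size in bytes.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);

    if (size > 0) contents.reserve(static_cast<std::size_t>(size));

    // Elements are appended one at a time; with enough capacity reserved,
    // no reallocation should be needed while the vector grows.
    contents.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    return contents;
}
```

This avoids both the zero-fill and the regrowth, but each character still goes through the iterator interface, which is typically slower than one bulk read() call.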
Won't that do reallocations while the vector is growing? (Since the iterators might not support subtraction, the constructor cannot determine the size in advance.) – Thomas, Jan 21 '11 at 17:22
Yes, it would. If that's really a concern, then reserve and use std::copy(). Updated. – wilhelmtell, Jan 21 '11 at 17:26
Yes, it is. As written, the code is incorrect because fileContents.begin() is not dereferenceable (it is equal to fileContents.end()). An STL implementation with debugging support (like the Visual C++ 2010 STL) should raise an assertion when executing this code. – James McNellis, Jan 21 '11 at 17:48
Better late than never: simplified the code a little. Remove the <algorithm> dependency by replacing the std::copy() call with std::vector::assign(). Also, for std::ifstream there's no need to pass std::ios::in to the constructor. The constructor knows that. – wilhelmtell, Nov 10 '11 at 11:10
@wilhelmtell is this (the 2nd option) more efficient than simply doing vector<char> fileContents(fileSize); and testFile.read(&fileContents[0], fileSize);? Judging from a quick test (150MB file), using read seems quite more efficient in terms of speed – LyK, Oct 24 '15 at 16:12
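For comparison, the approach the last comment describes (size the vector up front, then issue a single read()) might look like the sketch below. The function name readWithRead is made up for illustration, and the size is again taken from tellg(), which is an assumption about the platform. This version does pay for zero-filling the vector, the overhead the question wanted to avoid, in exchange for one bulk read.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: allocate a vector of the right size, then read in bulk.
// Returns an empty vector if the file cannot be opened or is empty.
std::vector<char> readWithRead(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (!in) return {};

    const std::streamoff size = in.tellg();  // assumed to equal the size in bytes
    if (size <= 0) return {};
    in.seekg(0, std::ios::beg);

    // This constructor value-initializes (zero-fills) every element.
    std::vector<char> contents(static_cast<std::size_t>(size));
    in.read(contents.data(), static_cast<std::streamsize>(size));

    // Keep only the bytes that were actually read, in case of a short read.
    contents.resize(static_cast<std::size_t>(in.gcount()));
    return contents;
}
```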
If you want true zero-copy reading, that is, to eliminate copying from kernel to user space, just map the file into memory. Write your own mapped file wrapper or use one from boost::interprocess.
answered Jan 21 '11 at 18:21 by Maxim Egorushkin
If I understand you correctly, you want to read each element but don't want to load it all into fileContents, correct?
I personally don't think this would make unnecessary copies, because opening the file multiple times would hurt performance more. Reading it once into a fileContents vector is a reasonable solution in this case.
answered Jan 21 '11 at 17:21 by Chan
I did not mean to vote this down, but it is locked in. If you edit the answer I can / will remove the down vote. – ditkin, Jan 21 '15 at 17:25