How can this combination algorithm be optimized?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am writing a molecular dynamics program that needs to take the atoms in a molecule and find the possible ways they can bond. To do this, I have a vector of Atom objects and I generate combination pairs using the following algorithm:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::string bitmask(K, 1);
bitmask.resize(N, 0);
do {
/* This loop takes forever with larger N values (approx. 3000) */
std::vector<int> indexRow;
for (int i = 0; i < N; i++)
{
if (bitmask[i]) indexRow.push_back(i);
}
indices.push_back(indexRow);
} while (std::prev_permutation(bitmask.begin(), bitmask.end()));
}
It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.
I then call the algorithm like this:
void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int,
std::map<int, double>> &bondGraph, ForceField *forceField) {
int natoms = atoms.size();
std::vector<std::vector<int>> indices;
utils::CombinationKN(indices, 2, natoms);
for (auto &v : indices) {
int i = v[0];
int j = v[1];
/*... Check if atoms i and j are bonded based on their coordinates.*/
}
}
The issue with this algorithm is that it takes forever to complete for large molecules that have 3000+ atoms. I have thought about parallelizing it (specifically with OpenMP), but even then, the work would have to be split among a few threads and it would still take a lot of time to complete. I need a way to optimize this algorithm so it doesn't take so long to compute combinations. Any help is appreciated.
Thank you,
Vikas
c++ algorithm parallel-processing
add a comment |
I am writing a molecular dynamics program that needs to take the atoms in a molecule and find the possible ways they can bond. To do this, I have a vector of Atom objects and I generate combination pairs using the following algorithm:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::string bitmask(K, 1);
bitmask.resize(N, 0);
do {
/* This loop takes forever with larger N values (approx. 3000) */
std::vector<int> indexRow;
for (int i = 0; i < N; i++)
{
if (bitmask[i]) indexRow.push_back(i);
}
indices.push_back(indexRow);
} while (std::prev_permutation(bitmask.begin(), bitmask.end()));
}
It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.
I then call the algorithm like this:
void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int,
std::map<int, double>> &bondGraph, ForceField *forceField) {
int natoms = atoms.size();
std::vector<std::vector<int>> indices;
utils::CombinationKN(indices, 2, natoms);
for (auto &v : indices) {
int i = v[0];
int j = v[1];
/*... Check if atoms i and j are bonded based on their coordinates.*/
}
}
The issue with this algorithm is that it takes forever to complete for large molecules that have 3000+ atoms. I have thought about parallelizing it (specifically with OpenMP), but even then, the work would have to be split among a few threads and it would still take a lot of time to complete. I need a way to optimize this algorithm so it doesn't take so long to compute combinations. Any help is appreciated.
Thank you,
Vikas
c++ algorithm parallel-processing
Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.
– Thomas Matthews
Jan 4 at 0:14
std::vector<std::vector<int>>can be surprisingly slow due to the lack of contiguous data. Often a singlevectormasquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.
– user4581301
Jan 4 at 0:16
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.
– Thomas Matthews
Jan 4 at 0:20
As implemented,CombinationKN()resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).
– Peter
Jan 4 at 0:35
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.
– Vikas M
Jan 7 at 0:24
add a comment |
I am writing a molecular dynamics program that needs to take the atoms in a molecule and find the possible ways they can bond. To do this, I have a vector of Atom objects and I generate combination pairs using the following algorithm:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::string bitmask(K, 1);
bitmask.resize(N, 0);
do {
/* This loop takes forever with larger N values (approx. 3000) */
std::vector<int> indexRow;
for (int i = 0; i < N; i++)
{
if (bitmask[i]) indexRow.push_back(i);
}
indices.push_back(indexRow);
} while (std::prev_permutation(bitmask.begin(), bitmask.end()));
}
It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.
I then call the algorithm like this:
void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int,
std::map<int, double>> &bondGraph, ForceField *forceField) {
int natoms = atoms.size();
std::vector<std::vector<int>> indices;
utils::CombinationKN(indices, 2, natoms);
for (auto &v : indices) {
int i = v[0];
int j = v[1];
/*... Check if atoms i and j are bonded based on their coordinates.*/
}
}
The issue with this algorithm is that it takes forever to complete for large molecules that have 3000+ atoms. I have thought about parallelizing it (specifically with OpenMP), but even then, the work would have to be split among a few threads and it would still take a lot of time to complete. I need a way to optimize this algorithm so it doesn't take so long to compute combinations. Any help is appreciated.
Thank you,
Vikas
c++ algorithm parallel-processing
I am writing a molecular dynamics program that needs to take the atoms in a molecule and find the possible ways they can bond. To do this, I have a vector of Atom objects and I generate combination pairs using the following algorithm:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::string bitmask(K, 1);
bitmask.resize(N, 0);
do {
/* This loop takes forever with larger N values (approx. 3000) */
std::vector<int> indexRow;
for (int i = 0; i < N; i++)
{
if (bitmask[i]) indexRow.push_back(i);
}
indices.push_back(indexRow);
} while (std::prev_permutation(bitmask.begin(), bitmask.end()));
}
It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.
I then call the algorithm like this:
void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int,
std::map<int, double>> &bondGraph, ForceField *forceField) {
int natoms = atoms.size();
std::vector<std::vector<int>> indices;
utils::CombinationKN(indices, 2, natoms);
for (auto &v : indices) {
int i = v[0];
int j = v[1];
/*... Check if atoms i and j are bonded based on their coordinates.*/
}
}
The issue with this algorithm is that it takes forever to complete for large molecules that have 3000+ atoms. I have thought about parallelizing it (specifically with OpenMP), but even then, the work would have to be split among a few threads and it would still take a lot of time to complete. I need a way to optimize this algorithm so it doesn't take so long to compute combinations. Any help is appreciated.
Thank you,
Vikas
c++ algorithm parallel-processing
c++ algorithm parallel-processing
asked Jan 4 at 0:09
Vikas MVikas M
266
266
Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.
– Thomas Matthews
Jan 4 at 0:14
std::vector<std::vector<int>>can be surprisingly slow due to the lack of contiguous data. Often a singlevectormasquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.
– user4581301
Jan 4 at 0:16
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.
– Thomas Matthews
Jan 4 at 0:20
As implemented,CombinationKN()resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).
– Peter
Jan 4 at 0:35
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.
– Vikas M
Jan 7 at 0:24
add a comment |
Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.
– Thomas Matthews
Jan 4 at 0:14
std::vector<std::vector<int>>can be surprisingly slow due to the lack of contiguous data. Often a singlevectormasquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.
– user4581301
Jan 4 at 0:16
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.
– Thomas Matthews
Jan 4 at 0:20
As implemented,CombinationKN()resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).
– Peter
Jan 4 at 0:35
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.
– Vikas M
Jan 7 at 0:24
Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.
– Thomas Matthews
Jan 4 at 0:14
Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.
– Thomas Matthews
Jan 4 at 0:14
std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.– user4581301
Jan 4 at 0:16
std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.– user4581301
Jan 4 at 0:16
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.
– Thomas Matthews
Jan 4 at 0:20
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.
– Thomas Matthews
Jan 4 at 0:20
As implemented,
CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).– Peter
Jan 4 at 0:35
As implemented,
CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).– Peter
Jan 4 at 0:35
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.
– Vikas M
Jan 7 at 0:24
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.
– Vikas M
Jan 7 at 0:24
add a comment |
1 Answer
1
active
oldest
votes
Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.
Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::vector<int> index_row;
// lexographically first valid row
for (int i=0; i<K; ++i) {
index_row.push_back(i);
}
for(;;) {
// output current row
indeces.push_back(index_row);
// increment index_row the the lexically next valid sequence
// find the right-most index we can increment
// This loop does O(1) amortized iterations if K is not large. O(K) worst case
int inc_index=K-1;
int index_limit=N-1;
while(inc_index >= 0 && index_row[inc_index] >= index_limit) {
--inc_index;
--index_limit;
}
if (inc_index < 0) {
break; //all done
}
// generate the lexically first valid row with matching prefix and
// larger value at inc_index
int val = index_row[inc_index]+1;
for (;inc_index<K; ++inc_index, ++val) {
index_row[inc_index] = val;
}
}
}
Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54031528%2fhow-can-this-combination-algorithm-be-optimized%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.
Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::vector<int> index_row;
// lexographically first valid row
for (int i=0; i<K; ++i) {
index_row.push_back(i);
}
for(;;) {
// output current row
indeces.push_back(index_row);
// increment index_row the the lexically next valid sequence
// find the right-most index we can increment
// This loop does O(1) amortized iterations if K is not large. O(K) worst case
int inc_index=K-1;
int index_limit=N-1;
while(inc_index >= 0 && index_row[inc_index] >= index_limit) {
--inc_index;
--index_limit;
}
if (inc_index < 0) {
break; //all done
}
// generate the lexically first valid row with matching prefix and
// larger value at inc_index
int val = index_row[inc_index]+1;
for (;inc_index<K; ++inc_index, ++val) {
index_row[inc_index] = val;
}
}
}
Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
add a comment |
Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.
Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::vector<int> index_row;
// lexographically first valid row
for (int i=0; i<K; ++i) {
index_row.push_back(i);
}
for(;;) {
// output current row
indeces.push_back(index_row);
// increment index_row the the lexically next valid sequence
// find the right-most index we can increment
// This loop does O(1) amortized iterations if K is not large. O(K) worst case
int inc_index=K-1;
int index_limit=N-1;
while(inc_index >= 0 && index_row[inc_index] >= index_limit) {
--inc_index;
--index_limit;
}
if (inc_index < 0) {
break; //all done
}
// generate the lexically first valid row with matching prefix and
// larger value at inc_index
int val = index_row[inc_index]+1;
for (;inc_index<K; ++inc_index, ++val) {
index_row[inc_index] = val;
}
}
}
Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
add a comment |
Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.
Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::vector<int> index_row;
// lexographically first valid row
for (int i=0; i<K; ++i) {
index_row.push_back(i);
}
for(;;) {
// output current row
indeces.push_back(index_row);
// increment index_row the the lexically next valid sequence
// find the right-most index we can increment
// This loop does O(1) amortized iterations if K is not large. O(K) worst case
int inc_index=K-1;
int index_limit=N-1;
while(inc_index >= 0 && index_row[inc_index] >= index_limit) {
--inc_index;
--index_limit;
}
if (inc_index < 0) {
break; //all done
}
// generate the lexically first valid row with matching prefix and
// larger value at inc_index
int val = index_row[inc_index]+1;
for (;inc_index<K; ++inc_index, ++val) {
index_row[inc_index] = val;
}
}
}
Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.
Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.
Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:
void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
std::vector<int> index_row;
// lexographically first valid row
for (int i=0; i<K; ++i) {
index_row.push_back(i);
}
for(;;) {
// output current row
indeces.push_back(index_row);
// increment index_row the the lexically next valid sequence
// find the right-most index we can increment
// This loop does O(1) amortized iterations if K is not large. O(K) worst case
int inc_index=K-1;
int index_limit=N-1;
while(inc_index >= 0 && index_row[inc_index] >= index_limit) {
--inc_index;
--index_limit;
}
if (inc_index < 0) {
break; //all done
}
// generate the lexically first valid row with matching prefix and
// larger value at inc_index
int val = index_row[inc_index]+1;
for (;inc_index<K; ++inc_index, ++val) {
index_row[inc_index] = val;
}
}
}
Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.
edited Jan 4 at 4:59
answered Jan 4 at 4:46
Matt TimmermansMatt Timmermans
21.4k11734
21.4k11734
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
add a comment |
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.
– Vikas M
Jan 7 at 0:20
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54031528%2fhow-can-this-combination-algorithm-be-optimized%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.
– Thomas Matthews
Jan 4 at 0:14
std::vector<std::vector<int>>can be surprisingly slow due to the lack of contiguous data. Often a singlevectormasquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.– user4581301
Jan 4 at 0:16
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.
– Thomas Matthews
Jan 4 at 0:20
As implemented,
CombinationKN()resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).– Peter
Jan 4 at 0:35
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.
– Vikas M
Jan 7 at 0:24