How can this combination algorithm be optimized?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I am writing a molecular dynamics program that needs to take the atoms in a molecule and find the possible ways they can bond. To do this, I have a vector of Atom objects and I generate combination pairs using the following algorithm:

    void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

        std::string bitmask(K, 1);

        bitmask.resize(N, 0);



        do {

            /* This loop takes forever with larger N values (approx. 3000) */

            std::vector<int> indexRow;



            for (int i = 0; i < N; i++)

            {

                if (bitmask[i]) indexRow.push_back(i);

            }



            indices.push_back(indexRow);

        } while (std::prev_permutation(bitmask.begin(), bitmask.end()));

    }

It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.

I then call the algorithm like this:

void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int, 

    std::map<int, double>> &bondGraph, ForceField *forceField) {

    int natoms = atoms.size();



    std::vector<std::vector<int>> indices;



    utils::CombinationKN(indices, 2, natoms);



    for (auto &v : indices) {

        int i = v[0];

        int j = v[1];



        /*... Check if atoms i and j are bonded based on their coordinates.*/

    }

}

The issue with this algorithm is that it takes forever to complete for large molecules that have 3000+ atoms. I have thought about parallelizing it (specifically with OpenMP), but even then, the work would have to be split among a few threads and it would still take a lot of time to complete. I need a way to optimize this algorithm so it doesn't take so long to compute combinations. Any help is appreciated.

Thank you,
Vikas

asked Jan 4 at 0:09

Vikas M

266

Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.

– Thomas Matthews
Jan 4 at 0:14

std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.

– user4581301
Jan 4 at 0:16

One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.

– Thomas Matthews
Jan 4 at 0:20

As implemented, CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).

– Peter
Jan 4 at 0:35

Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.

– Vikas M
Jan 7 at 0:24

add a comment |

    void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

        std::string bitmask(K, 1);

        bitmask.resize(N, 0);



        do {

            /* This loop takes forever with larger N values (approx. 3000) */

            std::vector<int> indexRow;



            for (int i = 0; i < N; i++)

            {

                if (bitmask[i]) indexRow.push_back(i);

            }



            indices.push_back(indexRow);

        } while (std::prev_permutation(bitmask.begin(), bitmask.end()));

    }

It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.

I then call the algorithm like this:

void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int, 

    std::map<int, double>> &bondGraph, ForceField *forceField) {

    int natoms = atoms.size();



    std::vector<std::vector<int>> indices;



    utils::CombinationKN(indices, 2, natoms);



    for (auto &v : indices) {

        int i = v[0];

        int j = v[1];



        /*... Check if atoms i and j are bonded based on their coordinates.*/

    }

}

Thank you,
Vikas

asked Jan 4 at 0:09

Vikas M

266

Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.

– Thomas Matthews
Jan 4 at 0:14

std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.

– user4581301
Jan 4 at 0:16

One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.

– Thomas Matthews
Jan 4 at 0:20

As implemented, CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).

– Peter
Jan 4 at 0:35

Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.

– Vikas M
Jan 7 at 0:24

add a comment |

    void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

        std::string bitmask(K, 1);

        bitmask.resize(N, 0);



        do {

            /* This loop takes forever with larger N values (approx. 3000) */

            std::vector<int> indexRow;



            for (int i = 0; i < N; i++)

            {

                if (bitmask[i]) indexRow.push_back(i);

            }



            indices.push_back(indexRow);

        } while (std::prev_permutation(bitmask.begin(), bitmask.end()));

    }

It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.

I then call the algorithm like this:

void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int, 

    std::map<int, double>> &bondGraph, ForceField *forceField) {

    int natoms = atoms.size();



    std::vector<std::vector<int>> indices;



    utils::CombinationKN(indices, 2, natoms);



    for (auto &v : indices) {

        int i = v[0];

        int j = v[1];



        /*... Check if atoms i and j are bonded based on their coordinates.*/

    }

}

Thank you,
Vikas

asked Jan 4 at 0:09

Vikas M

266

    void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

        std::string bitmask(K, 1);

        bitmask.resize(N, 0);



        do {

            /* This loop takes forever with larger N values (approx. 3000) */

            std::vector<int> indexRow;



            for (int i = 0; i < N; i++)

            {

                if (bitmask[i]) indexRow.push_back(i);

            }



            indices.push_back(indexRow);

        } while (std::prev_permutation(bitmask.begin(), bitmask.end()));

    }

It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.

I then call the algorithm like this:

void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int, 

    std::map<int, double>> &bondGraph, ForceField *forceField) {

    int natoms = atoms.size();



    std::vector<std::vector<int>> indices;



    utils::CombinationKN(indices, 2, natoms);



    for (auto &v : indices) {

        int i = v[0];

        int j = v[1];



        /*... Check if atoms i and j are bonded based on their coordinates.*/

    }

}

Thank you,
Vikas

c++ algorithm parallel-processing

asked Jan 4 at 0:09

Vikas M

266

asked Jan 4 at 0:09

Vikas M

266

asked Jan 4 at 0:09

Vikas M

266

asked Jan 4 at 0:09

Vikas M

266

asked Jan 4 at 0:09

Vikas M

266

Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.

– Thomas Matthews
Jan 4 at 0:14

std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.

– user4581301
Jan 4 at 0:16

One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.

– Thomas Matthews
Jan 4 at 0:20

As implemented, CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).

– Peter
Jan 4 at 0:35

Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.

– Vikas M
Jan 7 at 0:24

add a comment |

Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.

– Thomas Matthews
Jan 4 at 0:14

std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.

– user4581301
Jan 4 at 0:16

One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.

– Thomas Matthews
Jan 4 at 0:20

As implemented, CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).

– Peter
Jan 4 at 0:35

Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.

– Vikas M
Jan 7 at 0:24

Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile.

– Thomas Matthews
Jan 4 at 0:14

std::vector<std::vector<int>> can be surprisingly slow due to the lack of contiguous data. Often a single vector masquerading as 2D is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if Code Review is right for you. And read their how to ask pages to make sure.

– user4581301
Jan 4 at 0:16

One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction.

– Thomas Matthews
Jan 4 at 0:20

As implemented, CombinationKN() resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size).

– Peter
Jan 4 at 0:35

Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function.

– Vikas M
Jan 7 at 0:24

add a comment |

1 Answer
1

active

oldest

votes

Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.

Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:

void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

    std::vector<int> index_row;

    // lexographically first valid row

    for (int i=0; i<K; ++i) {

        index_row.push_back(i);

    }



    for(;;) {

        // output current row

        indeces.push_back(index_row);



        // increment index_row the the lexically next valid sequence

        // find the right-most index we can increment

        // This loop does O(1) amortized iterations if K is not large.  O(K) worst case

        int inc_index=K-1;

        int index_limit=N-1;

        while(inc_index >= 0 && index_row[inc_index] >= index_limit) {

          --inc_index;

          --index_limit;

        }

        if (inc_index < 0) {

            break; //all done

        }

        // generate the lexically first valid row with matching prefix and

        // larger value at inc_index

        int val = index_row[inc_index]+1;

        for (;inc_index<K; ++inc_index, ++val) {

            index_row[inc_index] = val;

        }

    }

}

Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.

edited Jan 4 at 4:59

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.

– Vikas M
Jan 7 at 0:20

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54031528%2fhow-can-this-combination-algorithm-be-optimized%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:

void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

    std::vector<int> index_row;

    // lexographically first valid row

    for (int i=0; i<K; ++i) {

        index_row.push_back(i);

    }



    for(;;) {

        // output current row

        indeces.push_back(index_row);



        // increment index_row the the lexically next valid sequence

        // find the right-most index we can increment

        // This loop does O(1) amortized iterations if K is not large.  O(K) worst case

        int inc_index=K-1;

        int index_limit=N-1;

        while(inc_index >= 0 && index_row[inc_index] >= index_limit) {

          --inc_index;

          --index_limit;

        }

        if (inc_index < 0) {

            break; //all done

        }

        // generate the lexically first valid row with matching prefix and

        // larger value at inc_index

        int val = index_row[inc_index]+1;

        for (;inc_index<K; ++inc_index, ++val) {

            index_row[inc_index] = val;

        }

    }

}

edited Jan 4 at 4:59

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.

– Vikas M
Jan 7 at 0:20

add a comment |

Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:

void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

    std::vector<int> index_row;

    // lexographically first valid row

    for (int i=0; i<K; ++i) {

        index_row.push_back(i);

    }



    for(;;) {

        // output current row

        indeces.push_back(index_row);



        // increment index_row the the lexically next valid sequence

        // find the right-most index we can increment

        // This loop does O(1) amortized iterations if K is not large.  O(K) worst case

        int inc_index=K-1;

        int index_limit=N-1;

        while(inc_index >= 0 && index_row[inc_index] >= index_limit) {

          --inc_index;

          --index_limit;

        }

        if (inc_index < 0) {

            break; //all done

        }

        // generate the lexically first valid row with matching prefix and

        // larger value at inc_index

        int val = index_row[inc_index]+1;

        for (;inc_index<K; ++inc_index, ++val) {

            index_row[inc_index] = val;

        }

    }

}

edited Jan 4 at 4:59

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.

– Vikas M
Jan 7 at 0:20

add a comment |

Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:

void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

    std::vector<int> index_row;

    // lexographically first valid row

    for (int i=0; i<K; ++i) {

        index_row.push_back(i);

    }



    for(;;) {

        // output current row

        indeces.push_back(index_row);



        // increment index_row the the lexically next valid sequence

        // find the right-most index we can increment

        // This loop does O(1) amortized iterations if K is not large.  O(K) worst case

        int inc_index=K-1;

        int index_limit=N-1;

        while(inc_index >= 0 && index_row[inc_index] >= index_limit) {

          --inc_index;

          --index_limit;

        }

        if (inc_index < 0) {

            break; //all done

        }

        // generate the lexically first valid row with matching prefix and

        // larger value at inc_index

        int val = index_row[inc_index]+1;

        for (;inc_index<K; ++inc_index, ++val) {

            index_row[inc_index] = val;

        }

    }

}

edited Jan 4 at 4:59

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:

void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {

    std::vector<int> index_row;

    // lexographically first valid row

    for (int i=0; i<K; ++i) {

        index_row.push_back(i);

    }



    for(;;) {

        // output current row

        indeces.push_back(index_row);



        // increment index_row the the lexically next valid sequence

        // find the right-most index we can increment

        // This loop does O(1) amortized iterations if K is not large.  O(K) worst case

        int inc_index=K-1;

        int index_limit=N-1;

        while(inc_index >= 0 && index_row[inc_index] >= index_limit) {

          --inc_index;

          --index_limit;

        }

        if (inc_index < 0) {

            break; //all done

        }

        // generate the lexically first valid row with matching prefix and

        // larger value at inc_index

        int val = index_row[inc_index]+1;

        for (;inc_index<K; ++inc_index, ++val) {

            index_row[inc_index] = val;

        }

    }

}

edited Jan 4 at 4:59

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

edited Jan 4 at 4:59

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

answered Jan 4 at 4:46

Matt Timmermans

21.4k11734

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.

– Vikas M
Jan 7 at 0:20

add a comment |

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.

– Vikas M
Jan 7 at 0:20

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier.

– Vikas M
Jan 7 at 0:20

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtjtk