How to properly implement multiprocessing in an application with 4 different functions?





Have a look at the code snippet below. In the main function I create a list, jobs. projects is a list of project objects, and each project contains multiple target objects. For each target I want to execute four different functions. To do this, I create a Process that points to the function run, append it to the list, and start it. The current code produces zombie processes, which I want to avoid.

import multiprocessing

def main():
    jobs = []
    for project in projects:
        for target in project.getTargets():
            # One process per target, all started at once.
            p = multiprocessing.Process(target=run, args=(target.getX(),
                                                          target.getY(),))
            jobs.append(p)
            p.start()

    for job in jobs:
        job.join()

def run(x, y):
    # Run the four functions in order for one target.
    a(x, y)
    b(x, y)
    c(x, y)
    d(x, y)

The goal is to handle approximately five targets in parallel and to use a FIFO-style mechanism to start a new target as soon as another one has finished.

python python-2.7 python-multiprocessing

edited Jan 3 at 21:25 by martineau
asked Jan 3 at 21:03 by ssd

  • Your code is somewhat confusing. You don't seem to be using target. Is it supposed to be split into x and y? The return values of the functions a through d aren't used. What's the point of calling them? Normally I would suggest using a multiprocessing.Pool to apply a function to an iterable of values in parallel, but I'm not sure how that would fit here.
    – Roland Smith
    Jan 3 at 21:13

  • I edited the code sample to show how target is being used. The purpose of the functions a to d is not relevant in this case, I guess. All you need to know is that I execute a bash command inside these functions.
    – ssd
    Jan 3 at 21:18

  • Apparently this is just pseudocode for a hypothetical question? Nine times out of ten in multitasking, the problem is how to split tasks and targets properly. One function needs a lot of RAM, another waits 90% of the time for hard disk reads, and another is busy calculating floats or waiting for input from another function. In general, start with the smallest number of tasks and targets and test in practice what happens and where the time is spent when running a given task. Does more CPU help, or will one thread do because your RAM is full or your hard disk is slow?
    – Stacking For Heap
    Jan 3 at 21:24

2 Answers

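Pass the processing function as part of the collection over which you iterate:

from multiprocessing import Pool

def fun(args):
    # pool.map passes each (function, x, y) tuple as a single argument.
    proc, p, q = args
    return proc(p, q)

# a, b, c, d, x and y are the functions and values from the question.
data = [(f, x, y) for f in (a, b, c, d)]
pool = Pool(4)
results = pool.map(fun, data)
pool.close()
pool.join()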
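
Applied to the question's goal of roughly five targets at a time, the same Pool idea might look like the sketch below. It is only an illustration under assumptions: run_target, process_targets, the (x, y) tuples, and the stand-in bodies of a to d are hypothetical, since the real functions are not shown in the question. A Pool with five workers takes targets from its internal queue in order, so a new target starts as soon as a worker is free, and close() followed by join() waits for the workers to exit so that no zombie processes are left behind.

```python
import multiprocessing


# Hypothetical stand-ins for the question's functions a..d.
def a(x, y):
    return x + y


def b(x, y):
    return x - y


def c(x, y):
    return x * y


def d(x, y):
    return max(x, y)


def run_target(coords):
    """Run the four steps in order for one target; coords is an (x, y) tuple."""
    x, y = coords
    return [f(x, y) for f in (a, b, c, d)]


def process_targets(coords_list, workers=5):
    """Process all targets with at most `workers` of them running at once."""
    pool = multiprocessing.Pool(processes=workers)
    try:
        # chunksize=1 hands out one target at a time as workers become
        # free; map() returns the results in input order.
        return pool.map(run_target, coords_list, chunksize=1)
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the workers to exit so they are reaped


if __name__ == "__main__":
    print(process_targets([(1, 2), (3, 4), (5, 6)]))
```

The worker function has to be defined at module level so that it can be pickled and sent to the worker processes. The sketch avoids the with-statement form of Pool so that it also runs on Python 2.7.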






answered Jan 3 at 21:27 by scrutari

From your comment I gather that you are calling bash in the functions a to d. I would suggest not doing that.

1. If you are doing calculations in bash, that's better done in Python.
2. If you use bash to start programs, that is also better done directly in Python.

If you can use Python 3, I would recommend using concurrent.futures.ThreadPoolExecutor to have a pool of threads iterate over your data. In each thread, you can then use the subprocess module to start external programs.
My dicom2jpg.py script is an example of how to do this. It runs ImageMagick's convert program in parallel to convert DICOM x-ray images into PNG format.
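
A minimal sketch of that thread-pool approach, assuming Python 3.5 or later. It is not the linked script: run_command, run_all, and the placeholder commands (which only start the current Python interpreter) are hypothetical stand-ins for the real external programs. Each worker thread blocks in subprocess.run while its child runs, so at most max_workers programs are active at once, and every child is waited on and therefore reaped.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(args):
    """Start one external program, wait for it, and return (args, exit code)."""
    completed = subprocess.run(args, stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return args, completed.returncode


def run_all(commands, max_workers=5):
    """Run the commands with at most `max_workers` of them active at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of `commands`.
        return list(executor.map(run_command, commands))


if __name__ == "__main__":
    # Placeholder commands: each starts a Python child that exits with code n.
    commands = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n]
                for n in range(3)]
    for args, code in run_all(commands):
        print(code, args[-1])
```

Threads are sufficient here because the actual work happens in the external programs; the Python threads spend their time waiting.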



If you need to use Python 2.7, then I would make a list of subprocesses (by calling subprocess.Popen). Continuously iterate over this list and check whether each subprocess has finished. If it has, remove it from the list. If you have not run out of tasks, start a new subprocess and append it to the list. The list should hold as many subprocesses as your machine has cores; more is generally not useful.
This approach is shown in an older version of dicom2png.py.
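
The polling loop described above could be sketched as follows. This is an illustration only, not the code from the linked script: run_queue, its parameters, and the placeholder commands are assumptions. It relies only on subprocess.Popen and poll(), which are also available in Python 2.7.

```python
import subprocess
import sys
import time


def run_queue(commands, max_procs=5, poll_interval=0.05):
    """Run commands in FIFO order, keeping at most `max_procs` running."""
    pending = list(commands)   # commands not started yet, oldest first
    running = []               # (command, Popen) pairs currently active
    finished = []              # (command, exit code) in completion order
    while pending or running:
        # Fill free slots from the front of the queue.
        while pending and len(running) < max_procs:
            cmd = pending.pop(0)
            running.append((cmd, subprocess.Popen(cmd)))
        # poll() returns None while a child is running; otherwise it reaps
        # the child and returns its exit code, so no zombie is left behind.
        still_running = []
        for cmd, proc in running:
            code = proc.poll()
            if code is None:
                still_running.append((cmd, proc))
            else:
                finished.append((cmd, code))
        running = still_running
        if running:
            time.sleep(poll_interval)  # avoid a busy loop
    return finished


if __name__ == "__main__":
    # Placeholder commands: each starts a Python child that exits with code n.
    cmds = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n]
            for n in range(4)]
    for cmd, code in run_queue(cmds, max_procs=2):
        print(code)
```

To follow the advice about cores, max_procs could be set from multiprocessing.cpu_count() instead of the fixed default used here.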







answered Jan 3 at 22:04 by Roland Smith

  • Interesting! But how would you handle this if you have, say, three other functions next to startconvert? I'm talking about Python 2.7.
    – ssd
    Jan 6 at 17:22

  • @ssd As long as those other functions can have the same interface as startconvert, that is, they take one filename argument and return a (str, Process) tuple, it would be OK. They would fit into the manageprocs framework.
    – Roland Smith
    Jan 6 at 18:45
