How to properly implement multiprocessing in an application with 4 different functions?


Have a look at the code snippet below. In the main function I create a list, jobs. projects is a list of project objects, and each project contains multiple target objects. For each target I want to execute four different functions. To do this I create a Process that points to the function run, append it to the list, and start it. The current code produces zombie processes, which I am trying to avoid.

import multiprocessing

def main():
    jobs = []
    for project in projects:
        for target in project.getTargets():
            p = multiprocessing.Process(target=run, args=(target.getX(),
                                                          target.getY(),))
            jobs.append(p)
            p.start()

        for job in jobs:
            job.join()

def run(x, y):
    a(x, y)
    b(x, y)
    c(x, y)
    d(x, y)

The goal is to handle approximately five targets in parallel and then use a mechanism such as a FIFO queue to start a new target once another target has finished.

edited Jan 3 at 21:25

martineau

70.2k1092186

asked Jan 3 at 21:03

ssd

2

Your code is somewhat confusing. You don't seem to be using target. Is it supposed to be split into x and y? The return values of the functions a up to d aren't used. What's the point of calling them? Normally I would suggest using a multiprocessing.Pool to apply a function to an iterable of values in parallel, but I'm not sure how that would fit here.

– Roland Smith
Jan 3 at 21:13

I edited the code sample to show how target is being used. The purpose of the functions a to d is not relevant in this case, I guess. You only need to know that I execute a bash command inside these functions.

– ssd
Jan 3 at 21:18

Apparently this is just pseudocode for some hypothetical question? 9 times out of 10 in multitasking the problem is how to split tasks and targets properly. One function needs a lot of RAM, another waits 90% of the time for hard disk reads, and one is busy calculating floats or waiting for input from another function. In general, start with the smallest number of tasks and targets and test in practice what happens and how/where the time is spent when running a certain task. Does more CPU help, or will one thread do because your RAM is full or your HD is slow?

– Stacking For Heap
Jan 3 at 21:24

python python-2.7 python-multiprocessing

2 Answers

Pass the processing function as part of the collection over which you iterate:

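from multiprocessing import Pool

def fun(args):
    # pool.map passes each (function, x, y) tuple as one single argument
    proc, p, q = args
    return proc(p, q)

# a, b, c and d must be module-level functions so that they can be pickled;
# x and y are the values for one target
data = [(f, x, y) for f in (a, b, c, d)]
pool = Pool(4)
results = pool.map(fun, data)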
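The snippet above runs the four functions in parallel for a single target. For the stated goal of about five targets at a time, one option is to make the target the unit of work instead. The following is only a sketch under assumptions: the bodies of a, b, c and d are hypothetical stand-ins, and run_target and run_all are names invented for illustration. It is written to run on Python 2.7 as well as Python 3.

```python
from multiprocessing import Pool


# Hypothetical stand-ins for the asker's a, b, c and d.
def a(x, y):
    return x + y


def b(x, y):
    return x - y


def c(x, y):
    return x * y


def d(x, y):
    return (x, y)


def run_target(xy):
    """Run a, b, c and d in order for one target; executes in a worker process."""
    x, y = xy
    return [f(x, y) for f in (a, b, c, d)]


def run_all(targets, workers=5):
    """Process all (x, y) pairs with at most `workers` processes at a time."""
    pool = Pool(workers)
    try:
        # With chunksize=1 each free worker takes the next pending target
        # from the queue; results come back in input order.
        return pool.map(run_target, targets, chunksize=1)
    finally:
        pool.close()
        pool.join()  # wait for the worker processes to exit


if __name__ == "__main__":
    print(run_all([(1, 2), (3, 4), (5, 6)]))
```

The pool keeps a fixed number of worker processes and queues the remaining targets, so there is no need to create and join one Process per target by hand.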

answered Jan 3 at 21:27

scrutari

4501715

From your comment I gather that you are calling bash in the functions a .. d. I would suggest not doing that.

If you are doing calculations in bash, that's better done in Python.

If you use bash to start programs, that is also better done directly in Python.

If you can use Python 3, I would recommend using concurrent.futures.ThreadPoolExecutor to have a bunch of threads iterate over your data. In each thread, you can then use the subprocess module to start external programs.
My dicom2jpg.py script is an example of how to do this. It runs ImageMagick's convert program in parallel to convert DICOM x-ray images into PNG format.
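
A minimal sketch of the thread-pool idea for Python 3, not taken from the script mentioned above. The names run_command and run_commands are made up for illustration, and the commands are placeholders that merely start a Python interpreter; real code would pass the argument lists of the programs it needs to run.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(args):
    """Run one external program and return (args, exit code)."""
    # subprocess.call blocks only this thread; the other threads keep working.
    return args, subprocess.call(args)


def run_commands(commands, workers=5):
    """Run the commands with at most `workers` of them active at once."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the order of `commands`.
        return list(executor.map(run_command, commands))


if __name__ == "__main__":
    # Placeholder commands: each starts an interpreter that exits at once.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(8)]
    for args, code in run_commands(jobs):
        print(code, args)
```

Threads are sufficient here because each thread spends its time waiting for an external process, so the heavy work happens outside the Python interpreter.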

If you need to use Python 2.7, then I would make a list of subprocesses (by calling subprocess.Popen). Continuously iterate over this list and check if a subprocess has finished. If so, remove it from the list. If you have not run out of tasks, start a new subprocess and append it to the list. The list should have as many subprocesses as your machine has cores. More is generally not useful.
This approach is shown in an older version of dicom2png.py.
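
The polling loop described above could look roughly like this. It is a sketch under assumptions rather than the code from that script: run_limited, max_procs and poll_interval are hypothetical names, and each command is assumed to be an argument list suitable for subprocess.Popen. It uses only calls that exist in both Python 2.7 and Python 3.

```python
import subprocess
import sys
import time


def run_limited(commands, max_procs=5, poll_interval=0.05):
    """Run commands as subprocesses, keeping at most `max_procs` alive."""
    pending = list(commands)  # tasks not started yet, in FIFO order
    running = []              # Popen objects that are still executing
    exit_codes = []           # collected in order of completion
    while pending or running:
        # Keep only the processes that have not finished. poll() also
        # collects the exit status of a finished child.
        still_running = []
        for proc in running:
            code = proc.poll()
            if code is None:
                still_running.append(proc)
            else:
                exit_codes.append(code)
        running = still_running
        # Fill the free slots with the oldest pending tasks.
        while pending and len(running) < max_procs:
            running.append(subprocess.Popen(pending.pop(0)))
        time.sleep(poll_interval)
    return exit_codes


if __name__ == "__main__":
    # Placeholder commands: each starts an interpreter that exits at once.
    cmds = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_limited(cmds, max_procs=4))
```

The short sleep keeps the loop from spinning at full speed while it waits for a subprocess to finish.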

answered Jan 3 at 22:04

Roland Smith

27k33256

Interesting! But how would you handle this if you have like three other functions next to startconvert? I'm talking about Python 2.7.

– ssd
Jan 6 at 17:22

@ssd As long as those other functions can have the same interface as startconvert, that is they take one filename argument and return a (str, Process) tuple, it would be OK. They would fit into the manageprocs framework.

– Roland Smith
Jan 6 at 18:45
