Designing OpenCV operations. Determining when to use CPU vs GPU
I'm working on an OpenCV project to monitor a 1080p 60fps video feed and also apply custom graphics to this feed. I'm looking for general guidance about designing some of the higher-level operations in my system which compose multiple matrix operations. For example, in one of my features I am resizing a video frame and applying an overlay to that resized frame. The following diagram describes the process:
Here is the implementation of the process (currently written in C# with OpenCvSharp; however, I can switch to any language at this point):
    private void updateFrame(Mat currentFrame, Mat background, Mat mask, Mat invertedMask)
    {
        int w = 400, h = 224;
        using (var resizedFrame = new Mat(
            new OpenCvSharp.Size(currentFrame.Size().Width - w, currentFrame.Size().Height - h),
            currentFrame.Type()))
        using (var resizedBorderFrame = new Mat(currentFrame.Size(), currentFrame.Type()))
        using (var maskedFrame = new Mat(currentFrame.Size(), currentFrame.Type()))
        using (var maskedBackground = new Mat(currentFrame.Size(), currentFrame.Type()))
        using (var output = new Mat(currentFrame.Size(), currentFrame.Type()))
        {
            Cv2.Resize(currentFrame, resizedFrame, resizedFrame.Size());
            Cv2.CopyMakeBorder(resizedFrame, resizedBorderFrame, h/4, h*3/4, w/2, w/2, BorderTypes.Constant, new Scalar(0));
            Cv2.BitwiseAnd(resizedBorderFrame, mask, maskedFrame);
            Cv2.BitwiseAnd(background, invertedMask, maskedBackground);
            Cv2.BitwiseOr(maskedBackground, maskedFrame, output);
            pictureBox.Image = OpenCvSharp.Extensions.BitmapConverter.ToBitmap(output);
        }
    }
This process (along with a few other operations) is beginning to take longer than the frame interval of the video (about 16.7 ms at 60 fps), creating a noticeable lag. Currently the process is performed using CPU-based operations; however, I read that GPU operations could speed it up considerably. Further, I read that creating a custom kernel to combine operations (or implementing the whole series as one compound kernel) could speed this up even more. I'm also trying to work out which operations are not constrained by the CPU, in which case the GPU equivalent might be overkill.
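As a rough illustration of what "combining operations" means here, the two BitwiseAnd calls and the BitwiseOr can be collapsed into a single pass that reads each byte once and writes it once. The sketch below works on plain byte arrays rather than Mat, the class and method names are made up for illustration, and it assumes invertedMask is the exact bitwise complement of mask. A GPU kernel would presumably apply the same per-byte expression with one thread per element.

```csharp
using System;

static class FusedComposite
{
    // Hypothetical helper, not part of OpenCvSharp.
    // Computes (frame & mask) | (background & ~mask) for every byte in one loop,
    // instead of three separate passes with two intermediate buffers.
    public static void Blend(byte[] frame, byte[] background, byte[] mask, byte[] output)
    {
        if (frame.Length != output.Length ||
            background.Length != output.Length ||
            mask.Length != output.Length)
        {
            throw new ArgumentException("All buffers must have the same length.");
        }

        for (int i = 0; i < output.Length; i++)
        {
            int m = mask[i];
            // Assumption: the inverted mask is the bitwise complement of the mask.
            output[i] = (byte)((frame[i] & m) | (background[i] & ~m));
        }
    }
}
```

Whether this beats the three library calls on the CPU would need measuring, since the library routines may already be vectorized.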
If you were to evaluate this problem from the start, how would you go about determining which operations to put on CPU vs GPU vs custom kernel? Or rather, what resources and tools could I be using for analyzing the performance differences? And, what other optimizations or processes should I be employing when considering these types of problems?
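One low-tech way to start on the "which operation is the bottleneck" part is to time each stage separately and compare it against the per-frame budget. The following is a minimal sketch using System.Diagnostics.Stopwatch; the StageTimer class and its method names are hypothetical, and wall-clock averages like this are only a first approximation compared with a real profiler.

```csharp
using System;
using System.Diagnostics;

static class StageTimer
{
    // Frame budget in milliseconds for a given frame rate (about 16.67 ms at 60 fps).
    public static double FrameBudgetMs(double framesPerSecond) => 1000.0 / framesPerSecond;

    // Hypothetical helper: runs `stage` `iterations` times after one warm-up call
    // and returns the mean wall-clock time per call in milliseconds.
    public static double MeanMilliseconds(Action stage, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        stage(); // warm-up: JIT compilation and first-touch allocations

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            stage();
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

With the buffers from updateFrame in scope, a call such as StageTimer.MeanMilliseconds(() => Cv2.Resize(currentFrame, resizedFrame, resizedFrame.Size()), 100) would give a per-call figure for the resize alone, which can then be compared with StageTimer.FrameBudgetMs(60).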
c# performance opencv gpu cpu
1
GPUs are good at stream processing: the same operation applied to lots of independent data. CPUs are good at exploiting data dependency chains. If you can process each pixel, or group of pixels, independently, then you have a problem that is easy to process in parallel, and that's a GPU job.
– Margaret Bloom
Dec 31 '18 at 18:25
@MargaretBloom that makes sense. This follow-up question is probably application specific, but for this scenario, do you think I should try to limit the number of calls to GPU operations by writing more complex kernels? Or rather, do you imagine the time required to load 1080p image buffers from RAM into GPU memory is significant enough to warrant more complex kernels?
– flakes
Dec 31 '18 at 18:45
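To put a rough number on the transfer question, the arithmetic can be sketched as below. It assumes 8-bit, 3-channel frames (the question does not state the pixel format), and the bandwidth is a parameter to be filled in from a measurement on the actual machine; the class and method names are hypothetical.

```csharp
using System;

static class TransferEstimate
{
    // Bytes in one uncompressed frame with one byte per channel.
    public static long FrameBytes(int width, int height, int channels) =>
        (long)width * height * channels;

    // Time to move `bytes` at an assumed sustained bandwidth, in milliseconds.
    public static double TransferMs(long bytes, double bytesPerSecond) =>
        bytes / bytesPerSecond * 1000.0;
}
```

Under that assumption, TransferEstimate.FrameBytes(1920, 1080, 3) is 6,220,800 bytes per frame, or roughly 373 MB per second in each direction at 60 fps. Every extra upload or download between separate GPU calls adds another multiple of that, which is the usual argument for keeping intermediate results on the device.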
asked Dec 31 '18 at 16:42 by flakes
edited Dec 31 '18 at 17:41 by flakes