Does wave / subgroup need synchronization for shared variables?
I am wondering whether, within the same wave / subgroup (warp?), we need to call `memoryBarrierShared` and `barrier` to synchronize shared variables. On NVIDIA I think it is not necessary, but I do not know about other IHVs.
EDIT: ballot
Since I am talking about waves / subgroups, I am talking about the `ARB_shader_ballot` extension.
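Let's say we have code like this (1):
```glsl
#version 450
#extension GL_ARB_shader_ballot : require
// work-group size assumed to match s_data[128]
layout(local_size_x = 128) in;
shared uint s_data[128];
void main() {
    // shared memory is per work group, so index it with the local ID
    uint tid = gl_LocalInvocationID.x;
    // initialization of some s_data
    memoryBarrierShared();
    barrier();
    if (tid < gl_SubGroupSizeARB) {
        for (uint i = gl_SubGroupSizeARB; i > 0u; i >>= 1)
            s_data[tid] += s_data[tid + i];
    }
}
```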
According to me, this code is not correct. The correct version, according to the spec, would be (2):
```glsl
if (tid < gl_SubGroupSizeARB) {
    for (uint i = gl_SubGroupSizeARB; i > 0u; i >>= 1) {
        s_data[tid] += s_data[tid + i];
        memoryBarrierShared();
        // GLSL only permits barrier() in uniform control flow; this branch depends on tid
        barrier();
    }
}
```
However, since invocations within a wave/subgroup run in parallel, the `barrier` function seems useless: this version (3) should be correct as well, and faster than (2):
```glsl
if (tid < gl_SubGroupSizeARB) {
    for (uint i = gl_SubGroupSizeARB; i > 0u; i >>= 1) {
        s_data[tid] += s_data[tid + i];
        memoryBarrierShared();
    }
}
```
However, since we do not need the `barrier` function, I wonder whether (1) is correct, even though that seems unlikely to me, and if not, whether (3) is correct (which would mean that my understanding is correct).
EDIT: changed `int` to `uint`, and `=` to `+=`.
opengl glsl vulkan
edited Jan 3 at 15:51
asked Jan 3 at 9:01 by Antoine Morrier
"According to me, this code is not correct." Well, what exactly is it supposed to do? I don't understand what your code is intended to accomplish. I have no idea what `s_data` is, what values it has, or what it is intended to eventually store. And since all versions of your code exhibit UB, it's not clear what is supposed to be happening here.
– Nicol Bolas
Jan 3 at 15:36
The idea of my code is to perform a reduction. (I meant to write `+=` instead of `=`.) `s_data` just holds "values". What UB does my code have?
– Antoine Morrier
Jan 3 at 15:55
In every case, you have invocations reading from memory that some other invocation will write to, with no barriers between them to provide ordering/visibility. Even in your case 2, an invocation where `tid == 1` will write to a variable that the `tid == 0` invocation reads from. That's undefined behavior, whether `shader_ballot` exists or not.
– Nicol Bolas
Jan 3 at 16:00
@AntoineMorrier ARB_shader_ballot must define a group size, but that is not its purpose. shader_ballot makes no guarantees about the underlying architecture beyond the fact that ballotARB works if the vendor has implemented the extension. Unrolling the last warp works because all other warps are free to do other work within a Streaming Multiprocessor (NV specific), but it also relies on undefined behavior EVEN ON NVIDIA GPUS to add up the values accumulated simultaneously by each warp. (cont.)
– opa
Jan 3 at 16:34
@AntoineMorrier 1) Yes, the code is UB on Nvidia GPUs. 2) Yes, I was talking about the code in the article. There should either be a `__syncwarp` extension provided for GLSL, or it should be built into other primitives provided by an extension; for example, `ballotARB` may internally just be the `__ballot_sync` CUDA function on Nvidia GPUs, which performs the ballot and syncs the warp, ensuring a safe result.
– opa
Jan 3 at 17:57
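For comparison, here is a sketch of a work-group sum in which the subgroup step relies on an explicit subgroup operation rather than on implicit warp synchronization. It assumes the driver exposes `GL_KHR_shader_subgroup_basic` and `GL_KHR_shader_subgroup_arithmetic` (neither is mentioned in the thread). The buffer names and bindings are hypothetical, and the code also assumes full subgroups of at least 16 invocations, so that `gl_NumSubgroups <= gl_SubgroupSize` for 128 invocations. `subgroupAdd` performs the in-subgroup sum. The ordinary `barrier()` is still needed where partial sums cross subgroups through shared memory.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_arithmetic : require
layout(local_size_x = 128) in;

// Hypothetical buffers, for illustration only.
layout(std430, binding = 0) readonly buffer Input { uint values[]; };
layout(std430, binding = 1) buffer Partials { uint partialSums[]; };

// One slot per subgroup; 128 slots cover any subgroup count for 128 invocations.
shared uint s_partial[128];

void main() {
    uint v = values[gl_GlobalInvocationID.x];

    // Sum within the subgroup using the extension's own operation.
    uint subgroupSum = subgroupAdd(v);
    if (subgroupElect()) {
        s_partial[gl_SubgroupID] = subgroupSum;
    }

    // Data now crosses subgroups through shared memory, so the
    // work-group barrier (in uniform control flow) is required.
    memoryBarrierShared();
    barrier();

    // Subgroup 0 combines the partial sums (assumes gl_NumSubgroups <= gl_SubgroupSize).
    if (gl_SubgroupID == 0u) {
        uint p = gl_SubgroupInvocationID < gl_NumSubgroups
                     ? s_partial[gl_SubgroupInvocationID]
                     : 0u;
        uint total = subgroupAdd(p);
        if (subgroupElect()) {
            partialSums[gl_WorkGroupID.x] = total;
        }
    }
}
```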
1 Answer
The execution model shared by OpenGL and Vulkan with regard to compute shaders does not really recognize the concept of a "wave". It has the concept of a work group, but that is not the same thing. A work group can be much bigger than a GPU "wave", and for small work groups, multiple work groups could be executing on the same GPU "wave".
As such, these specifications make no statements about the behavior of any of their functions with regard to a "wave" (with the exception of the shader ballot functions). So if you want synchronization that the standard says will work on all conforming implementations, you must call both functions as dictated by the standard.
Even with `ARB_shader_ballot`, the execution model of shaders is not modified. The extension only allows cross-communication between the invocations of a subgroup, and only via the explicit mechanisms that it provides.
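To illustrate what "explicit mechanisms" can look like, here is a minimal sketch (not part of the original answer) in which invocations of a subgroup exchange data only through `ballotARB`, with no shared memory and no barrier involved. It counts how many input values exceed a threshold and uses one atomic per subgroup instead of one per invocation. The buffers, bindings and the threshold `100u` are hypothetical. `GL_ARB_gpu_shader_int64` is enabled because `ballotARB` returns a `uint64_t`.

```glsl
#version 450
#extension GL_ARB_gpu_shader_int64 : require
#extension GL_ARB_shader_ballot : require
layout(local_size_x = 64) in;

// Hypothetical buffers, for illustration only.
layout(std430, binding = 0) readonly buffer Input { uint values[]; };
layout(std430, binding = 1) buffer Result { uint total; };

void main() {
    uint v = values[gl_GlobalInvocationID.x];

    // One bit per active invocation of the subgroup whose value passes the test.
    uint64_t votes = ballotARB(v > 100u);
    uvec2 halves = unpackUint2x32(votes);
    uint count = uint(bitCount(halves.x) + bitCount(halves.y));

    // Elect the lowest active invocation: no active invocation has a lower index.
    uint64_t active = ballotARB(true);
    if ((active & gl_SubGroupLtMaskARB) == uint64_t(0)) {
        // Every invocation computed the same count; only one adds it.
        atomicAdd(total, count);
    }
}
```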
The execution model and memory model of shader invocations say that they are unordered with respect to each other unless you explicitly order them with barriers.
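For the reduction in the question, here is a minimal sketch of a version whose ordering relies on the spec alone. It assumes a work-group size of 128 to match `s_data[128]`, and the input/output buffers are hypothetical. At each step, the reads of `s_data[tid + i]` never touch an element that another invocation writes in that step. Every invocation reaches `barrier()`, since GLSL only allows it in uniform control flow.

```glsl
#version 450
// Work-group size assumed to match s_data[128].
layout(local_size_x = 128) in;

// Hypothetical buffers, for illustration only.
layout(std430, binding = 0) readonly buffer Input { uint values[]; };
layout(std430, binding = 1) buffer Partials { uint partialSums[]; };

shared uint s_data[128];

void main() {
    uint tid = gl_LocalInvocationID.x;
    s_data[tid] = values[gl_GlobalInvocationID.x];
    // Make the initial stores visible and wait for every invocation.
    memoryBarrierShared();
    barrier();

    // The loop bound does not depend on tid, so every invocation executes
    // each barrier() (uniform control flow).
    for (uint i = 64u; i > 0u; i >>= 1) {
        if (tid < i) {
            // Writes go to [0, i); reads of s_data[tid + i] come from [i, 2i).
            s_data[tid] += s_data[tid + i];
        }
        memoryBarrierShared();
        barrier();
    }

    if (tid == 0u) {
        partialSums[gl_WorkGroupID.x] = s_data[0];
    }
}
```

Each work group writes one partial sum. Combining those partials would take a second dispatch or atomics.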
edited Jan 3 at 16:02
answered Jan 3 at 14:21 by Nicol Bolas
I am talking about the shader ballot functions :). I bring it up because I want to optimize my code by using this extension :)
– Antoine Morrier
Jan 3 at 14:25
@AntoineMorrier: No, you aren't. You mentioned shared variables, `barrier` and `memoryBarrierShared`. Nowhere did you bring up shader ballot stuff. So you should fix your question to ask about what you wanted to know about, preferably with some source code.
– Nicol Bolas
Jan 3 at 14:26
I edited the question and added some source code :)
– Antoine Morrier
Jan 3 at 14:52