ARMv8 floating point output inline assembly
For adding two integers, I write:
int sum;
asm volatile("add %0, x3, x4" : "=r"(sum) : :);
How can I do this with two floats?
I tried:
float sum;
asm volatile("fadd %0, s3, s4" : "=r"(sum) : :);
But it gives me an error:
Error: operand 1 should be a SIMD vector register -- `fadd x0,s3,s4'
Any ideas?
gcc floating-point arm inline-assembly arm64
asked Dec 28 '18 at 14:41 by 今天春天
edited Dec 31 '18 at 0:54 by Peter Cordes
2 Answers
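Because FP/SIMD registers have multiple names in AArch64 (v0, b0, h0, s0 and d0 all refer to the same register), you need to add an operand modifier to the template string so that the compiler prints the name the instruction expects: %s0 for the single-precision name, %d0 for the double-precision name. The operand itself uses the "w" constraint. Tested on Godbolt: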
float foo()
{
    float sum;
    asm volatile("fadd %s0, s3, s4" : "=w"(sum) : :);
    return sum;
}

double dsum()
{
    double sum;
    asm volatile("fadd %d0, d3, d4" : "=w"(sum) : :);
    return sum;
}
This produces:
foo:
    fadd s0, s3, s4 // sum
    ret
dsum:
    fadd d0, d3, d4 // sum
    ret
Nice, I figured there was probably a modifier while writing my answer, but I didn't see it in the GCC manual. Is this documented anywhere? I only know of the modifiers for x86 registers being in the gcc manual, at the bottom of the Extended asm section: gcc.gnu.org/onlinedocs/gcc/…
– Peter Cordes
Jan 4 at 2:49
@PeterCordes The problem is that the gcc people are reluctant to document these things. If they're documented, then they're not allowed to change them (which they probably don't do much anyway). One could argue that there are already enough people using them that changing them would cause wide-spread consternation, but that has not yet proven to be a good enough argument to overcome the inertia (but feel free to try!). x86 got doc'ed cuz I was changing the asm docs and added them, and no one felt strongly enough about it to argue with me. I only did x86 cuz that's what I know. Sorry.
– David Wohlferd
Jan 4 at 3:24
As an aside to James and OP: Be aware that since these modifiers AREN'T doc'ed, using them is unsupported. As with any undocumented feature, gcc can change them at any time. They probably won't, but they can.
– David Wohlferd
Jan 4 at 3:26
If you wanted to document the subset of modifiers that we shouldn’t change, I’ll approve the patch on list. Some (these, %w0 for printing the w rather than x name) should just be documented and fixed. Especially where behavior is consistent between GCC and Clang. To my great shame, the best documentation we have is at github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/…
– James Greenhalgh
Jan 4 at 13:34
@PeterCordes - Is there any way you can take James up on his offer here? I'd love to help, but I know almost zero about arm, so selecting the subset is beyond what I can offer. I could help with the texinfo, but that's probably the least challenging part of this.
– David Wohlferd
Jan 4 at 23:40
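The %w0 modifier mentioned in the comments above applies to the integer case from the question: an int is 32 bits wide, but a plain %0 with an "r" constraint prints the 64-bit x name. A minimal sketch, assuming GCC or Clang targeting AArch64; the function name add_i32 and the portable fallback branch are illustrative additions, not part of the original posts:

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit integer add through inline asm. */
int32_t add_i32(int32_t a, int32_t b)
{
#if defined(__aarch64__)
    int32_t sum;
    /* %w0, %w1, %w2 print the 32-bit names (w0..w30) of the registers
       the compiler picked, so the instruction is a 32-bit add. */
    __asm__("add %w0, %w1, %w2" : "=r"(sum) : "r"(a), "r"(b));
    return sum;
#else
    /* Fallback so the sketch still builds on other targets;
       unsigned arithmetic avoids signed-overflow undefined behavior. */
    return (int32_t)((uint32_t)a + (uint32_t)b);
#endif
}
```

No volatile is used here because the result depends only on the two inputs, so the compiler is free to drop or merge the statement when the result is unused or repeated.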
"=r" is the constraint for general-purpose integer registers.
The GCC manual says that "=w" is the constraint for an FP / SIMD register on AArch64. But if you try that with a plain %0, you get v0, not s0, which won't assemble. I don't know a workaround here; you should probably report on the GCC bugzilla that the constraint documented in the manual doesn't work for scalar FP.
On Godbolt I tried this source:
float foo()
{
    float sum;
#ifdef __aarch64__
    asm volatile("fadd %0, s3, s4" : "=w"(sum) : :); // AArch64
#else
    asm volatile("fadds %0, s3, s4" : "=t"(sum) : :); // ARM32
#endif
    return sum;
}

double dsum()
{
    double sum;
#ifdef __aarch64__
    asm volatile("fadd %0, d3, d4" : "=w"(sum) : :); // AArch64
#else
    asm volatile("faddd %0, d3, d4" : "=w"(sum) : :); // ARM32
#endif
    return sum;
}
clang 7.0 (with its built-in assembler) requires the asm to actually be valid. For gcc we're only compiling to asm, and Godbolt doesn't have a "binary mode" for non-x86 targets, so invalid output is not rejected there.
# AArch64 gcc 8.2 -xc -O3 -fverbose-asm -Wall
# INVALID ASM, errors if you try to actually assemble it.
foo:
    fadd v0, s3, s4 // sum
    ret
dsum:
    fadd v0, d3, d4 // sum
    ret
clang produces the same asm, and its built-in assembler errors with:
<source>:5:18: error: invalid operand for instruction
asm volatile("fadd %0, s3, s4" : "=w"(sum) : :);
^
<inline asm>:1:11: note: instantiated into assembly here
fadd v0, s3, s4
^
On 32-bit ARM, "=t" for single works, but "=w" for double (which the manual says you should use for double-precision) also gives you s0 with gcc. It works with clang, though. You have to use -mfloat-abi=hard and a -mcpu= option naming something with an FPU, e.g. -mcpu=cortex-a15.
# clang7.0 -xc -O3 -Wall --target=arm -mcpu=cortex-a15 -mfloat-abi=hard
# valid asm for ARM 32
foo:
    vadd.f32 s0, s3, s4
    bx lr
dsum:
    vadd.f64 d0, d3, d4
    bx lr
But gcc fails:
# ARM gcc 8.2 -xc -O3 -fverbose-asm -Wall -mfloat-abi=hard -mcpu=cortex-a15
foo:
    fadds s0, s3, s4 @ sum
    bx lr @
dsum:
    faddd s0, d3, d4 @ sum @@@ INVALID
    bx lr @
So you can use "=t" for single just fine with gcc, but for double presumably you need a %something0 modifier to print the register name as d0 instead of s0, with a "=w" output.
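One candidate for that %something0 is the P operand modifier, which, as far as I know, GCC's ARM back end and Clang both accept for printing the d-register name of a double held in a "w" register. Treat this as an assumption to verify on your toolchain rather than a documented guarantee. The sketch below uses the UAL mnemonic vadd.f64 instead of faddd; the function name add_f64_arm32, the __ARM_FP feature check and the fallback branch are illustrative additions:

```c
/* Hypothetical helper: double-precision add on 32-bit ARM with VFP. */
double add_f64_arm32(double a, double b)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x8)
    double sum;
    /* Assumption: %P0 prints d<n> for a double in a "w" register,
       where a plain %0 printed an s-register name with gcc above. */
    __asm__("vadd.f64 %P0, %P1, %P2" : "=w"(sum) : "w"(a), "w"(b));
    return sum;
#else
    /* Fallback for targets without double-precision VFP. */
    return a + b;
#endif
}
```

The __ARM_FP test (bit 3 set means double-precision hardware support) keeps the asm path out of builds where no suitable FPU is enabled.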
Obviously, these asm statements are only useful for anything beyond learning the syntax if you also add constraints for the input operands, instead of reading whatever happens to be sitting in s3 and s4.
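A version with input operands might look like the following sketch. It assumes AArch64 with GCC or Clang and relies on the s and d operand modifiers, which select the single- and double-precision register names. The function names add_f32 and add_f64 and the non-AArch64 fallback are illustrative additions:

```c
/* Hypothetical helpers: scalar FP adds with compiler-chosen registers. */
float add_f32(float a, float b)
{
#if defined(__aarch64__)
    float sum;
    /* "w" asks for an FP/SIMD register; %s prints its s<n> name. */
    __asm__("fadd %s0, %s1, %s2" : "=w"(sum) : "w"(a), "w"(b));
    return sum;
#else
    return a + b; /* fallback so the sketch builds elsewhere */
#endif
}

double add_f64(double a, double b)
{
#if defined(__aarch64__)
    double sum;
    /* Same constraint, but %d prints the d<n> name. */
    __asm__("fadd %d0, %d1, %d2" : "=w"(sum) : "w"(a), "w"(b));
    return sum;
#else
    return a + b;
#endif
}
```

With explicit inputs the compiler handles register allocation, so nothing depends on what happens to be in s3 or s4, and volatile is no longer needed because the output is a pure function of the inputs.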
See also https://stackoverflow.com/tags/inline-assembly/info
@DavidWohlferd: I meant that without input constraints, this toy inline asm is only useful for learning the syntax, not doing anything useful. To go beyond that, you need input constraints like "w"(input1).
– Peter Cordes
Dec 30 '18 at 23:52
@DavidWohlferd: I think a valid parsing of my sentence is that change is required for the statements to be useful for anything other than learning/testing the syntax. In my last edit I changed the wording of the rest of the sentence to try to make that clearer, but feel free to edit if you're still convinced it's confusing. Probably you aren't the only one that parsed it differently from how I intended.
– Peter Cordes
Dec 31 '18 at 5:20
First answer (the %s0 / %d0 modifiers): answered Jan 3 at 23:30 by James Greenhalgh
edited Dec 31 '18 at 0:25
answered Dec 30 '18 at 5:56
Peter Cordes
@DavidWohlferd: I meant that without input constraints, this toy inline asm is only useful for learning the syntax, not doing anything useful. To go beyond that, you need input constraints like "w" (input1).
– Peter Cordes
Dec 30 '18 at 23:52
@DavidWohlferd: I think a valid parsing of my sentence is that change is required for the statements to be useful for anything other than learning/testing the syntax. In my last edit I changed the wording of the rest of the sentence to try to make that clearer, but feel free to edit if you're still convinced it's confusing. Probably you aren't the only one that parsed it differently from how I intended.
– Peter Cordes
Dec 31 '18 at 5:20