ARMv8 floating point output inline assembly












For adding two integers, I write:



int sum;
asm volatile("add %0, x3, x4" : "=r"(sum) : :);


How can I do this with two floats?
I tried:



float sum;
asm volatile("fadd %0, s3, s4" : "=r"(sum) : :);


But it gives me an error:




Error: operand 1 should be a SIMD vector register -- `fadd x0,s3,s4'




Any ideas?










      gcc floating-point arm inline-assembly arm64






      edited Dec 31 '18 at 0:54 by Peter Cordes

      asked Dec 28 '18 at 14:41 by 今天春天
























          2 Answers














          Because each FP/SIMD register has multiple names in AArch64 (v0, b0, h0, s0, d0 all refer to the same register), it is necessary to add an operand modifier to the asm template string to select which name gets printed:



          On Godbolt



          float foo()
          {
              float sum;
              asm volatile("fadd %s0, s3, s4" : "=w"(sum) : :);
              return sum;
          }

          double dsum()
          {
              double sum;
              asm volatile("fadd %d0, d3, d4" : "=w"(sum) : :);
              return sum;
          }


          Will produce:



          foo:
              fadd s0, s3, s4 // sum
              ret
          dsum:
              fadd d0, d3, d4 // sum
              ret





          • Nice, I figured there was probably a modifier while writing my answer, but I didn't see it in the GCC manual. Is this documented anywhere? I only know of the modifiers for x86 registers being in the gcc manual, at the bottom of the Extended asm section: gcc.gnu.org/onlinedocs/gcc/…

            – Peter Cordes
            Jan 4 at 2:49













          • @PeterCordes The problem is that the gcc people are reluctant to document these things. If they're documented, then they're not allowed to change them (which they probably don't do much anyway). One could argue that there are already enough people using them that changing them would cause wide-spread consternation, but that has not yet proven to be a good enough argument to overcome the inertia (but feel free to try!). x86 got doc'ed cuz I was changing the asm docs and added them, and no one felt strongly enough about it to argue with me. I only did x86 cuz that's what I know. Sorry.

            – David Wohlferd
            Jan 4 at 3:24






          • As an aside to James and OP: Be aware that since these modifiers AREN'T doc'ed, using them is unsupported. As with any undocumented feature, gcc can change them at any time. They probably won't, but they can.

            – David Wohlferd
            Jan 4 at 3:26






          • If you wanted to document the subset of modifiers that we shouldn’t change, I’ll approve the patch on list. Some (these, %w0 for printing the w rather than x name) should just be documented and fixed. Especially where behavior is consistent between GCC and Clang. To my great shame, the best documentation we have is at github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/…

            – James Greenhalgh
            Jan 4 at 13:34













          • @PeterCordes - Is there any way you can take James up on his offer here? I'd love to help, but I know almost zero about arm, so selecting the subset is beyond what I can offer. I could help with the texinfo, but that's probably the least challenging part of this.

            – David Wohlferd
            Jan 4 at 23:40
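
The `%w0` modifier mentioned in the comments above can be sketched for the integer case as follows. This is only an illustration, not code from the thread: the function name `add_ints` is hypothetical, the inputs are passed through `"r"` constraints instead of being read from fixed registers, and the plain-C fallback is an assumption added so the sketch also compiles on non-AArch64 targets.

```c
/* Hypothetical helper: 32-bit integer add with explicit operands. */
int add_ints(int a, int b)
{
    int sum;
#if defined(__aarch64__)
    /* %w0, %w1, %w2 print the 32-bit "w" names of the chosen
       general-purpose registers (e.g. w0) instead of the default
       64-bit "x" names. */
    __asm__("add %w0, %w1, %w2" : "=r"(sum) : "r"(a), "r"(b));
#else
    sum = a + b; /* fallback so the sketch builds on other targets */
#endif
    return sum;
}
```

With a plain `%0` the compiler would print the 64-bit `x` name even though `sum` is a 32-bit `int`, which is why the modifier matters for 32-bit operands.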



















          "=r" is the constraint for GP integer registers.



          The GCC manual claims that "=w" is the constraint for an FP / SIMD register on AArch64. But if you try that with a plain %0, you get v0 not s0, which won't assemble. I don't know a workaround here; you should probably report on the gcc bugzilla that the constraint documented in the manual doesn't work for scalar FP.



          On Godbolt I tried this source:



          float foo()
          {
              float sum;
          #ifdef __aarch64__
              asm volatile("fadd %0, s3, s4" : "=w"(sum) : :); // AArch64
          #else
              asm volatile("fadds %0, s3, s4" : "=t"(sum) : :); // ARM32
          #endif
              return sum;
          }

          double dsum()
          {
              double sum;
          #ifdef __aarch64__
              asm volatile("fadd %0, d3, d4" : "=w"(sum) : :); // AArch64
          #else
              asm volatile("faddd %0, d3, d4" : "=w"(sum) : :); // ARM32
          #endif
              return sum;
          }


          clang 7.0 (with its built-in assembler) requires the asm to be actually valid. But for gcc we're only compiling to asm, and Godbolt doesn't have a "binary mode" for non-x86.



          # AArch64 gcc 8.2  -xc -O3 -fverbose-asm -Wall
          # INVALID ASM, errors if you try to actually assemble it.
          foo:
              fadd v0, s3, s4 // sum
              ret
          dsum:
              fadd v0, d3, d4 // sum
              ret


          clang produces the same asm, and its built-in assembler errors with:



          <source>:5:18: error: invalid operand for instruction
          asm volatile("fadd %0, s3, s4" : "=w"(sum) : :);
          ^
          <inline asm>:1:11: note: instantiated into assembly here
          fadd v0, s3, s4
          ^




          On 32-bit ARM, "=t" for single works, but "=w" for double (which the manual says you should use for double-precision) also gives you s0 with gcc. It works with clang, though. You have to use -mfloat-abi=hard and a -mcpu= option for a CPU with an FPU, e.g. -mcpu=cortex-a15.



          # clang7.0 -xc -O3 -Wall --target=arm -mcpu=cortex-a15 -mfloat-abi=hard
          # valid asm for ARM 32
          foo:
              vadd.f32 s0, s3, s4
              bx lr
          dsum:
              vadd.f64 d0, d3, d4
              bx lr


          But gcc fails:



          # ARM gcc 8.2  -xc -O3 -fverbose-asm -Wall -mfloat-abi=hard -mcpu=cortex-a15
          foo:
              fadds s0, s3, s4 @ sum
              bx lr @
          dsum:
              faddd s0, d3, d4 @ sum @@@ INVALID
              bx lr @


          So you can use "=t" for single just fine with gcc, but for double you presumably need a %something0 modifier to print the register name as d0 instead of s0, with a "=w" output.
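
One candidate for that modifier on 32-bit ARM is `P`, which the GCC ARM backend has long used to print a VFP operand with its double-precision `d` name. Treat the sketch below as an assumption rather than something confirmed in this thread: the function name `add_doubles_arm32` is hypothetical, the `__ARM_FP` check for double-precision hardware and the plain-C fallback are additions so the code compiles everywhere, and the `vadd.f64` mnemonic is taken from the clang output shown above.

```c
/* Hypothetical helper: double-precision add on 32-bit ARM with VFP. */
double add_doubles_arm32(double a, double b)
{
    double sum;
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 8)
    /* Assumption: %P0 prints the operand's register with its "d" name
       (e.g. d0) rather than the single-precision "s" name. */
    __asm__("vadd.f64 %P0, %P1, %P2" : "=w"(sum) : "w"(a), "w"(b));
#else
    sum = a + b; /* fallback when not targeting ARM32 with double-precision FP */
#endif
    return sum;
}
```

Checking the generated assembly (for example with `-S`) is a reasonable way to confirm that a `d` register name is actually printed for your compiler version.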





          As written, these asm statements are only useful for learning the syntax. To do anything beyond that, you need to add constraints that specify the input operands as well, instead of reading whatever happened to be sitting in s3 and s4.
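
A minimal sketch of what adding input operands could look like on AArch64, combining `"w"` input constraints with the `%s` / `%d` modifiers from the other answer. The function names `add_floats` and `add_doubles` are hypothetical, and the plain-C fallback is an assumption added so the sketch also compiles on other targets.

```c
/* Hypothetical helper: single-precision add with explicit inputs. */
float add_floats(float a, float b)
{
    float sum;
#if defined(__aarch64__)
    /* "w" asks for an FP/SIMD register for each operand; %s0..%s2
       print those registers with their 32-bit "s" names. */
    __asm__("fadd %s0, %s1, %s2" : "=w"(sum) : "w"(a), "w"(b));
#else
    sum = a + b; /* fallback so the sketch builds on other targets */
#endif
    return sum;
}

/* Hypothetical helper: double-precision add with explicit inputs. */
double add_doubles(double a, double b)
{
    double sum;
#if defined(__aarch64__)
    /* %d0..%d2 print the same kind of register with its 64-bit "d" name. */
    __asm__("fadd %d0, %d1, %d2" : "=w"(sum) : "w"(a), "w"(b));
#else
    sum = a + b;
#endif
    return sum;
}
```

Because the result now depends only on the listed inputs, `volatile` is left off here, which lets the compiler drop the statement if `sum` is never used.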



          See also https://stackoverflow.com/tags/inline-assembly/info






          • @DavidWohlferd: I meant that without input constraints, this toy inline asm is only useful for learning the syntax, not doing anything useful. To go beyond that, you need input constraints like "w" (input1).

            – Peter Cordes
            Dec 30 '18 at 23:52











          • @DavidWohlferd: I think a valid parsing of my sentence is that change is required for the statements to be useful for anything other than learning/testing the syntax. In my last edit I changed the wording of the rest of the sentence to try to make that clearer, but feel free to edit if you're still convinced it's confusing. Probably you aren't the only one that parsed it differently from how I intended.

            – Peter Cordes
            Dec 31 '18 at 5:20











          answered Jan 3 at 23:30 by James Greenhalgh (author of the %s0 / %d0 modifier answer)













          • Nice, I figured there was probably a modifier while writing my answer, but I didn't see it in the GCC manual. Is this documented anywhere? I only know of the modifiers for x86 registers being in the gcc manual, at the bottom of the Extended asm section: gcc.gnu.org/onlinedocs/gcc/…

            – Peter Cordes
            Jan 4 at 2:49













          • @PeterCordes The problem is that the gcc people are reluctant to document these things. If they're documented, then they're not allowed to change them (which they probably don't do much anyway). One could argue that there are already enough people using them that changing them would cause wide-spread consternation, but that has not yet proven to be a good enough argument to overcome the inertia (but feel free to try!). x86 got doc'ed cuz I was changing the asm docs and added them, and no one felt strongly enough about it to argue with me. I only did x86 cuz that's what I know. Sorry.

            – David Wohlferd
            Jan 4 at 3:24






          • 1





            As an aside to James and OP: Be aware that since these modifiers AREN'T doc'ed, using them is unsupported. As with any undocumented feature, gcc can change them at any time. They probably won't, but they can.

            – David Wohlferd
            Jan 4 at 3:26






          • 1





            If you wanted to document the subset of modifiers that we shouldn’t change, I’ll approve the patch on list. Some (these, %w0 for printing the w rather than x name) should just be documented and fixed. Especially where behavior is consistent between GCC and Clang. To my great shame, the best documentation we have is at github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/…

            – James Greenhalgh
            Jan 4 at 13:34













          • @PeterCordes - Is there any way you can take James up on his offer here? I'd love to help, but I know almost zero about arm, so selecting the subset is beyond what I can offer. I could help with the texinfo, but that's probably the least challenging part of this.

            – David Wohlferd
            Jan 4 at 23:40



















          • Nice, I figured there was probably a modifier while writing my answer, but I didn't see it in the GCC manual. Is this documented anywhere? I only know of the modifiers for x86 registers being in the gcc manual, at the bottom of the Extended asm section: gcc.gnu.org/onlinedocs/gcc/…

            – Peter Cordes
            Jan 4 at 2:49













          • @PeterCordes The problem is that the gcc people are reluctant to document these things. If they're documented, then they're not allowed to change them (which they probably don't do much anyway). One could argue that there are already enough people using them that changing them would cause wide-spread consternation, but that has not yet proven to be a good enough argument to overcome the inertia (but feel free to try!). x86 got doc'ed cuz I was changing the asm docs and added them, and no one felt strongly enough about it to argue with me. I only did x86 cuz that's what I know. Sorry.

            – David Wohlferd
            Jan 4 at 3:24






          • 1





            As an aside to James and OP: Be aware that since these modifiers AREN'T doc'ed, using them is unsupported. As with any undocumented feature, gcc can change them at any time. They probably won't, but they can.

            – David Wohlferd
            Jan 4 at 3:26






          • 1





            If you wanted to document the subset of modifiers that we shouldn’t change, I’ll approve the patch on list. Some (these, %w0 for printing the w rather than x name) should just be documented and fixed. Especially where behavior is consistent between GCC and Clang. To my great shame, the best documentation we have is at github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/…

            – James Greenhalgh
            Jan 4 at 13:34













          • @PeterCordes - Is there any way you can take James up on his offer here? I'd love to help, but I know almost zero about arm, so selecting the subset is beyond what I can offer. I could help with the texinfo, but that's probably the least challenging part of this.

            – David Wohlferd
            Jan 4 at 23:40

















          Nice, I figured there was probably a modifier while writing my answer, but I didn't see it in the GCC manual. Is this documented anywhere? I only know of the modifiers for x86 registers being in the gcc manual, at the bottom of the Extended asm section: gcc.gnu.org/onlinedocs/gcc/…

          – Peter Cordes
          Jan 4 at 2:49







          Nice, I figured there was probably a modifier while writing my answer, but I didn't see it in the GCC manual. Is this documented anywhere? I only know of the modifiers for x86 registers being in the gcc manual, at the bottom of the Extended asm section: gcc.gnu.org/onlinedocs/gcc/…

          – Peter Cordes
          Jan 4 at 2:49















          @PeterCordes The problem is that the gcc people are reluctant to document these things. If they're documented, then they're not allowed to change them (which they probably don't do much anyway). One could argue that there are already enough people using them that changing them would cause wide-spread consternation, but that has not yet proven to be a good enough argument to overcome the inertia (but feel free to try!). x86 got doc'ed cuz I was changing the asm docs and added them, and no one felt strongly enough about it to argue with me. I only did x86 cuz that's what I know. Sorry.

          – David Wohlferd
          Jan 4 at 3:24





          @PeterCordes The problem is that the gcc people are reluctant to document these things. If they're documented, then they're not allowed to change them (which they probably don't do much anyway). One could argue that there are already enough people using them that changing them would cause wide-spread consternation, but that has not yet proven to be a good enough argument to overcome the inertia (but feel free to try!). x86 got doc'ed cuz I was changing the asm docs and added them, and no one felt strongly enough about it to argue with me. I only did x86 cuz that's what I know. Sorry.

          – David Wohlferd
          Jan 4 at 3:24




          1




          1





          As an aside to James and OP: Be aware that since these modifiers AREN'T doc'ed, using them is unsupported. As with any undocumented feature, gcc can change them at any time. They probably won't, but they can.

          – David Wohlferd
          Jan 4 at 3:26





          As an aside to James and OP: Be aware that since these modifiers AREN'T doc'ed, using them is unsupported. As with any undocumented feature, gcc can change them at any time. They probably won't, but they can.

          – David Wohlferd
          Jan 4 at 3:26




          1




          1





          If you wanted to document the subset of modifiers that we shouldn’t change, I’ll approve the patch on list. Some (these, %w0 for printing the w rather than x name) should just be documented and fixed. Especially where behavior is consistent between GCC and Clang. To my great shame, the best documentation we have is at github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/…

          – James Greenhalgh
          Jan 4 at 13:34







          If you wanted to document the subset of modifiers that we shouldn’t change, I’ll approve the patch on list. Some (these, %w0 for printing the w rather than x name) should just be documented and fixed. Especially where behavior is consistent between GCC and Clang. To my great shame, the best documentation we have is at github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/…

          – James Greenhalgh
          Jan 4 at 13:34















          @PeterCordes - Is there any way you can take James up on his offer here? I'd love to help, but I know almost zero about arm, so selecting the subset is beyond what I can offer. I could help with the texinfo, but that's probably the least challenging part of this.

          – David Wohlferd
          Jan 4 at 23:40





          @PeterCordes - Is there any way you can take James up on his offer here? I'd love to help, but I know almost zero about arm, so selecting the subset is beyond what I can offer. I could help with the texinfo, but that's probably the least challenging part of this.

          – David Wohlferd
          Jan 4 at 23:40













          0














          "=r" is the constraint for GP integer registers.



          The GCC manual claims that "=w" is the constraint for an FP / SIMD register on AArch64. But if you try that, you get v0 not s0, which won't assemble. I don't know a workaround here, you should probably report on the gcc bugzilla that the constraint documented in the manual doesn't work for scalar FP.



          On Godbolt I tried this source:



          float foo()
          {
          float sum;
          #ifdef __aarch64__
          asm volatile("fadd %0, s3, s4" : "=w"(sum) : :); // AArch64
          #else
          asm volatile("fadds %0, s3, s4" : "=t"(sum) : :); // ARM32
          #endif
          return sum;
          }

          double dsum()
          {
          double sum;
          #ifdef __aarch64__
          asm volatile("fadd %0, d3, d4" : "=w"(sum) : :); // AArch64
          #else
          asm volatile("faddd %0, d3, d4" : "=w"(sum) : :); // ARM32
          #endif
          return sum;
          }


          clang7.0 (with its built-in assembler) requires the asm to be actually valid. But for gcc we're only compiling to asm, and Godbolt doesn't have a "binary mode" for non-x86.



          # AArch64 gcc 8.2  -xc -O3 -fverbose-asm -Wall
          # INVALID ASM, errors if you try to actually assemble it.
          foo:
          fadd v0, s3, s4 // sum
          ret
          dsum:
          fadd v0, d3, d4 // sum
          ret


          clang produces the same asm, and its built-in assembler errors with:



          <source>:5:18: error: invalid operand for instruction
          asm volatile("fadd %0, s3, s4" : "=w"(sum) : :);
          ^
          <inline asm>:1:11: note: instantiated into assembly here
          fadd v0, s3, s4
          ^




          On 32-bit ARM, =t" for single works, but "=w" for (which the manual says you should use for double-precision) also gives you s0 with gcc. It works with clang, though. You have to use -mfloat-abi=hard and a -mcpu= something with an FPU, e.g. -mcpu=cortex-a15



          # clang7.0 -xc -O3 -Wall--target=arm -mcpu=cortex-a15 -mfloat-abi=hard
          # valid asm for ARM 32
          foo:
          vadd.f32 s0, s3, s4
          bx lr
          dsum:
          vadd.f64 d0, d3, d4
          bx lr


          But gcc fails:



          # ARM gcc 8.2  -xc -O3 -fverbose-asm -Wall -mfloat-abi=hard -mcpu=cortex-a15
          foo:
          fadds s0, s3, s4 @ sum
          bx lr @
          dsum:
          faddd s0, d3, d4 @ sum @@@ INVALID
          bx lr @


          So you can use =t for single just fine with gcc, but for double presumably you need a %something0 modifier to print the register name as d0 instead of s0, with a "=w" output.





          Obviously these asm statements would only be useful for anything beyond learning the syntax if you add constraints to specify the input operands as well, instead of reading whatever happened to be sitting in s3 and s4.



          See also https://stackoverflow.com/tags/inline-assembly/info






          share|improve this answer


























          • @DavidWohlferd: I meant that without input constraints, this toy inline asm is only useful for learning the syntax, not doing anything useful. To go beyond that, you need input constraints like "w" (input1).

            – Peter Cordes
            Dec 30 '18 at 23:52











          • @DavidWohlferd: I think a valid parsing of my sentence is that change is required for the statements to be useful for anything other than learning/testing the syntax. In my last edit I changed the wording of the rest of the sentence to try to make that clearer, but feel free to edit if you're still convinced it's confusing. Probably you aren't the only one that parsed it differently from how I intended.

            – Peter Cordes
            Dec 31 '18 at 5:20
























          edited Dec 31 '18 at 0:25

























          answered Dec 30 '18 at 5:56









          Peter Cordes












