Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
In the x86-64 Tour of Intel Manuals, I read:
"Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register."
The Intel documentation (section 3.4.1.1, "General-Purpose Registers in 64-Bit Mode", in the Basic Architecture manual), quoted at the same source, tells us:
- 64-bit operands generate a 64-bit result in the destination general-purpose register.
- 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
- 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64 bits.
In x86-32 and x86-64 assembly, 16-bit instructions such as mov ax, bx don't show this kind of "strange" behaviour: the upper word of eax is left unmodified rather than zeroed.
So why was this behaviour introduced? At first glance it seems illogical (though that may be because I am used to the quirks of x86-32 assembly).
assembly x86 x86-64 cpu-registers zero-extension
edited Aug 1 '18 at 16:49
asked Jun 24 '12 at 11:40 by Nubok
16
If you Google for "Partial register stall", you'll find quite a bit of information about the problem they were (almost certainly) trying to avoid.
– Jerry Coffin
Jun 24 '12 at 14:38
3
stackoverflow.com/questions/25455447/…
– Hans Passant
Aug 27 '15 at 7:16
2
Not just "most". AFAIK, all instructions with an r32 destination operand zero the high 32, rather than merging. For example, some assemblers will replace pmovmskb r64, xmm with pmovmskb r32, xmm, saving a REX, because the 64bit destination version behaves identically. Even though the Operation section of the manual lists all 6 combinations of 32/64bit dest and 64/128/256b source separately, the implicit zero-extension of the r32 form duplicates the explicit zero-extension of the r64 form. I'm curious about the HW implementation...
– Peter Cordes
May 26 '16 at 23:38
2
@HansPassant, the circular reference begins.
– kchoi
Jul 15 '16 at 23:26
1
Related: xor eax,eax or xor r8d,r8d is the best way to zero RAX or R8 (saving a REX prefix for RAX, and 64-bit XOR isn't even handled specially on Silvermont). Related: "How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent"
– Peter Cordes
Nov 21 '17 at 23:44
2 Answers
I'm not AMD or speaking for them, but I would have done it the same way. Zeroing the high half means the result doesn't depend on the register's previous value, so the CPU doesn't have to wait for that value. The register renaming mechanism would essentially be defeated if it weren't done that way. This way you can write fast 32-bit code in 64-bit mode without having to explicitly break dependencies all the time. Without this behaviour, every single 32-bit instruction in 64-bit mode would have to wait on something that happened before, even though that high part would almost never be used.
The behaviour for 16-bit instructions is the strange one. The dependency madness is one of the reasons that 16-bit instructions are avoided now.
edited Jun 24 '12 at 12:03
answered Jun 24 '12 at 11:53 by harold
7
I don't think it's strange, I think they didn't want to break too much and kept the old behavior there.
– Alexey Frunze
Jun 24 '12 at 11:56
4
@Alex when they introduced 32bit mode, there was no old behaviour for the high part. There was no high part before. Of course after that it couldn't be changed anymore.
– harold
Jun 24 '12 at 11:59
1
I was speaking about 16-bit operands, why the top bits don't get zeroed in that case. They don't in non-64-bit modes. And that's kept in 64-bit mode too.
– Alexey Frunze
Jun 24 '12 at 12:04
3
I interpreted your "The behaviour for 16bit instructions is the strange one" as "it's strange that zero-extension doesn't happen with 16-bit operands in 64-bit mode". Hence my comments about keeping it the same way in 64-bit mode for better compatibility.
– Alexey Frunze
Jun 24 '12 at 12:09
8
@Alex oh I see. Ok. I don't think it's strange from that perspective. Just from a "looking back, maybe it wasn't such a good idea"-perspective. Guess I should have been clearer :)
– harold
Jun 24 '12 at 12:12
It simply saves space in the instructions, and the instruction set. You can move small immediate values to a 64-bit register by using existing (32-bit) instructions.
It also saves you from having to encode 8-byte values for MOV RAX, 42, when MOV EAX, 42 can be reused.
This optimization is not as important for 8- and 16-bit ops (because they are smaller), and changing the rules there would also break old code.
answered Jun 24 '12 at 11:50 by Bo Persson
6
If that's correct, wouldn't it have made more sense for it to sign-extend rather than 0 extend?
– Damien_The_Unbeliever
Jun 24 '12 at 11:54
2
@Alex: And sign-extension isn't? Both can be done very cheaply in hardware.
– jalf
Jun 24 '12 at 11:59
2
@Alex: no it's not. It would be a bit slower if done in software, sure, but in hardware, it'd, at worst, cost a few more transistors, which, on a chip the size and complexity of a modern CPU, is really not an issue.
– jalf
Jun 24 '12 at 14:03
14
Sign extension is slower, even in hardware. Zero extension can be done in parallel with whatever computation produces the lower half, but sign extension can't be done until (at least the sign of) the lower half has been computed.
– Jerry Coffin
Jun 24 '12 at 14:26
10
Another related trick is to use XOR EAX, EAX because XOR RAX, RAX would need a REX prefix.
– Neil
Oct 2 '13 at 9:12
|
show 7 more comments
6
If that's correct, wouldn't it have made more sense for it to sign-extend rather than 0 extend?
– Damien_The_Unbeliever
Jun 24 '12 at 11:54
2
@Alex: And sign-extension isn't? Both can be done very cheaply in hardware.
– jalf
Jun 24 '12 at 11:59
2
@Alex: no it's not. It would be a bit slower if done in software, sure, but in hardware, it'd, at worst, cost a few more transistors, which, on a chip the size and complexity of a modern CPU, that's really not an issue.
– jalf
Jun 24 '12 at 14:03
14
Sign extension is slower, even in hardware. Zero extension can be done in parallel with whatever computation produces the lower half, but sign extension can't be done until (at least the sign of) the lower half has been computed.
– Jerry Coffin
Jun 24 '12 at 14:26
10
Another related trick is to useXOR EAX, EAX
becauseXOR RAX, RAX
would need an REX prefix.
– Neil
Oct 2 '13 at 9:12
16
If you Google for "Partial register stall", you'll find quite a bit of information about the problem they were (almost certainly) trying to avoid.
– Jerry Coffin
Jun 24 '12 at 14:38
3
stackoverflow.com/questions/25455447/…
– Hans Passant
Aug 27 '15 at 7:16
2
Not just "most". AFAIK, all instructions with an r32 destination operand zero the high 32 bits, rather than merging. For example, some assemblers will replace pmovmskb r64, xmm with pmovmskb r32, xmm, saving a REX, because the 64-bit destination version behaves identically. Even though the Operation section of the manual lists all 6 combinations of 32/64-bit dest and 64/128/256b source separately, the implicit zero-extension of the r32 form duplicates the explicit zero-extension of the r64 form. I'm curious about the HW implementation...
– Peter Cordes
May 26 '16 at 23:38
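The zero-versus-merge distinction can be sketched as a small software model. This only illustrates the architectural rule quoted from the Intel manual, not how any CPU implements it; `write_reg` and `MASK64` are hypothetical names, and the 8-bit case models only the low byte (AL-style), not the high-byte registers such as AH.

```python
MASK64 = (1 << 64) - 1


def write_reg(old: int, value: int, width: int) -> int:
    """Model writing `width` bits to the low part of a 64-bit register.

    8- and 16-bit writes merge into the old value; 32-bit writes are
    zero-extended to 64 bits; 64-bit writes replace the whole register.
    """
    if width not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32 or 64")
    low_mask = (1 << width) - 1
    value &= low_mask
    if width in (8, 16):
        # Narrow writes keep the untouched upper bits of the old value.
        return (old & ~low_mask & MASK64) | value
    # 32-bit result is zero-extended; 64-bit result fills the register.
    return value


rax = 0x1122334455667788
print(hex(write_reg(rax, 0xDD, 8)))         # 0x11223344556677dd
print(hex(write_reg(rax, 0xCCDD, 16)))      # 0x112233445566ccdd
print(hex(write_reg(rax, 0xAABBCCDD, 32)))  # 0xaabbccdd
```

In this model the 32-bit result never depends on the old register value, which is the property that lets an r32 destination stand in for an r64 destination whenever the result fits in 32 bits.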
2
@HansPassant, the circular reference begins.
– kchoi
Jul 15 '16 at 23:26
1
Related: xor eax,eax or xor r8d,r8d is the best way to zero RAX or R8 (saving a REX prefix for RAX, and 64-bit XOR isn't even handled specially on Silvermont). Related: How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent
– Peter Cordes
Nov 21 '17 at 23:44
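The remark that the REX saving applies to RAX but not to R8 follows from the layout of the REX byte (0100WRXB): reaching R8 needs the R and B extension bits regardless of operand size, so only the W bit changes. A rough sketch, where `decode_rex` is a hypothetical helper and the byte strings are one common encoding of each instruction:

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (binary 0100WRXB) into its four flag bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError("not a REX prefix byte")
    return {
        "W": (byte >> 3) & 1,  # 1 selects 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM r/m field
    }


XOR_R8D_R8D = bytes.fromhex("4531c0")  # xor r8d, r8d
XOR_R8_R8 = bytes.fromhex("4d31c0")    # xor r8, r8

for name, code in [("xor r8d, r8d", XOR_R8D_R8D), ("xor r8, r8", XOR_R8_R8)]:
    print(name, len(code), "bytes", decode_rex(code[0]))
```

Both forms are three bytes long, so the 32-bit form saves no code size here; it is still the form the comment recommends.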