Comparison of non-ASCII strings only works in IDLE
I'm writing a fairly simple script that converts European Portuguese input into Brazilian Portuguese, so there are a lot of accented characters such as á, é, À, ç, etc.
Basically, the goal is to find words from one list in the text and replace them with the corresponding Brazilian words from a second list.
Here's the code:
```python
#-*- coding: latin-1 -*-
listapt=["gestão","utilizador","telemóvel"]
listabr=["gerenciamento", "usuário", "celular"]
while True:
    #this is all because I need to be able to input multiple lines of text, seems to be working fine
    print ("Insert text")
    lines = []
    while True:
        line = raw_input()
        if line != "FIM":
            lines.append(line)
        else:
            break
    text = '\n'.join(lines)
    for word in listapt:
        if word in text:
            num = listapt.index(word)
            wordbr = listabr[num]
            print(word + " --> " + wordbr) #just to show what changes were made
            text = text.replace(word, wordbr)
    print(text)
```
I run the code on Windows, both from IDLE and by double-clicking the .py file.
The code works fine in IDLE, but when I double-click the .py file it does not match and replace the accented words.
python windows python-2.7 character-encoding
asked Dec 27 '18 at 14:36 by Pedro; edited Dec 27 '18 at 17:34 by Alastair McCormack
What version of Python are you using? And can you edit your question to add the full traceback? – snakecharmerb, Dec 27 '18 at 14:41

Oh, I'm sorry. This is Python 2.7.15, and I've just edited the question. – Pedro, Dec 27 '18 at 14:49

Do you get the same error message if you add a basic `print "gestão"` somewhere? – usr2564301, Dec 27 '18 at 15:22

No, I don't. That seems to work fine. – Pedro, Dec 27 '18 at 15:35

I edited the question with a new problem: it works through IDLE, but it doesn't when I run it directly or convert it to an exe. Why? – Pedro, Dec 27 '18 at 15:51
3 Answers
Here's why the code works as expected in IDLE but not from CMD or by double-clicking:
- Your code is UTF-8 encoded, not latin-1 encoded.
- IDLE always works in UTF-8 "input/output" mode.
- On Windows, CMD and double-clicking use a non-UTF-8, 8-bit locale.
- When your code compares the input to the hardcoded strings, it does so at the byte level. In IDLE, it compares UTF-8 input with hardcoded UTF-8. In CMD, it compares non-UTF-8 8-bit input with hardcoded UTF-8. (On a stock macOS, where the terminal uses UTF-8, it would also work.)
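To see the byte-level mismatch concretely, here is a small sketch. It assumes the console delivers codepage 850 bytes (a common Western European console default) while the hardcoded strings are UTF-8; it encodes the same word both ways and repeats the question's `if word in text` test on raw bytes. It uses `b''`/`u''` literals and escapes so it runs on both 2.7 and 3.x regardless of the file's encoding.

```python
# The word "gestão" as Unicode text (\u00e3 is "ã").
word = u"gest\u00e3o"

# Bytes as they appear in a UTF-8 encoded source file.
utf8_bytes = word.encode("utf-8")
# Bytes as a codepage 850 console might deliver them (assumption).
cp850_bytes = word.encode("cp850")
text_from_console = u"a gest\u00e3o do projeto".encode("cp850")

# Same text, different bytes: a byte-level comparison never matches.
print(utf8_bytes == cp850_bytes)        # False
print(utf8_bytes in text_from_console)  # False, so `if word in text` never fires

# Decoding the input with its own encoding makes the comparison work.
print(word in text_from_console.decode("cp850"))  # True
```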
The way to fix this is to make sure you're comparing "apples with apples". You could do this by converting everything to the same encoding, e.g. converting the input to UTF-8 so it matches the hardcoded strings. The better solution is to convert all [byte] strings to Unicode strings (strings with no encoding). On Python 3, this would all be automatic.
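Both options can be sketched side by side. This assumes, hypothetically, that the console delivered codepage 850 bytes; in real code the encoding would come from the console rather than being written in.

```python
word = u"telem\u00f3vel"                # "telemóvel"
hardcoded_utf8 = word.encode("utf-8")   # a byte-string literal in a UTF-8 source file
console_input = word.encode("cp850")    # assumption: console delivers codepage 850 bytes

# Option 1: transcode the input to UTF-8, then compare bytes with bytes.
input_as_utf8 = console_input.decode("cp850").encode("utf-8")
print(input_as_utf8 == hardcoded_utf8)  # True

# Option 2 (preferred): decode once at the boundary, compare text with text.
input_as_text = console_input.decode("cp850")
print(input_as_text == word)            # True
```

Option 2 keeps every comparison in one type, which is why it scales better once more strings are involved.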
On Python 2.x, you need to do three things:
1. Prefix all source-code strings with `u` to make them Unicode strings:

   ```python
   listapt=[u"gestão",u"utilizador",u"telemóvel"]
   listabr=[u"gerenciamento",u"usuário", u"celular"]
   ...
   if line != u"FIM":
   ```

   Alternatively, add `from __future__ import unicode_literals` at the top of the file to avoid changing all your code.

2. Use the correct coding header for the encoding of your file. I suspect your header should read `utf-8`, e.g.:

   ```python
   #-*- coding: utf-8 -*-
   ```

3. Convert the result of `raw_input` to Unicode. This must be done with the detected encoding of standard input:

   ```python
   import sys
   line = raw_input().decode(sys.stdin.encoding)
   ```
By the way, a better way to model the list of words to replace is a dict, where the keys are the original words and the values are the replacements, e.g.:

```python
words = {u"telemóvel": u"celular"}
```
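A minimal sketch of the dict-based approach, using the three word pairs from the question. The function name `pt_to_br` is hypothetical, and it returns the changes instead of printing them so the caller decides how to report them.

```python
# -*- coding: utf-8 -*-
# European Portuguese word -> Brazilian Portuguese replacement (from the question).
words = {
    u"gestão": u"gerenciamento",
    u"utilizador": u"usuário",
    u"telemóvel": u"celular",
}

def pt_to_br(text, replacements):
    """Return (new_text, changes), replacing every key found in `text` by its value."""
    changes = []
    for pt_word, br_word in replacements.items():
        if pt_word in text:
            changes.append((pt_word, br_word))  # record what was changed
            text = text.replace(pt_word, br_word)
    return text, changes
```

For example, `pt_to_br(u"O utilizador perdeu o telemóvel", words)` yields `u"O usuário perdeu o celular"` plus the two changes made. Like the original, this is plain substring replacement, so a key that appears inside a longer word is replaced too.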
answered Dec 27 '18 at 17:13 by Alastair McCormack; edited Dec 27 '18 at 18:22
OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too! – Pedro, Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. `ReadConsoleW`), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package. – eryksun, Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail). – eryksun, Dec 27 '18 at 23:51
I don't see that problem over here. This may be because I'm copy-pasting from Stack Overflow and have a different dev environment from yours.
Based on your use of raw_input, it seems like you're using Python 2.x.
Try running your script under the latest Python 3 interpreter, removing the "#-*- coding:" line and replacing raw_input() with input(), its Python 3 name.
This should either hit UnicodeDecodeError issues much sooner in your code, or work fine.
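A minimal Python 3 version of the question's script might look like this. It assumes the file is saved as UTF-8 (Python 3's default source encoding, so no coding line is needed) and Python 3.6 or later, where reading from the Windows console goes through its Unicode API. The helper names are hypothetical.

```python
# Python 3: input() returns str (Unicode) and the literals below are str too,
# so no decoding or u"" prefixes are needed.
LISTA_PT = ["gestão", "utilizador", "telemóvel"]
LISTA_BR = ["gerenciamento", "usuário", "celular"]

def read_lines(sentinel="FIM"):
    """Read lines with input() until the sentinel line is entered."""
    lines = []
    while True:
        line = input()
        if line == sentinel:
            return lines
        lines.append(line)

def translate(text):
    """Return (new_text, changes) for every listed word found in text."""
    changes = []
    for word_pt, word_br in zip(LISTA_PT, LISTA_BR):
        if word_pt in text:
            changes.append(word_pt + " --> " + word_br)
            text = text.replace(word_pt, word_br)
    return text, changes

def main():
    while True:
        print("Insert text")
        text, changes = translate("\n".join(read_lines()))
        for change in changes:
            print(change)  # show what changes were made, as the original did
        print(text)
```

Calling `main()` runs the same interactive loop as the original script.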
The problem you have here is Python 2.x getting confused while translating between byte sequences (what Python 2.x strings contain, e.g. binary file contents) and human-meaningful text (Unicode, e.g. for displaying Chinese characters to a user), because it makes incorrect assumptions about how the human-readable text was encoded into the bytes held in the Python strings.
It's a detail that Python 3 addresses much better and less ambiguously.
New contributor
New contributor
answered Dec 27 '18 at 15:22
David Purdy
I tried that and I can actually run it in IDLE, but it doesn't work if I double-click the file or open it from cmd. The final goal is probably to create a shareable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10
First, try executing the code below; it should resolve the issue:
# -*- coding: latin-1 -*-
listapt = [u"gestão", u"utilizador", u"telemóvel"]
listabr = [u"gerenciamento", u"usuário", u"celular"]
lines = []
while True:
    line = raw_input()
    # Decode the raw input bytes into unicode (assumes latin-1 input).
    line = line.decode('latin-1')
    if line != "FIM":
        lines.append(line)
    else:
        break
text = u'\n'.join(lines)
for word in listapt:
    if word in text:
        print("Hello")  # debug: a match was found
        num = listapt.index(word)
        print(num)
        wordbr = listabr[num]
        print(wordbr)
edited Dec 27 '18 at 15:52
answered Dec 27 '18 at 15:21
Ankur Goel
Do not hardcode decode(); this assumes the terminal is using latin-1. This may work for the OP when they're double-clicking or running from CMD. It won't work from IDLE.
– Alastair McCormack
Dec 27 '18 at 16:02
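A minimal Python 3 sketch of the alternative this comment implies: ask the input stream for its encoding instead of hardcoding latin-1. The function name decode_console_bytes is hypothetical, and falling back to locale.getpreferredencoding(False) when the stream reports no encoding is an assumption about a sensible default. In Python 2 the same idea would be line.decode(sys.stdin.encoding), with a similar fallback for when that attribute is None.

```python
import locale
import sys


def decode_console_bytes(raw, encoding=None):
    """Decode bytes read from a console using the console's own encoding.

    Hypothetical helper. When no encoding is given, it asks sys.stdin,
    then falls back to the locale's preferred encoding if stdin is missing
    or reports no encoding.
    """
    if encoding is None:
        encoding = (getattr(sys.stdin, "encoding", None)
                    or locale.getpreferredencoding(False))
    return raw.decode(encoding)


# The same word arrives as different bytes depending on the code page,
# but decodes to the same text when the matching codec is used.
print(decode_console_bytes(b"gest\xc6o", "cp850") == "gestão")   # True
print(decode_console_bytes(b"gest\xe3o", "cp1252") == "gestão")  # True
```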
What version of Python are you using? And can you edit your question to add the full traceback?
– snakecharmerb
Dec 27 '18 at 14:41
Oh, I'm sorry. This is Python 2.7.15 and I've just edited the question.
– Pedro
Dec 27 '18 at 14:49
Do you get the same error message if you add a basic print "gestão" somewhere?
– usr2564301
Dec 27 '18 at 15:22
No, I don't. That seems to work fine.
– Pedro
Dec 27 '18 at 15:35
I edited the question with a new problem -- it works through IDLE, but it doesn't when I run directly or convert to exe. Why?
– Pedro
Dec 27 '18 at 15:51
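Since the behaviour differs between IDLE and a console window, one way to compare the two is to print what each environment reports. A minimal Python 3 sketch; the helper name encoding_report is hypothetical, and the values it prints depend entirely on the machine and on how the script is launched.

```python
import locale
import sys


def encoding_report():
    """Return the text encodings Python reports for this environment."""
    return {
        # Encoding used to decode typed input (None if stdin is unavailable).
        "stdin": getattr(sys.stdin, "encoding", None),
        # Encoding used to encode printed output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # The locale's preferred encoding, a common fallback.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(encoding_report().items()):
    print(name + ": " + str(value))
```

Running it once from IDLE and once by double-clicking the file would show whether the two environments hand the script text in different encodings.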