Comparison of Non ASCII only works in IDLE


























I'm writing a fairly simple script that transforms European Portuguese input into Brazilian Portuguese, so there are a lot of accented characters such as á, é, À, ç, etc.



Basically, the goal is to find words from one list in the text and replace them with the corresponding Brazilian words from a second list.



Here's the code:



#-*- coding: latin-1 -*-

listapt=["gestão","utilizador","telemóvel"]
listabr=["gerenciamento", "usuário", "celular"]

while True:

    #this is all because I need to be able to input multiple lines of text, seems to be working fine

    print ("Insert text")
    lines = []

    while True:
        line = raw_input()
        if line != "FIM":
            lines.append(line)
        else:
            break
    text = '\n'.join(lines)

    for word in listapt:
        if word in text:
            num = listapt.index(word)
            wordbr = listabr[num]
            print(word + " --> " + wordbr) #just to show what changes were made
            text = text.replace(word, wordbr)

    print(text)


I run the code on Windows, both from IDLE and by double-clicking the .py file.
It works fine in IDLE, but when I double-click the .py file, words containing accented characters are not matched or replaced.










  • What version of Python are you using? And can you edit your question to add the full traceback?
    – snakecharmerb
    Dec 27 '18 at 14:41










  • Oh, I'm sorry. This is Python 2.7.15 and I've just edited the question.
    – Pedro
    Dec 27 '18 at 14:49






  • Do you get the same error message if you add a basic print "gestão" somewhere?
    – usr2564301
    Dec 27 '18 at 15:22










  • No, I don't. That seems to work fine.
    – Pedro
    Dec 27 '18 at 15:35










  • I edited the question with a new problem -- it works through IDLE, but it doesn't when I run directly or convert to exe. Why?
    – Pedro
    Dec 27 '18 at 15:51
















python windows python-2.7 character-encoding






edited Dec 27 '18 at 17:34 by Alastair McCormack
asked Dec 27 '18 at 14:36 by Pedro




3 Answers
Here's why the code works as expected in IDLE but not from CMD or when double-clicking the file:




  1. Your code is UTF-8 encoded, not latin-1 encoded.

  2. IDLE always works in UTF-8 "input/output" mode.

  3. On Windows, the console (whether you start the script from CMD or by double-clicking it) uses a non-UTF-8, 8-bit locale encoding.

  4. When your code compares the input to the hardcoded strings, it does so at the byte level. In IDLE, it's comparing UTF-8 input to hardcoded UTF-8. In the console, it's comparing non-UTF-8 8-bit input to hardcoded UTF-8. (On a stock macOS system, where the terminal uses UTF-8, it would also work.)
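To make point 4 concrete, here is a small illustrative sketch (not part of the original answer) that encodes the same word with three codecs. It assumes the console uses code page 850, the Western European OEM default mentioned in eryksun's comment; your machine's code page may differ. It runs on Python 2.7 and 3.

```python
# -*- coding: utf-8 -*-
# Illustration only: the same word encoded with three codecs.
word = u"gest\u00e3o"  # "gestão", escaped so this file's encoding doesn't matter

as_utf8 = word.encode("utf-8")      # what a UTF-8 source file hard-codes
as_latin1 = word.encode("latin-1")  # what a genuinely latin-1 source file hard-codes
as_cp850 = word.encode("cp850")     # what a cp850 console would pass to raw_input (assumed code page)

print(as_utf8 == as_cp850)   # False: two bytes for "ã" in UTF-8, one byte in cp850
print(as_utf8 == as_latin1)  # False

# A byte-level substring test on console input therefore fails...
typed = u"a gest\u00e3o do projeto".encode("cp850")  # hypothetical console input
print(as_utf8 in typed)      # False

# ...while decoding each byte string with its own codec gives identical text.
print(as_utf8.decode("utf-8") == as_cp850.decode("cp850") == word)  # True
```

The bytes differ, but the decoded Unicode text is identical, which is why converting everything to Unicode fixes the comparison.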


The way to fix this is to make sure you're comparing "apples with apples". You could do this by converting everything to the same encoding, e.g. converting the input to UTF-8 so it matches the hardcoded strings. The better solution is to convert all [byte] strings to Unicode strings (strings with no encoding). On Python 3, this would all be automatic.



On Python 2.x, you need to do three things:





  1. Prefix all source code strings with u to make them Unicode strings:



    listapt=[u"gestão",u"utilizador",u"telemóvel"]
    listabr=[u"gerenciamento",u"usuário", u"celular"]
    ...
    if line != u"FIM":


    Alternatively, add from __future__ import unicode_literals at the top of the file so you don't have to change every literal.




  2. Use the correct coding header for the encoding of your file. I suspect your header should read utf-8. E.g.



    #-*- coding: utf-8 -*-



  3. Convert the result of raw_input to Unicode. This must be done with the detected encoding of the standard input:



    import sys
    line = raw_input().decode(sys.stdin.encoding)
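One caveat on step 3: on Python 2, sys.stdin.encoding is None when input is redirected from a file or pipe rather than typed into a console, and .decode(None) then fails. A defensive variant could fall back to the locale's preferred encoding. This is a hedged sketch; the helper name to_unicode and the fallback order are my own choices, not part of the answer.

```python
import locale
import sys


def to_unicode(raw, stream=None):
    """Decode a byte string read from `stream` (default: sys.stdin) to Unicode.

    Hypothetical helper: prefers the stream's detected encoding, then the
    locale's preferred encoding, then UTF-8. Text that is already Unicode
    (e.g. from Python 3's input()) is returned unchanged.
    """
    if not isinstance(raw, bytes):  # on Python 2, bytes is an alias for str
        return raw
    if stream is None:
        stream = sys.stdin
    encoding = (getattr(stream, "encoding", None)
                or locale.getpreferredencoding()
                or "utf-8")
    return raw.decode(encoding)
```

On Python 2 you would call it as line = to_unicode(raw_input()).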



By the way, a better way to model the list of words to replace is to use a dict. The keys are the original words and the values are their replacements, e.g.:



words = { u"telemóvel": u"celular"}
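Combining the dict suggestion with Unicode literals, the replacement loop could look like the sketch below. It assumes the file is saved as UTF-8; translate is a hypothetical function name, and the code runs on Python 2.7 and 3.

```python
# -*- coding: utf-8 -*-
# Sketch: dict-based replacement with Unicode literals (Python 2.7 and 3).
words = {
    u"gestão": u"gerenciamento",
    u"utilizador": u"usuário",
    u"telemóvel": u"celular",
}


def translate(text, mapping=words):
    """Return (new_text, changes), where changes lists the (old, new) pairs applied."""
    changes = []
    for pt, br in mapping.items():
        if pt in text:
            text = text.replace(pt, br)
            changes.append((pt, br))
    return text, changes
```

Call it on the already-decoded input, e.g. text, changes = translate(text), then print each pair to reproduce the question's " --> " log. Dict order is not guaranteed on Python 2 (or before 3.7), so if one key could occur inside another, sort the keys longest-first before replacing.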





answered Dec 27 '18 at 17:13 by Alastair McCormack (edited Dec 27 '18 at 18:22)























  • OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
    – Pedro
    Dec 27 '18 at 18:21










  • Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
    – eryksun
    Dec 27 '18 at 23:37










  • There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
    – eryksun
    Dec 27 '18 at 23:51



















I don't see that problem over here.



Based on your use of raw_input, it seems like you're using Python 2.x.



This may be because I'm copy-pasting from Stack Overflow and have a different dev environment from you.



Try running your script under the latest Python 3 interpreter (replacing raw_input with input, its Python 3 name), and remove the "#-*- coding:" line.



This should either surface UnicodeDecodeError issues much earlier in your code, or just work.



The problem you have here is that Python 2.x gets confused at some point while translating between byte sequences (what Python 2.x strings contain, e.g. binary file contents) and human-meaningful text (Unicode, e.g. for displaying Chinese characters to the user), because it makes incorrect assumptions about how the human-readable text was encoded into the bytes held in the Python strings.



Python 3 handles this distinction much better and less ambiguously.
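For reference, here is a rough Python 3 port of the question's script: a sketch assuming the source file is saved as UTF-8 (the Python 3 default). In Python 3, input lines are str, which is already Unicode, so no decoding step is needed; per eryksun's comment below, Python 3.6+ also reads the Windows console through its Unicode API. The helper names read_until_fim and translate are hypothetical, and main() is defined but not called here.

```python
import sys

# Words from the question: European Portuguese -> Brazilian Portuguese.
WORDS = {
    "gestão": "gerenciamento",
    "utilizador": "usuário",
    "telemóvel": "celular",
}


def read_until_fim(lines):
    """Collect lines until one equals "FIM"; return them joined with newlines."""
    collected = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "FIM":
            break
        collected.append(line)
    return "\n".join(collected)


def translate(text):
    """Replace every European Portuguese word in WORDS with its Brazilian form."""
    for pt, br in WORDS.items():
        text = text.replace(pt, br)
    return text


def main():
    print("Insert text")
    # str is already Unicode in Python 3, so no .decode() is needed.
    print(translate(read_until_fim(sys.stdin)))
```

Add a call to main() at the bottom of the script to run it interactively.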






answered by David Purdy


















  • I tried that and I can actually run it in IDLE, but it doesn't work if I double-click the file or open it from CMD. The final goal is probably to create a shareable executable file, so running from IDLE isn't enough. Why is this happening?
    – Pedro
    Dec 27 '18 at 16:10



















Try running the code below first; it should resolve the issue:



# -*- coding: latin-1 -*-

listapt=[u"gestão",u"utilizador",u"telemóvel"]
listabr=[u"gerenciamento",u"usuário", u"celular"]

lines=[]
line = raw_input()
line = line.decode('latin-1')
if line != "FIM":
    lines.append(line)

text = u'\n'.join(lines)

for word in listapt:
    if word in text:
        print("Hello")
        num = listapt.index(word)
        print(num)
        wordbr = listabr[num]
        print(wordbr)





answered by Ankur Goel


















  • Do not hard-code the codec passed to decode(): this assumes the terminal is using latin-1. That may work for the OP when they're double-clicking or running from CMD, but it won't work from IDLE.
    – Alastair McCormack
    Dec 27 '18 at 16:02
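A quick sketch of why a hard-coded codec fails quietly: if the console actually delivers cp850 bytes (an assumption; check sys.stdin.encoding on your machine), decoding them as latin-1 never raises an error, because latin-1 accepts every byte value, but it yields a different character, so the later comparison silently fails.

```python
# -*- coding: utf-8 -*-
word = u"gest\u00e3o"  # "gestão"

console_bytes = word.encode("cp850")     # bytes from a cp850 console (assumed code page)
wrong = console_bytes.decode("latin-1")  # hard-coded codec, as in the answer above
right = console_bytes.decode("cp850")    # codec that matches the input

print(wrong == word)  # False: decoding succeeded but produced a different character
print(right == word)  # True
```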























  • OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
    – Pedro
    Dec 27 '18 at 18:21










  • Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
    – eryksun
    Dec 27 '18 at 23:37










  • There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
    – eryksun
    Dec 27 '18 at 23:51


















  • OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
    – Pedro
    Dec 27 '18 at 18:21










  • Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
    – eryksun
    Dec 27 '18 at 23:37










  • There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
    – eryksun
    Dec 27 '18 at 23:51
















OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21




OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21












Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37




Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

























1














I don't see that problem over here.

Based on your use of raw_input, it seems like you're using Python 2.x.

This may be because I'm copy-pasting from Stack Overflow and have a different dev environment from yours.

Try running your script under the latest Python 3 interpreter, and remove the "#-*- coding:" line.

This should either hit UnicodeDecodeError issues much sooner in your code, or work fine.
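The following is a minimal Python 3 port of the question's script, written as this answer suggests. In Python 3, `input()` returns `str` (Unicode), so the word lists and the typed text are compared as text, not as bytes. Python 3 also assumes UTF-8 source files by default, so the coding line can go if the file is saved as UTF-8. The names `pt_to_br`, `main`, `LISTA_PT` and `LISTA_BR` are illustrative. Pairing the lists with `zip` replaces the `index()` lookup.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: save this file as UTF-8 (Python 3's default source encoding).

LISTA_PT = ["gestão", "utilizador", "telemóvel"]
LISTA_BR = ["gerenciamento", "usuário", "celular"]


def pt_to_br(text):
    """Replace each European Portuguese word with its Brazilian equivalent."""
    for word_pt, word_br in zip(LISTA_PT, LISTA_BR):
        if word_pt in text:
            print(word_pt + " --> " + word_br)  # show which changes were made
            text = text.replace(word_pt, word_br)
    return text


def main():
    """Read lines until FIM, then print the converted text."""
    print("Insert text")
    lines = []
    while True:
        line = input()  # returns str (Unicode) in Python 3; no decoding needed
        if line == "FIM":
            break
        lines.append(line)
    print(pt_to_br("\n".join(lines)))
```

Call `main()` to run it interactively, and type FIM on its own line to finish.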



The problem you have here is that Python 2.x gets confused at some point while translating between byte sequences (what Python 2.x strings contain, e.g. binary file contents) and human-meaningful text (Unicode, e.g. Chinese characters displayed to the user). It makes incorrect assumptions about how the text was encoded into the bytes held in the Python strings.

It's a detail that Python 3 addresses much better and less ambiguously.
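Here is a minimal sketch of that mismatch. It assumes the script was saved as Latin-1 (per its coding line) and that the console uses cp850, the Western European OEM default mentioned in the comments. Under Python 2 both values are plain byte strings, so `==` and `in` compare bytes, not characters.

```python
# -*- coding: utf-8 -*-
WORD = u"gestão"

# How the literal is stored in a script saved as Latin-1 (the question's coding line).
source_bytes = WORD.encode("latin-1")
# How the same word arrives from a console whose input codepage is cp850 (assumed).
console_bytes = WORD.encode("cp850")

# Comparing byte strings: same word, different bytes, so no match.
print(source_bytes == console_bytes)
# Decoding each with its own codec yields identical text, so the match succeeds.
print(source_bytes.decode("latin-1") == console_bytes.decode("cp850"))
```

IDLE and the console hand the script bytes in different encodings, which would explain why the same byte-string comparison works in one and not the other.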














New contributor




David Purdy is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.


















answered Dec 27 '18 at 15:22









David Purdy

92




  • I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
    – Pedro
    Dec 27 '18 at 16:10
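To see why the same script behaves differently in IDLE and in a console window, you can use a small diagnostic sketch. It prints the encodings Python reports for its standard streams and the locale. Running it both ways will likely show different stdin values. `report_encodings` is a hypothetical helper; `getattr` guards against streams that are `None` or have no `encoding` attribute.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for its standard streams and locale."""
    return {
        # getattr guards against streams that are None or lack .encoding
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }


for name, encoding in sorted(report_encodings().items()):
    print("%s: %s" % (name, encoding))
```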





























-1














First, try running the code below; it should resolve the issue:



# -*- coding: latin-1 -*-

listapt=[u"gestão",u"utilizador",u"telemóvel"]
listabr=[u"gerenciamento",u"usuário", u"celular"]

lines = []
while True:
    line = raw_input()
    # Decode the byte string to unicode (assumes the input bytes are Latin-1)
    line = line.decode('latin-1')
    if line != "FIM":
        lines.append(line)
    else:
        break

text = u'\n'.join(lines)

for word in listapt:
    if word in text:
        print("Hello")
        num = listapt.index(word)
        print(num)
        wordbr = listabr[num]
        print(wordbr)















New contributor




Ankur Goel is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.


























edited Dec 27 '18 at 15:52






























answered Dec 27 '18 at 15:21









Ankur Goel

1298
















  • Do not hardcode decode - this assumes the terminal is using latin-1. This may work for the OP when they're double clicking or running from CMD. It won't work from IDLE.
    – Alastair McCormack
    Dec 27 '18 at 16:02
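Following that advice, here is a sketch that decodes with the encoding the input stream reports (`sys.stdin.encoding`) instead of a hardcoded `'latin-1'`. `to_text` is a hypothetical helper, and the UTF-8 last-resort default is an assumption. Text that is already Unicode passes through unchanged, so the same call works whether the environment hands back bytes or text. Under Python 2 the console's codepage still limits which characters survive, as noted in the comment about ReadConsoleW.

```python
# -*- coding: utf-8 -*-
import sys


def to_text(raw, encoding=None):
    """Return `raw` as Unicode text.

    Byte strings are decoded with `encoding`, else with the encoding the
    input stream reports, else UTF-8 (an assumed last-resort default).
    Text that is already Unicode is returned unchanged.
    """
    if isinstance(raw, bytes):  # `bytes` is an alias of `str` in Python 2
        enc = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
        return raw.decode(enc)
    return raw


# The same word, encoded with two different codepages, decodes to the same text.
print(to_text(u"gestão".encode("cp850"), "cp850") == u"gestão")
print(to_text(u"gestão".encode("latin-1"), "latin-1") == u"gestão")
```

In the question's Python 2 loop this would be used as `line = to_text(raw_input())`.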

















