Comparison of Non ASCII only works in IDLE

-1

I'm writing a fairly simple script that converts European Portuguese input into Brazilian Portuguese, so there are a lot of accented characters such as á, é, À, ç, etc.

Basically, the goal is to find words in the text from a list and replace them with the BR words from a second list.

Here's the code:

#-*- coding: latin-1 -*-

listapt=["gestão","utilizador","telemóvel"]
listabr=["gerenciamento", "usuário", "celular"]

while True:
    #this is all because I need to be able to input multiple lines of text, seems to be working fine
    print ("Insert text")
    lines = []
    while True:
        line = raw_input()
        if line != "FIM":
            lines.append(line)
        else:
            break
    text = '\n'.join(lines)

    for word in listapt:
        if word in text:
            num = listapt.index(word)
            wordbr = listabr[num]
            print(word + " --> " + wordbr) #just to show what changes were made
            text = text.replace(word, wordbr)

    print(text)

I run the code on Windows using IDLE and by double-clicking on the .py file.
The code works fine when using IDLE, but does not match and replace characters when double-clicking the .py file.

edited Dec 27 '18 at 17:34

Alastair McCormack


asked Dec 27 '18 at 14:36

Pedro

New contributor


What version of python are you using? And can you edit your question to add the full traceback?
– snakecharmerb
Dec 27 '18 at 14:41

Oh, I'm sorry. This is Python 2.7.15 and I've just edited the question.
– Pedro
Dec 27 '18 at 14:49


Do you get the same error message if you add a basic print "gestão" somewhere?
– usr2564301
Dec 27 '18 at 15:22

No, I don't. That seems to work fine.
– Pedro
Dec 27 '18 at 15:35

I edited the question with a new problem -- it works through IDLE, but it doesn't when I run directly or convert to exe. Why?
– Pedro
Dec 27 '18 at 15:51


python windows python-2.7 character-encoding


3 Answers

Here's why the code works as expected in IDLE but not from CMD or by double-clicking:

Your code is UTF-8 encoded, not latin-1 encoded.

IDLE always works in UTF-8 "input/output" mode.

On Windows, CMD or double-clicking uses a non-UTF-8 8-bit locale.

When your code compares the input to the hardcoded strings, it does so at the byte level. In IDLE, it's comparing UTF-8 input to hardcoded UTF-8. From CMD, it's comparing non-UTF-8 8-bit input to hardcoded UTF-8. (On a stock macOS it would also work.)

The way to fix this is to make sure you're comparing "apples with apples". You could do this by converting everything to the same encoding, e.g. by converting the input you read to UTF-8 so that it matches the hardcoded strings. The better solution is to convert all [byte] strings to Unicode strings (strings with no encoding). On Python 3 this would all be automatic.
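The byte-level mismatch described above can be made concrete with a small sketch. It uses Python 3 syntax, where byte strings and text are separate types, and it assumes cp850 as the console code page; that is only an assumption for illustration, and the code page on your machine may differ.

```python
# -*- coding: utf-8 -*-
# Sketch only: cp850 is an assumed console code page, not a detected one.
word = u"gestão"

source_bytes = word.encode("utf-8")    # what a UTF-8 source file holds
console_bytes = word.encode("cp850")   # what a cp850 console would deliver

# Compared as raw bytes, the two encodings of the same word never match.
bytes_match = source_bytes == console_bytes

# Decoded with their own encodings, both become the same text.
text_match = source_bytes.decode("utf-8") == console_bytes.decode("cp850")

print(bytes_match, text_match)
```

The comparison only succeeds once both sides have been decoded to text, which is the "apples with apples" point made above.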

On Python 2.x, you need to do three things:

Prefix all source-code strings with u to make them Unicode strings:
```
listapt=[u"gestão",u"utilizador",u"telemóvel"]
listabr=[u"gerenciamento",u"usuário", u"celular"]

...

if line != u"FIM":
```
Alternatively, add from __future__ import unicode_literals to avoid changing all your code.

Use the correct coding header for the encoding of your file. I suspect your header should read utf-8. E.g.
```
#-*- coding: utf-8 -*-
```

Convert the result of raw_input to Unicode. This must be done with the detected encoding of the standard input:
```
import sys

line = raw_input().decode(sys.stdin.encoding) 
```
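As a hedged sketch of the third step: `sys.stdin.encoding` can be `None` when input is piped or redirected, so a fallback may be useful. The helper name `to_text` and the fallback to `locale.getpreferredencoding()` are assumptions for illustration, not part of the answer above, and cp850 is again only an assumed console code page.

```python
# -*- coding: utf-8 -*-
import locale
import sys


def to_text(raw, encoding=None):
    """Decode a byte string read from standard input into text.

    `to_text` is a hypothetical helper, not part of the original answer.
    """
    if not isinstance(raw, bytes):
        return raw  # already text, e.g. the result of Python 3's input()
    # sys.stdin.encoding may be None (piped input), so fall back to the
    # locale's preferred encoding in that case.
    encoding = (encoding
                or getattr(sys.stdin, "encoding", None)
                or locale.getpreferredencoding())
    return raw.decode(encoding)


# Bytes as an (assumed) cp850 console would deliver them.
line = to_text(u"telemóvel".encode("cp850"), encoding="cp850")
print(line == u"telemóvel")
```

On Python 2 this would be called as `to_text(raw_input())`, since `bytes` is an alias of `str` there.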

By the way, a better way to model the list of words to replace is to use a dict. The keys are the original words and the values are the replacements. E.g.

words = { u"telemóvel": u"celular"}
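A minimal sketch of the dict-based approach, using the three word pairs from the question. The function name `to_brazilian` is hypothetical, and the sketch assumes the text has already been decoded to Unicode as described above.

```python
# -*- coding: utf-8 -*-
# Sketch: European -> Brazilian Portuguese replacements held in one dict.
replacements = {
    u"gestão": u"gerenciamento",
    u"utilizador": u"usuário",
    u"telemóvel": u"celular",
}


def to_brazilian(text):
    """Replace every known word in `text` (hypothetical helper name)."""
    for word, word_br in replacements.items():
        if word in text:
            text = text.replace(word, word_br)
    return text


result = to_brazilian(u"O utilizador perdeu o telemóvel.")
print(result)
```

Keeping each word next to its replacement removes the `listapt.index(word)` lookup and the risk of the two lists drifting out of step.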

edited Dec 27 '18 at 18:22

answered Dec 27 '18 at 17:13

Alastair McCormack


OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
– eryksun
Dec 27 '18 at 23:51


I don't see that problem over here.

Based on your use of raw_input, it seems like you're using Python 2.x.

This may be because I'm copy-pasting from Stack Overflow and have a different dev environment from yours.

Try running your script under the latest Python 3 interpreter, as well as removing the "#-*- coding:" line.

This should either hit UnicodeDecodeError issues a lot sooner in your code, or work fine.

The problem here is that Python 2.x gets confused when translating between byte sequences (what Python 2.x strings contain, e.g. binary file contents) and human-meaningful text (unicode, e.g. Chinese characters displayed to a user). It makes incorrect assumptions about how the human-readable text was encoded into the byte sequence held in the Python string.

It's a detail that Python 3 attempts to address a lot better/less ambiguously.
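To illustrate what an incorrect assumption about the encoding looks like, here is a small sketch in Python 3 syntax. It decodes the same bytes twice, once with the right encoding and once with the wrong one; the choice of UTF-8 and latin-1 is only an example pairing.

```python
# -*- coding: utf-8 -*-
# Sketch: one byte sequence, read under two different assumptions.
data = u"gestão".encode("utf-8")   # bytes as stored in a UTF-8 file

right = data.decode("utf-8")       # correct assumption: the original word
wrong = data.decode("latin-1")     # wrong assumption: garbled text (mojibake)

print(right == u"gestão", wrong == u"gestão")
print(len(data), len(right), len(wrong))
```

The wrongly decoded string is longer than the original word because the two bytes that encode "ã" in UTF-8 are each read as a separate latin-1 character.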

answered Dec 27 '18 at 15:22

David Purdy

New contributor

I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10


-1

First try running the code below; it should resolve the issue:

# -*- coding: latin-1 -*-

listapt=[u"gestão",u"utilizador",u"telemóvel"]
listabr=[u"gerenciamento",u"usuário", u"celular"]

lines = []
line = raw_input()
line = line.decode('latin-1')
if line != "FIM":
    lines.append(line)

text = u'\n'.join(lines)

for word in listapt:
    if word in text:
        print("Hello")
        num = listapt.index(word)
        print(num)
        wordbr = listabr[num]
        print(wordbr)

edited Dec 27 '18 at 15:52

answered Dec 27 '18 at 15:21

Ankur Goel


New contributor

Do not hardcode decode - this assumes the terminal is using latin-1. This may work for the OP when they're double clicking or running from CMD. It won't work from IDLE.
– Alastair McCormack
Dec 27 '18 at 16:02



3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Here's why the code works as expected in IDLE but not from CMD or by doubleclicking:

Your code is UTF-8 encoded, not latin-1 encoded

IDLE always works in UTF-8 "input/output" mode.

On Windows, CMD/Doubleclicking will use a non-UTF-8 8bit locale.

When your code compares the input to the hardcoded strings it's doing so at a byte level. On IDLE, it's comparing UTF-8 to hardcoded UTF-8. On CMD, it's comparing non-UTF-8 8bit to hardcoded UTF-8 (If you were on a stock MacOS, it would also work).

On Python 2.x, you need to do three things:

Prefix all sourcecode strings with u to make them Unicode strings:
```
listapt=[u"gestão",u"utilizador",u"telemóvel"]

listabr=[u"gerenciamento",u"usuário", u"celula]

...

if line != u"FIM":
```
Alternatively, add from __future__ import unicode_literals to avoid changing all your code.

Use the correct coding header for the encoding of your file. I suspect your header should read utf-8. E.g.
```
#-*- coding: utf-8 -*-
```

Convert the result of raw_input to Unicode. This must be done with the detected encoding of the standard input:
```
import sys

line = raw_input().decode(sys.stdin.encoding) 
```

By the way, the better way to model list of words to replace it to use a dict. The keys are the original word, the value is the replacement. E.g.

words = { u"telemóvel": u"celula"}

edited Dec 27 '18 at 18:22

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
– eryksun
Dec 27 '18 at 23:51

add a comment |

Here's why the code works as expected in IDLE but not from CMD or by doubleclicking:

Your code is UTF-8 encoded, not latin-1 encoded

IDLE always works in UTF-8 "input/output" mode.

On Windows, CMD/Doubleclicking will use a non-UTF-8 8bit locale.

When your code compares the input to the hardcoded strings it's doing so at a byte level. On IDLE, it's comparing UTF-8 to hardcoded UTF-8. On CMD, it's comparing non-UTF-8 8bit to hardcoded UTF-8 (If you were on a stock MacOS, it would also work).

On Python 2.x, you need to do three things:

Prefix all sourcecode strings with u to make them Unicode strings:
```
listapt=[u"gestão",u"utilizador",u"telemóvel"]

listabr=[u"gerenciamento",u"usuário", u"celula]

...

if line != u"FIM":
```
Alternatively, add from __future__ import unicode_literals to avoid changing all your code.

Use the correct coding header for the encoding of your file. I suspect your header should read utf-8. E.g.
```
#-*- coding: utf-8 -*-
```

Convert the result of raw_input to Unicode. This must be done with the detected encoding of the standard input:
```
import sys

line = raw_input().decode(sys.stdin.encoding) 
```

By the way, the better way to model list of words to replace it to use a dict. The keys are the original word, the value is the replacement. E.g.

words = { u"telemóvel": u"celula"}

edited Dec 27 '18 at 18:22

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
– eryksun
Dec 27 '18 at 23:51

add a comment |

Here's why the code works as expected in IDLE but not from CMD or by doubleclicking:

Your code is UTF-8 encoded, not latin-1 encoded

IDLE always works in UTF-8 "input/output" mode.

On Windows, CMD/Doubleclicking will use a non-UTF-8 8bit locale.

When your code compares the input to the hardcoded strings it's doing so at a byte level. On IDLE, it's comparing UTF-8 to hardcoded UTF-8. On CMD, it's comparing non-UTF-8 8bit to hardcoded UTF-8 (If you were on a stock MacOS, it would also work).

On Python 2.x, you need to do three things:

Prefix all sourcecode strings with u to make them Unicode strings:
```
listapt=[u"gestão",u"utilizador",u"telemóvel"]

listabr=[u"gerenciamento",u"usuário", u"celula]

...

if line != u"FIM":
```
Alternatively, add from __future__ import unicode_literals to avoid changing all your code.

Use the correct coding header for the encoding of your file. I suspect your header should read utf-8. E.g.
```
#-*- coding: utf-8 -*-
```

Convert the result of raw_input to Unicode. This must be done with the detected encoding of the standard input:
```
import sys

line = raw_input().decode(sys.stdin.encoding) 
```

By the way, the better way to model list of words to replace it to use a dict. The keys are the original word, the value is the replacement. E.g.

words = { u"telemóvel": u"celula"}

edited Dec 27 '18 at 18:22

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

Here's why the code works as expected in IDLE but not from CMD or by doubleclicking:

Your code is UTF-8 encoded, not latin-1 encoded

IDLE always works in UTF-8 "input/output" mode.

On Windows, CMD/Doubleclicking will use a non-UTF-8 8bit locale.

When your code compares the input to the hardcoded strings it's doing so at a byte level. On IDLE, it's comparing UTF-8 to hardcoded UTF-8. On CMD, it's comparing non-UTF-8 8bit to hardcoded UTF-8 (If you were on a stock MacOS, it would also work).

On Python 2.x, you need to do three things:

Prefix all sourcecode strings with u to make them Unicode strings:
```
listapt=[u"gestão",u"utilizador",u"telemóvel"]

listabr=[u"gerenciamento",u"usuário", u"celula]

...

if line != u"FIM":
```
Alternatively, add from __future__ import unicode_literals to avoid changing all your code.

Use the correct coding header for the encoding of your file. I suspect your header should read utf-8. E.g.
```
#-*- coding: utf-8 -*-
```

Convert the result of raw_input to Unicode. This must be done with the detected encoding of the standard input:
```
import sys

line = raw_input().decode(sys.stdin.encoding) 
```

By the way, the better way to model list of words to replace it to use a dict. The keys are the original word, the value is the replacement. E.g.

words = { u"telemóvel": u"celula"}

edited Dec 27 '18 at 18:22

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

edited Dec 27 '18 at 18:22

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

answered Dec 27 '18 at 17:13

Alastair McCormack

15k33860

OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
– eryksun
Dec 27 '18 at 23:51

add a comment |

OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
– eryksun
Dec 27 '18 at 23:51

OMG YES! It worked! Thank you! Regarding your last note, that's a very good tip, thanks for that too!
– Pedro
Dec 27 '18 at 18:21

Note that the Windows console is UTF-16, but by default Python 2 reads best-fit (non-strict) byte strings from the console. This uses the console's current input codepage, which defaults to the system locale's OEM codepage (e.g. 850 in Western Europe). You'll read mojibake nonsense for all characters that aren't defined in this codepage (e.g. "αβγδε" -> "aß?de"). The only reliable solution is to use the console's Unicode API (e.g. ReadConsoleW), as Python 3.6+ does. In Python 2 you can install and enable the win_unicode_console package.
– eryksun
Dec 27 '18 at 23:37

There's no reason to mention CMD in this answer since the OP runs the script by double-clicking on the file. Bringing CMD into the discussion contributes to confusion that the console and CMD are the same thing. cmd.exe and python.exe are both console applications, which either inherit or allocate a console window that's implemented by the system (by conhost.exe in Windows 7+, but it's an implementation detail).
– eryksun
Dec 27 '18 at 23:51

add a comment |

I don't see that problem over here.

Based on your use of raw_input, it seems like you're using Python 2.x

This may be because I'm copypasting off of stack overflow, and have a different dev environment to you.

Try running your script under the latest Python 3 interpreter, as well as removing the "#-*- coding:" line.

This should either hit UnicodeDecodeError issues a lot sooner in your code, or work fine.

It's a detail that Python 3 attempts to address a lot better/less ambiguously.

answered Dec 27 '18 at 15:22

David Purdy

New contributor

I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10

add a comment |

I don't see that problem over here.

Based on your use of raw_input, it seems like you're using Python 2.x

This may be because I'm copypasting off of stack overflow, and have a different dev environment to you.

Try running your script under the latest Python 3 interpreter, as well as removing the "#-*- coding:" line.

This should either hit UnicodeDecodeError issues a lot sooner in your code, or work fine.

It's a detail that Python 3 attempts to address a lot better/less ambiguously.

answered Dec 27 '18 at 15:22

David Purdy

New contributor

I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10

add a comment |

I don't see that problem over here.

Based on your use of raw_input, it seems like you're using Python 2.x

This may be because I'm copypasting off of stack overflow, and have a different dev environment to you.

Try running your script under the latest Python 3 interpreter, as well as removing the "#-*- coding:" line.

This should either hit UnicodeDecodeError issues a lot sooner in your code, or work fine.

It's a detail that Python 3 attempts to address a lot better/less ambiguously.

answered Dec 27 '18 at 15:22

David Purdy

New contributor

I don't see that problem over here.

Based on your use of raw_input, it seems like you're using Python 2.x

This may be because I'm copypasting off of stack overflow, and have a different dev environment to you.

Try running your script under the latest Python 3 interpreter, as well as removing the "#-*- coding:" line.

This should either hit UnicodeDecodeError issues a lot sooner in your code, or work fine.

It's a detail that Python 3 attempts to address a lot better/less ambiguously.

answered Dec 27 '18 at 15:22

David Purdy

New contributor

answered Dec 27 '18 at 15:22

David Purdy

New contributor

answered Dec 27 '18 at 15:22

David Purdy

answered Dec 27 '18 at 15:22

David Purdy

New contributor

David Purdy is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10

add a comment |

I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10

I tried that and I can actually run it in IDLE, but it doesn't work if I double click the file or open from the cmd. The final goal of this is probably to create a sharable executable file, so running from IDLE isn't enough. Why is this happening?
– Pedro
Dec 27 '18 at 16:10

add a comment |

-1

First try executing the code below; it should resolve the issue:

# -*- coding: latin-1 -*-

listapt = [u"gestão", u"utilizador", u"telemóvel"]
listabr = [u"gerenciamento", u"usuário", u"celular"]

lines = []
line = raw_input()
# raw_input() returns a byte string in Python 2; decode it to unicode
# so that it can be compared with the u"..." literals above
line = line.decode('latin-1')
if line != u"FIM":
    lines.append(line)

text = u'\n'.join(lines)

for word in listapt:
    if word in text:
        print("Hello")
        num = listapt.index(word)
        print(num)
        wordbr = listabr[num]
        print(wordbr)

edited Dec 27 '18 at 15:52

answered Dec 27 '18 at 15:21

Ankur Goel

1298

New contributor

Do not hardcode the encoding passed to decode: this assumes the terminal is using latin-1. This may work for the OP when they're double-clicking or running from CMD. It won't work from IDLE.
– Alastair McCormack
Dec 27 '18 at 16:02
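To make the point in the comment above concrete, the sketch below shows why a hardcoded encoding can fail. It assumes, purely for illustration, that the CMD console uses code page 850 (running `chcp` shows the real value on a given machine); `decode_console_bytes` is a hypothetical helper, not part of the original code.

```python
# -*- coding: utf-8 -*-
import locale
import sys

word = u"gestão"

# The same word is a different byte sequence in each code page.
as_latin1 = word.encode("latin-1")  # ã becomes the byte 0xE3
as_cp850 = word.encode("cp850")     # ã becomes the byte 0xC6


def decode_console_bytes(raw, encoding=None):
    """Decode bytes typed at the console using the console's own encoding.

    The `encoding` argument exists only to make the sketch testable; by
    default the encoding reported by sys.stdin is used, falling back to
    the locale's preferred encoding when stdin reports none.
    """
    if encoding is None:
        encoding = getattr(sys.stdin, "encoding", None) or locale.getpreferredencoding()
    return raw.decode(encoding)


# Decoding cp850 bytes as latin-1 silently yields the wrong character,
# so `word in text` is False even though the user typed the right word.
wrong = as_cp850.decode("latin-1")
right = decode_console_bytes(as_cp850, "cp850")
```

In Python 2 the equivalent call would be roughly `decode_console_bytes(raw_input())`, which should pick up whichever encoding the current terminal reports rather than a fixed one.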

