The Atom script package requires script execution screen encoding in Java #1166

Closed
nililia opened this issue Nov 30, 2016 · 20 comments · Fixed by #2421 or #2470

@nililia

nililia commented Nov 30, 2016

Windows cmd:

c:\java project>javac test.java
오류: 기본 클래스 test.java을(를) 찾거나 로드할 수 없습니다.
(English: "Error: Could not find or load main class test.java.")

Atom script execution screen (same message, garbled):

����: �⺻ Ŭ���� test��(��) ã�ų� �ε��� �� �����ϴ�.
[Finished in 1.127s]
@nixel2007
Collaborator

You need to handle Unicode strings properly. Here is an issue about Python: #701

@nililia
Author

nililia commented Nov 30, 2016

The problem is not the programming language's encoding but the console's encoding.
It should be possible to change it to cp949, euc-kr, or another encoding.

@nixel2007
Collaborator

We can't change the encoding of the console. It runs through cmd or bash (for Java) with the default system encoding.

@MrYann

MrYann commented Dec 1, 2016

That might be the problem: the console is not using the default system encoding, because when the tests are executed in the terminal, they work.
I have the same problem with Python scripts: if I use Unicode characters in write commands or simply in the filename, I get a UnicodeEncodeError when running through Atom. There is no error when running directly in the terminal.

So if anyone has any insight into how to fix this in Atom, it is welcome.

Note that I have the same issue in Sublime Text, so this really is a core issue with how they execute shell commands.

@dbolton

dbolton commented Dec 1, 2016

It looks like atom-script on Windows uses cp1252 (Windows-1252) encoding by default instead of UTF-8.

As a workaround, you can set the encoding of the standard output and error streams to UTF-8 in your code:

import sys
import io

# Re-wrap the underlying byte streams so text is encoded as UTF-8
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

print('汉语/漢語')

@onitonitonito

onitonitonito commented Nov 18, 2017

@dbolton thanks 👍, this is very helpful even though it is only a temporary workaround in code. Since the author @rgbkrk said in #214 that he couldn't take on managing this, I will have to use it for now.
On Windows 7, the system encoding here is 'cp949', which causes this kind of inconvenience.
If you make these encodings match, the problem goes away.

import io
import sys
# sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding = 'utf-8')
# sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding = 'utf-8')

def test0_system_stdout_environment():
    """ Test this with/without stdout, err changing above """

    string = u'안녕세계'
    print(string)
    print(sys.getdefaultencoding())
    print(sys.stdout.encoding)
test0_system_stdout_environment()

BEFORE:

안녕세계 = �ȳ缼��
sys.getdefaultencoding() = utf-8
sys.stdout.encoding = cp949        ---> change to 'utf-8'

AFTER:

안녕세계 = 안녕세계
sys.getdefaultencoding() = utf-8
sys.stdout.encoding = utf-8 

@github-mi

github-mi commented Dec 26, 2017

I solved this problem by adding an environment variable

PYTHONIOENCODING=utf-8

You can press Ctrl+Alt+Shift+O to open atom-script's Configure Script dialog and add it under Environment Variables.
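For reference, here is a minimal sketch of what PYTHONIOENCODING does from the side of a program that launches Python and reads its output, which is roughly atom-script's situation. It assumes Python 3.7+ and uses only the standard library; run_with_utf8_io is a hypothetical helper name, not part of atom-script.

```python
import os
import subprocess
import sys

# Child program: prints Korean text, written with escapes so the
# command line itself stays ASCII-only.
CHILD_CODE = "print('\\uc548\\ub155\\uc138\\uacc4')"


def run_with_utf8_io(code):
    """Run `code` in a child Python with PYTHONIOENCODING=utf-8 (hypothetical helper).

    The variable makes the child encode stdout/stderr as UTF-8 regardless
    of the console code page, so the parent can safely decode as UTF-8.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        check=True,
    )
    # strip() also removes the "\r\n" line ending used on Windows
    return result.stdout.decode("utf-8").strip()


if __name__ == "__main__":
    print(run_with_utf8_io(CHILD_CODE) == "\uc548\ub155\uc138\uacc4")
```

Setting the variable system-wide (or in the dialog above) has the same effect for every Python script atom-script starts.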

@yuebinyun

@github-mi cool!

@ryuci

ryuci commented Feb 6, 2018

@dbolton Bull's eye! Thank you for your very good solution!

@ShenDezhou

@github-mi very nice solution!

@joelhellman

Note that the workaround of detaching stdout and stderr may interfere with Python's logging module: for me, it throws an exception saying that the underlying buffer has been detached.
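On Python 3.7 and later, one way to avoid the detached-buffer problem is to change the encoding of the existing stream objects in place with io.TextIOWrapper.reconfigure() instead of replacing them. Because sys.stdout and sys.stderr remain the same objects, a logging handler that already holds a reference to them keeps working. This is a minimal sketch under that assumption; force_utf8 is a hypothetical helper name.

```python
import io
import logging
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 in place (Python 3.7+).

    Unlike replacing sys.stdout with a new TextIOWrapper around
    stream.detach(), this keeps the same object alive, so anything that
    already holds a reference to it (e.g. a logging.StreamHandler) is
    unaffected. Streams without reconfigure() are returned unchanged.
    """
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(encoding="utf-8")
    return stream


if __name__ == "__main__":
    # Handler created *before* the switch, as logging.basicConfig() does.
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    force_utf8(sys.stdout)
    force_utf8(sys.stderr)
    print("\uc548\ub155\uc138\uacc4")
    logging.info("still logging after the switch")
```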

@ghost

ghost commented Nov 2, 2018

@github-mi
Thank you for the hint, but how can I set this environment variable permanently? For now, I have to add this environment variable every time I restart Atom.

@MrBrN197

@MaciekZar You set up an actual environment variable in your Windows system settings.

@IgorFomenko

PYTHONIOENCODING=utf-8 works for Python, but what should I do if I use JavaScript?
I have the same issue.

@mhatano
Contributor

mhatano commented Apr 20, 2021

Atom-script invokes cmd.exe underneath and does not take its code page into account. As a result, cmd.exe runs with the system's default code page, while atom-script treats the output as UTF-8, and the two do not match. Typical commands such as java.exe and javac.exe print their messages in cmd.exe's code page (on Japanese systems, for example, cp932/Shift-JIS), so the atom-script tab displays garbled characters.
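This mismatch can be reproduced in a few lines of Python: encode a message in the console's code page (cp949 here, the Korean one from the original report; cp932 behaves the same way on Japanese systems) and decode it as UTF-8, as a reader expecting UTF-8 would. This only illustrates the mechanism; it is not atom-script code, and show_mismatch is a hypothetical name.

```python
def show_mismatch(text, console_codepage="cp949"):
    """Return (what a UTF-8 reader shows, what a code-page-aware reader shows)."""
    raw = text.encode(console_codepage)               # bytes as the console program writes them
    garbled = raw.decode("utf-8", errors="replace")   # decoded with the wrong encoding
    correct = raw.decode(console_codepage)            # decoded with the matching code page
    return garbled, correct


if __name__ == "__main__":
    garbled, correct = show_mismatch("오류: 기본 클래스")
    print(garbled)   # contains U+FFFD replacement characters
    print(correct)
```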

If we put 'process.env.JAVA_TOOL_OPTIONS="-Dfile.encoding=utf8"' in init.coffee, most of the problem goes away: javac.exe then prints its warning messages in UTF-8 and reads source files as UTF-8, and java.exe also writes its standard output in UTF-8. One problem remains: the original Java program might assume the system's default file.encoding (cp932/Shift-JIS on a Japanese system, again), but in this environment it would be changed to UTF-8. If, and only if, the software writes text files both inside and outside the atom-script execution environment, the two will produce incompatible text files.

(By the way, entering 'JAVA_TOOL_OPTIONS="-Dfile.encoding=utf8"' under "Environment Variables:" in Packages -> Script -> Configure Script produces messages like:

Picked up JAVA_TOOL_OPTIONS: "-Dfile.encoding
Unmatched quote in JAVA_TOOL_OPTIONS
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

which means this option field cannot handle a quoted equals sign properly; I suppose this should be addressed as well.)
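The log above is consistent with the value being cut at its second "=": splitting 'JAVA_TOOL_OPTIONS="-Dfile.encoding=utf8"' on every "=" leaves exactly '"-Dfile.encoding', an unmatched quote. The sketch below does not reflect atom-script's actual parser; parse_assignment is a hypothetical helper showing a parser that splits only at the first "=" and strips one pair of surrounding double quotes.

```python
def parse_assignment(line):
    """Split 'KEY=VALUE' at the first '=' only (hypothetical helper).

    One pair of surrounding double quotes is removed from the value, the
    way a shell would, so '-Dfile.encoding=utf8' reaches the JVM intact.
    """
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a KEY=VALUE assignment: {line!r}")
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return key.strip(), value


if __name__ == "__main__":
    entry = 'JAVA_TOOL_OPTIONS="-Dfile.encoding=utf8"'
    print(entry.split("=")[1])      # prints "-Dfile.encoding (quote kept, value truncated)
    print(parse_assignment(entry))  # prints ('JAVA_TOOL_OPTIONS', '-Dfile.encoding=utf8')
```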

There is one more difficult point. Windows (at least the Japanese edition) sometimes expects UTF-8 text files to begin with a BOM. (For example, Excel will not accept UTF-8 CSV files without a BOM, while it accepts Shift-JIS CSV files without any problems.)
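If a program does need UTF-8 files that Excel will open, one option in Python is the standard "utf-8-sig" codec, which writes a BOM at the start of the file. This is a minimal sketch; write_csv_with_bom, the file name, and the rows are made-up examples.

```python
import csv
import os
import tempfile

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def write_csv_with_bom(path, rows):
    """Write rows as UTF-8 CSV prefixed with a BOM ("utf-8-sig")."""
    # newline="" is what the csv module expects; it writes its own line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.csv")
    write_csv_with_bom(path, [["name", "city"], ["山田", "東京"]])
    with open(path, "rb") as f:
        print(f.read(3) == BOM)  # True
```

Reading such a file back with encoding="utf-8-sig" strips the BOM again, so Python-side code is unaffected either way.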

So this strategy of making UTF-8 the default for every encoding will sometimes create serious bugs.

I also tried an alternative approach by putting the following lines into init.coffee:

process.env.JDK_JAVAC_OPTIONS="-encoding utf8"
process.env.JDK_JAVA_OPTIONS="-Dsun.stdout.encoding=utf8"

but this did not change javac.exe's warning messages, which were still printed in Shift-JIS, so it did not work. (Also, I could not find a way to change the encoding of javac.exe's warning messages with its command-line options.)

Another way to solve this is to change cmd.exe's default to code page 65001 (UTF-8), but that would require editing the registry, and changing the whole system default would be overkill.

To solve this problem, I recommend modifying atom-script to decode cmd.exe's output using its current code page, or perhaps to run "chcp 65001" when cmd.exe starts. I am completely new to this project, so I cannot point to the exact code to change, and I apologize for that, but I hope this adds something to the project.
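The first suggestion (decode cmd.exe's output with its current code page) could look roughly like the following sketch. It assumes the code page number has already been obtained, for example from the output of chcp, whose wording is localized, so the parser just takes the last number in the text. Both function names are hypothetical, and atom-script itself is written in JavaScript, not Python.

```python
import re


def parse_chcp_output(text):
    """Extract the code page number from `chcp` output (hypothetical helper).

    The message is localized (e.g. "Active code page: 932" in English),
    so only the last run of digits is used.
    """
    numbers = re.findall(r"\d+", text)
    if not numbers:
        raise ValueError(f"no code page found in {text!r}")
    return int(numbers[-1])


def decode_console_output(raw, codepage):
    """Decode bytes written by a console program using its code page."""
    # 65001 is the UTF-8 code page; map it explicitly instead of relying
    # on a "cp65001" alias. Unknown code pages raise LookupError.
    encoding = "utf-8" if codepage == 65001 else f"cp{codepage}"
    return raw.decode(encoding, errors="replace")
```

Decoding with the console's own code page keeps the Java tools and the user's program untouched, which avoids the file.encoding side effects described above.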

@github-actions

🎉 This issue has been resolved in version 3.31.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

@mhatano
Contributor

mhatano commented May 11, 2021

The fix in #2421 only addressed Python encoding and does not fix the original issue, i.e. the java.exe/javac.exe encoding mismatch in the build results tab. I have just updated to 3.31.3, tested it, and am still seeing garbled characters.

@aminya aminya reopened this May 11, 2021
@aminya aminya changed the title The Atom script package requires script execution screen encoding. The Atom script package requires script execution screen encoding in Java May 11, 2021
@aminya aminya added bug and removed released labels May 11, 2021
@aminya
Member

aminya commented May 11, 2021

The code for the Java runner is here. Any contribution to fix the issue for Java is greatly appreciated. I spent a long time trying to fix this, but since I am not a Java developer I don't know how I can fix this issue without it breaking other things.

if (windows) {
  return [`/c javac -Xlint ${context.filename} && java ${className}`]
} else {
  return [
    "-c",
    `javac -J-Dfile.encoding=UTF-8 -sourcepath '${sourcePath}' -d /tmp '${context.filepath}' && java -Dfile.encoding=UTF-8 -cp /tmp:%CLASSPATH ${classPackages}${className}`,
  ]
}

Here is the commit that I made and reverted later:
1798f08

The quoting rules for javac and java are an issue. Quoting with ' doesn't seem to work on Windows, but quoting with " in an external shell does.
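One way to sidestep the quoting differences between cmd.exe and bash entirely is to run javac and java as argument lists without a shell, so no quote characters ever reach the tools. The following Python sketch only illustrates that idea; atom-script's runner is JavaScript, and java_commands and its parameters are hypothetical names. The flags are the ones used in the runner code above.

```python
import os


def java_commands(filepath, class_name, out_dir):
    """Build argv lists for compiling and running one Java file (hypothetical).

    Each argument is a separate list element, so when the lists are passed
    to subprocess.run() without shell=True, no shell parses them and no
    quoting is needed, even for paths containing spaces.
    """
    source_path = os.path.dirname(filepath) or "."
    compile_cmd = [
        "javac",
        "-J-Dfile.encoding=UTF-8",  # passed through to javac's own JVM
        "-Xlint",
        "-sourcepath", source_path,
        "-d", out_dir,
        filepath,
    ]
    run_cmd = [
        "java",
        "-Dfile.encoding=UTF-8",  # no quotes: nothing would strip them
        "-cp", out_dir,
        class_name,
    ]
    return compile_cmd, run_cmd
```

Usage would be subprocess.run(compile_cmd, check=True) followed by subprocess.run(run_cmd); on Windows, subprocess turns each list into a command line itself, adding double quotes only where an argument needs them.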

@mhatano
Contributor

mhatano commented May 11, 2021

Thanks for pointing out the spot. I will try reading the code, but as I said, although I am a Java developer I am not familiar with this package, so it may take time.

mhatano added a commit to mhatano/atom-script that referenced this issue May 12, 2021
Quick fix for issue atom-community#1166: a Java build on non-English editions of Windows displays garbled messages, stdout, and stderr.
This fix has a further issue: if the user's final deployment target uses encodings different from the all-UTF-8 setup (such as a "normal" Windows cmd.exe, which typically uses an ANSI encoding for average users), testing and evaluation may become more difficult.
@aminya aminya linked a pull request May 12, 2021 that will close this issue
@mhatano
Contributor

mhatano commented May 12, 2021

It looks like the required options were removed in the merge, and the previous behavior is happening again. I can resubmit the fix and it would work on Windows, but I will check Linux later.

aminya pushed a commit that referenced this issue May 13, 2021
Co-authored-by: weather-tracker <[email protected]>
Co-authored-by: Amin Yahyaabadi <[email protected]>

* Adding back '-J-Dfile.encoding=UTF-8' for javac so its warning messages are printed in UTF-8
and its source files are treated as UTF-8.
* Adding back -Xlint so the compiler reports detailed warning messages
* Removing quotation marks around file.encoding and UTF-8 for the java command,
since java does not understand quotation marks that are not removed by the shell
(cmd.exe won't strip them)

update for issue #1166