@naotoj
Member

@naotoj naotoj commented May 16, 2025

java.io.Console uses the charset specified by the stdout.encoding system property for both input and output. This is generally sufficient, since Console is intended for interactive terminal use. However, some platforms allow different encodings to be configured for input and output, and in such cases using a single encoding may lead to incorrect behavior when reading from the terminal. To address this, the newly introduced system property, stdin.encoding, should be used specifically for input where appropriate.
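To illustrate the split, here is a minimal sketch built around a hypothetical `SplitCharsetConsole` class. It is not the JDK's implementation. Input is decoded with the charset named by `stdin.encoding`, and output is encoded with the one named by `stdout.encoding`. The streams are passed in so the behavior can be exercised without a real terminal. The sketch falls back to the default charset when a property is missing, and it relies on `Charset.forName(String, Charset)`, which is available since Java 18.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical illustration only: not java.io.Console's actual implementation.
class SplitCharsetConsole {
    private final BufferedReader reader; // decodes bytes with the input charset
    private final PrintWriter writer;    // encodes chars with the output charset
    private final Charset outCharset;

    SplitCharsetConsole(InputStream in, OutputStream out,
                        Charset inCharset, Charset outCharset) {
        this.reader = new BufferedReader(new InputStreamReader(in, inCharset));
        // autoFlush = true, so println pushes bytes to the underlying stream
        this.writer = new PrintWriter(new OutputStreamWriter(out, outCharset), true);
        this.outCharset = outCharset;
    }

    // Wires System.in/System.out to the charsets named by the two properties.
    static SplitCharsetConsole fromSystem() {
        return new SplitCharsetConsole(System.in, System.out,
                charsetOf("stdin.encoding"), charsetOf("stdout.encoding"));
    }

    // Unset, unknown, or illegal names fall back to the default charset.
    private static Charset charsetOf(String key) {
        String name = System.getProperty(key);
        return name == null ? Charset.defaultCharset()
                            : Charset.forName(name, Charset.defaultCharset());
    }

    String readLine() throws IOException {
        return reader.readLine();
    }

    void println(String s) {
        writer.println(s);
    }

    // As described in this review, charset() reports the output charset.
    Charset charset() {
        return outCharset;
    }
}
```

In this sketch, Latin-1 input bytes can be echoed back as UTF-8 output. That only works because the reader and writer each hold their own charset.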


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8357164 to be approved

Issues

  • JDK-8356985: Use "stdin.encoding" in Console's read*() methods (Enhancement - P4)
  • JDK-8357164: Use "stdin.encoding" in Console's read*() methods (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25271/head:pull/25271
$ git checkout pull/25271

Update a local copy of the PR:
$ git checkout pull/25271
$ git pull https://git.openjdk.org/jdk.git pull/25271/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25271

View PR using the GUI difftool:
$ git pr show -t 25271

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25271.diff


@bridgekeeper

bridgekeeper bot commented May 16, 2025

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 16, 2025

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8356985: Use "stdin.encoding" in Console's read*() methods

Reviewed-by: jlu, smarks, alanb, vyazici

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 158 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label May 16, 2025
@openjdk

openjdk bot commented May 16, 2025

@naotoj The following labels will be automatically applied to this pull request:

  • core-libs
  • kulla

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@naotoj naotoj marked this pull request as ready for review May 17, 2025 00:08
@openjdk openjdk bot added the rfr Pull request is ready for review label May 17, 2025
@mlbridge

mlbridge bot commented May 17, 2025

Webrevs

Contributor


Copyright year update is missing.


try {
-Terminal terminal = TerminalBuilder.builder().encoding(charset)
+Terminal terminal = TerminalBuilder.builder().encoding(outCharset)
Contributor


Ideally, shouldn't JdkConsole::charset and Terminal::encoding be adapted to have separate stdin/stdout variants?

Contributor


I also noticed that DumbTerminalProvider::sysTerminal calls DumbTerminal with new FileInputStream(FileDescriptor.in), and DumbTerminal then applies encoding() to both the passed stdin and std{out,err}. In short, TerminalProvider might need a similar refactoring that separates input and output encodings.

Contributor


All occurrences of FileDescriptor.in in jdk.internal.org.jline.terminal that might need attention:

src/jdk.internal.le/share/classes/jdk/internal/org/jline/terminal/impl/DumbTerminalProvider.java
src/jdk.internal.le/share/classes/jdk/internal/org/jline/terminal/impl/exec/ExecPty.java
src/jdk.internal.le/share/classes/jdk/internal/org/jline/terminal/impl/ffm/FfmTerminalProvider.java

Member Author


JLine is a third-party library. It would be desirable for them to handle input and output separately in their terminal, but that is out of the scope of this PR.

Contributor

Member Author


Thanks!

Member Author


JLine seems to have incorporated the stdin encoding already, which was quick!

Comment on lines 558 to 560
static final Charset STDIN_CHARSET =
Charset.forName(System.getProperty("stdin.encoding"), UTF_8.INSTANCE);
static final Charset STDOUT_CHARSET =
Contributor


I guess these can be marked private?

Suggested change
-static final Charset STDIN_CHARSET =
-Charset.forName(System.getProperty("stdin.encoding"), UTF_8.INSTANCE);
-static final Charset STDOUT_CHARSET =
+private static final Charset STDIN_CHARSET =
+Charset.forName(System.getProperty("stdin.encoding"), UTF_8.INSTANCE);
+private static final Charset STDOUT_CHARSET =
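For reference, the `Charset.forName(String, Charset)` overload (Java 18+) returns the fallback when the name is unsupported or not a legal charset name. It is documented to throw `IllegalArgumentException` for a `null` name, though, so code that must tolerate an unset property needs to check for `null` first. The sketch below uses a hypothetical `ConsoleCharsets` holder with `private static final` constants, as suggested above. `StandardCharsets.UTF_8` stands in for the internal `UTF_8.INSTANCE`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical holder class mirroring the suggested private constants.
final class ConsoleCharsets {
    private ConsoleCharsets() {}

    // Resolves the charset named by a system property. UTF-8 is used when the
    // property is unset, names an unsupported charset, or is an illegal name.
    static Charset resolve(String propertyKey) {
        String name = System.getProperty(propertyKey);
        if (name == null) {
            return StandardCharsets.UTF_8; // forName(null, ...) would throw
        }
        return Charset.forName(name, StandardCharsets.UTF_8);
    }

    private static final Charset STDIN_CHARSET = resolve("stdin.encoding");
    private static final Charset STDOUT_CHARSET = resolve("stdout.encoding");

    static Charset stdin() { return STDIN_CHARSET; }
    static Charset stdout() { return STDOUT_CHARSET; }
}
```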

public class StdinEncodingTest {

@Test
@EnabledOnOs({OS.LINUX, OS.MAC})
Contributor


Shall we replace @EnabledOnOs with the @requires (os.family == "linux" | os.family == "mac") JTreg directive instead per JDK-8211673?
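If the `@requires` suggestion were adopted, the OS restriction would move from the JUnit annotation into the jtreg test description, and the rationale for the restriction could sit next to it. The sketch below reuses only tags that appear in this thread (`@bug`, `@summary`, `@run`) and omits any other tags the real test may need, such as library or build tags. The class body is a placeholder.

```java
/**
 * @test
 * @bug 8356985
 * @summary Tests if "stdin.encoding" is reflected for reading
 * @requires (os.family == "linux" | os.family == "mac")
 * @run junit StdinEncodingTest
 */
// The "expect" command in Windows/Cygwin does not work as expected,
// so the test is limited to Linux and macOS (sketch only).
public class StdinEncodingTest {
    // test methods elided
}
```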

protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
while (in.remaining() > 0) {
char c = (char)in.get();
if (c != '\n') {
Contributor


Isn't Character.toUpperCase('\n') == '\n'? If so, do we still need this if branch?

Member Author


Eliminating newlines was a hack to ignore them in the expect responses so that the combined output could be checked in one shot. Now I think it is better to check the results separately, so I modified the test accordingly.

import java.util.Iterator;

// A test charset provider that decodes every input byte into its uppercase
public class MockCharsetProvider extends CharsetProvider {
Contributor


Nit: Maybe a more self-explanatory name, e.g., UpperCasingCharsetProvider?
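The mock charset discussed above, a decoder that uppercases every input byte exposed through a `CharsetProvider`, can be sketched as follows. The charset name `X-UPPER-CASING`, the nested class, and the Latin-1 interpretation of bytes are assumptions. This is not the test's actual source. Because `Character.toUpperCase('\n')` returns `'\n'`, the decode loop needs no newline special case. The charset is decode-only, so `canEncode()` returns `false`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Iterator;
import java.util.List;

// Hypothetical test provider exposing one decode-only, upper-casing charset.
public class UpperCasingCharsetProvider extends CharsetProvider {

    static final class UpperCasingCharset extends Charset {
        UpperCasingCharset() {
            super("X-UPPER-CASING", null); // no aliases
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof UpperCasingCharset;
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW; // caller drains and retries
                        }
                        // Mask so bytes >= 0x80 map to U+0080..U+00FF (Latin-1).
                        char c = (char) (in.get() & 0xFF);
                        out.put(Character.toUpperCase(c)); // '\n' stays '\n'
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }

        @Override
        public boolean canEncode() {
            return false;
        }

        @Override
        public CharsetEncoder newEncoder() {
            throw new UnsupportedOperationException("decode-only test charset");
        }
    }

    private static final Charset CHARSET = new UpperCasingCharset();

    @Override
    public Iterator<Charset> charsets() {
        return List.of(CHARSET).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        return CHARSET.name().equalsIgnoreCase(charsetName) ? CHARSET : null;
    }
}
```

For `Charset.forName` to find it, the provider would be registered as a service, either through `META-INF/services/java.nio.charset.spi.CharsetProvider` or through a `provides java.nio.charset.spi.CharsetProvider with ...` clause. The sketch can also be exercised by calling `charsetForName` directly.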

* @build csp/*
* @run junit StdinEncodingTest
*/
public class StdinEncodingTest {
Contributor


AFAICT, there is no similar test (e.g., one using a mock CharsetProvider) for stdout.encoding. Will it be addressed by another ticket? Shall we consider adding a similar StdoutEncodingTest too? (Not necessarily in this PR.)

Member Author


stdout.encoding validity is tested through the public charset() method, in CharsetTest.java.

Contributor


Indeed, there are stdout.encoding tests in test/jdk/java/io/Console, yet none[1] that thoroughly tests it with expect and a dedicated (mock) CharsetProvider as you did here. FWIW, I really liked your new test combining a mock CharsetProvider with expect, hence my question about doing the same for stdout and stderr too.

For the record, AFAICT, there are no tests for stderr.encoding.

[1] There is script.exp, but it tests sun.stdout.encoding, not stdout.encoding.

Member Author


Actually, providing a mock charset was a workaround for not having a public method to get the input encoding. I think introducing a new public method would be overkill because it would not be used much; in most cases the existing one suffices (Console is used in interactive user environments, and I don't believe users would want to see different characters displayed for their input).

Member Author


There is script.exp, but it tests sun.stdout.encoding, not stdout.encoding.

It is addressed in this PR

Member


I think the JBS bug ID needs to be added to CharsetTest.java as well then.

Member Author


Bug ID added to the test. Additionally, replaced the test.src/test.jdk/test.classes property lookups with the corresponding jdk.test.lib.Utils constants.

* stdin.encoding} differs from {@link System##stdout.encoding
* stdout.encoding}, in which case read operations use the {@code Charset}
* designated by {@code stdin.encoding}.
* <p>
Member


Console.charset() states "The returned charset is used for interpreting the input and output source (e.g., keyboard and/or display) specified by the host environment or user, which defaults to the one based on stdout.encoding." If stdin.encoding is set otherwise, this is no longer true, so I think this method may need a wording update as well.

Member Author


Good point. Brought the same wording to the charset() method description for further clarification.

Member


I'm confused by the actual behavior here. What might be helpful is to divide the discussion between a) what charsets get used for input and output, and b) the return value of the charset() method.

I'm not entirely sure, but since stdin.encoding and stdout.encoding are always set to something -- whether it comes from the platform or the command line -- won't Console just use stdin.encoding for input and stdout.encoding for output? If this is true, maybe just say this instead of deferring to the charset() method.

Contributor


I think I agree with Stuart and it would be better to say that stdin.encoding is used for reading, and stdout.encoding for writing. They are usually the same but if they differ then Console will return the charset for output.

Member Author


Thanks. I reworded the descriptions of the class and charset() method.

/**
* @test
* @bug 8356985
* @summary Tests if "stdin.encoding" is reflected for reading
Member


It would be helpful to note why the test is limited to Linux and macOS, e.g.:

"expect" command in Windows/Cygwin does not work as expected. Ignoring tests on Windows.

}

// invoking "expect" command
var testSrc = System.getProperty("test.src", ".");
Member

@justin-curtis-lu justin-curtis-lu May 21, 2025


The test already imports the JDK test library; can we just replace this and the occurrences below with jdk.test.lib.Utils.TEST_SRC/JDK/CLASSES directly?

public void testStdinEncoding() throws Throwable {
// check "expect" command availability
var expect = Paths.get("/usr/bin/expect");
if (!Files.exists(expect) || !Files.isExecutable(expect)) {
Member


Could use Assumptions.assumeTrue here. Condition becomes more readable as: Files.exists(expect) && Files.isExecutable(expect)
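The positive form of the check reads naturally as a predicate. The sketch below uses a hypothetical `ExpectCheck` helper. The JUnit 5 `Assumptions.assumeTrue` call appears only in a comment, so that the snippet compiles without JUnit on the class path.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper expressing the readable, positive availability check.
final class ExpectCheck {
    private ExpectCheck() {}

    static boolean isUsableExecutable(Path p) {
        return Files.exists(p) && Files.isExecutable(p);
    }

    // In a JUnit 5 test this would typically be used as:
    //   Assumptions.assumeTrue(isUsableExecutable(Path.of("/usr/bin/expect")),
    //                          "expect command is not available");
    // which aborts (skips) the test instead of returning early from it.
}
```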

Member

@justin-curtis-lu justin-curtis-lu left a comment


New wording is straightforward and looks good to me.

@stuart-marks
Member

I may be overthinking this, but let me just toss this out there to get people's opinions.

This changes the API of jdk.internal.io.JdkConsoleProvider. OK, this is an internal interface, so strictly speaking we can change it at will. But... do the IDEs use this interface to implement their in-IDE terminal emulators? I'm not sure; maybe we should ask @lahodaj.

Well, the IDEs can probably easily adapt based on the JDK version. Or, if we want to be nice, we can leave the two-arg console method in place and have a default implementation for the three-arg console method that just calls the two-arg method with one of the charsets. Might not be worth it though.
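The "be nice" option above is the usual default-method pattern for evolving an interface. In the sketch below, the interface name, the `String` return type, and the choice of the output charset in the default method are all assumptions for illustration. This is not the actual `jdk.internal.io.JdkConsoleProvider` declaration.

```java
import java.nio.charset.Charset;

// Hypothetical interface; only loosely echoes jdk.internal.io.JdkConsoleProvider.
interface ConsoleProviderSketch {
    // Existing two-arg factory that older implementations already provide.
    String console(boolean isTTY, Charset charset);

    // New three-arg factory. The default keeps old implementations working by
    // delegating with a single charset (the output one, by assumption here).
    default String console(boolean isTTY, Charset inCharset, Charset outCharset) {
        return console(isTTY, outCharset);
    }
}
```

An implementation written against the two-arg method, such as a lambda, then answers three-arg calls unchanged. The trade-off is that such providers silently ignore the input charset.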

@naotoj
Member Author

naotoj commented May 23, 2025

But... do the IDEs use this to interface to implement their in-IDE terminal emulator?

Can they even possibly do so? java.base's module-info exports jdk.internal.io only to jdk.internal.le and jdk.jshell, so I think no other modules can implement it and be loaded at runtime.

@stuart-marks
Member

Can they even possibly do so?

Sure, as far as I know, IntelliJ IDEA runs on its own version of the JDK, and it can easily be invoked or modified to allow use of those modules. I took a quick look at NetBeans and didn't see any references to jdk.internal.io though, so maybe I'm off base in thinking that they might be using it. I did have the impression that at least one of the IDEs had its own console provider implementation but this is anecdotal and I don't have any evidence of this.

No need to worry about this now, then. Sorry for the distraction.

@lahodaj
Contributor

lahodaj commented May 23, 2025

Can they even possibly do so?

Sure, as far as I know, IntelliJ IDEA runs on its own version of the JDK, and it can easily be invoked or modified to allow use of those modules. I took a quick look at NetBeans and didn't see any references to jdk.internal.io though, so maybe I'm off base in thinking that they might be using it. I did have the impression that at least one of the IDEs had its own console provider implementation but this is anecdotal and I don't have any evidence of this.

FWIW, I am not aware of NetBeans using JdkConsoleProvider.

@naotoj
Member Author

naotoj commented May 23, 2025

IntelliJ seems to be going with -Djdk.console=jdk.internal.le option, instead of hacking into the system: https://mastodon.online/@tagir_valeev/109981464706470933

Contributor

@AlanBateman AlanBateman left a comment


Spec update and src changes look good.

I only skimmed through the test changes (not a detailed review) and they look reasonable. I assume you've done some "test repeat" jobs to ensure that any variance in the version of "expect" on test machines doesn't cause any issues.

@naotoj
Member Author

naotoj commented May 27, 2025

I assume you've done some "test repeat" jobs to ensure that any variance in the version of "expect" on test machines doesn't cause any issues.

Actually, I have not. Thanks for the reminder. So I ran this through our internal CI on Linux x64, macOS x64, and macOS AArch64. Across 10 tasks (i.e., machines) per platform, each with 30 repetitions, that's a total of 900 test runs (3 × 10 × 30), all completed without failure.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 27, 2025
@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label May 27, 2025
@naotoj
Member Author

naotoj commented May 28, 2025

Thanks for the reviews!
/integrate

@openjdk

openjdk bot commented May 28, 2025

Going to push as commit b2a61a9.
Since your change was applied there have been 195 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 28, 2025
@openjdk openjdk bot closed this May 28, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 28, 2025
@openjdk

openjdk bot commented May 28, 2025

@naotoj Pushed as commit b2a61a9.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
