Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make some JavaThemis exceptions unchecked #563

Merged
merged 5 commits into from
Dec 6, 2019

Conversation

ilammy
Copy link
Collaborator

@ilammy ilammy commented Dec 4, 2019

This is a preparatory change for symmetric key generation API. It's not strictly required but I'd like to set this right while we can. It was motivated by addition of new classes and interfaces which deal with keys. In Java keys are stored as byte[] arrays that must not be null and must not be empty. If they are then NullArgumentException and InvalidArgumentException may be thrown (these are our custom exceptions from com.cossacklabs.themis package). The issue here is that they are checked exceptions and this is inconvenient: I had to duplicate throws NullArgumentException, InvalidArgumentException in about a dozen of places for no reason and our users will have to do the same with every method where they use keys (even if they use them correctly).

So I opened this can of worms...

What changes in existing Java apps if this is merged?

Nothing. If they worked before you may leave them unchanged and they will be fine.

However, if you write new code or refactoring something then the following is applicable:

  • NullArgumentException, InvalidArgumentException, KeyGenerationException are now unchecked exceptions. You do not need to specify them in throws clauses anymore. Normally you don't need to catch these errors specifically if you use API correctly.

  • java.io.UnsupportedEncodingException is now not thrown by String-based API, you can stop checking for that.

  • SecureCompare methods now throw RuntimeException on internal failures instead of checked SecureCompareException. They should not fail under normal circumstances.

    All methods will also throw unchecked NullArgumentException and InvalidArgumentException instead of SecureCompareException if you provide invalid arguments.

    proceed() method may fail with checked SecureCompareException, as before.

  • SecureSession methods also use RuntimeException for internal failures instead of checked SecureSessionException in most places. Ditto for invalid argument checks.

    wrap(), unwrap(), and load() may still throw checked SecureSessionException.

  • SecureTransportSession also does not throw SecureSessionException anymore.

Since we don't test with Kotlin I have no idea about the API impact, but I believe it's fine 🤞
Kotlin does not use checked exceptions, they are all unchecked there (in Java sense).

Detailed rant

Excerpt from commit messages...

Checked exceptions are commonly considered a design mistake in Java standard library [1, 2, 3, 4]. They have some limited uses, but generally they do not provide safe and ergonomic interface for the users because they are often misunderstood. It is common to use checked exceptions in situations where unchecked exceptions are more appropriate. This leads to exceptions that cannot be handled by the application and it is really common to ignore ("fix") them by adding throws Exception specifier because you can't be bothered to enumerate a dozen of concrete classes of internal library errors. Either way it is not better than using unchecked exceptions in the first place.

Unchecked exceptions are intended for system faults and programming errors which are generally not recoverable by the application [5]. For example, using a method incorrectly, not checking for null when it should be checked for, providing invalid argument values, etc. These are generally not under control of the application, in the sense that the user cannot do something to avoid this failure other than fixing application logic (or, well, circumventing the faulty code path).

Checked exceptions are intended to be used for recoverable errors [5]. That is, something that the application can control and influence. For example, invalid user input can cause a checked exception because this is an expected situation (the user can make mistakes and this is not something that application can generally prevent in code). However, the application can control the user and instruct them to give correct input, or at least tell the user that the input is incorrect. That is, the application is expected to always handle checked exception in some way, and there should be a catch clause somewhere in the middle, not only on the top level for possible run-time exceptions.

With this in mind, convert the following com.cossacklabs.themis exceptions into runtime, unchecked variants:

  • NullArgumentException is used to indicate that the programmer passed a null value where it should not be passed, and this will cause a NullPointerException later. In fact, NullArgumentException is now a subclass of NullPointerException as it indicates misuse of null values.

  • InvalidArgumentException is used to indicate that the programmer has passed an invalid argument to the method: an unknown constant, an empty buffer that must not be empty, or some other violation of the method contract. It is a subclass of IllegalArgumentException now.

  • KeyGenerationException it is used to indicate internal failure in Themis key generation routines. This is not something under user control and key generation does not have any input that can change its behavior. This error can be caused either by a bug in Themis Core or by a misconfigured cryptographic backed. Both situations cannot be recovered by the application.

Note, however, that SecureCellException, SecureMessageWrapException, SecureSessionException, and SecureCompareException are still checked. These exceptions are used to indicate issues with user-provided data: key material, data messages, etc. These are expected failures and should be handled by the application somehow.

Checklist

  • Change is covered by automated tests
  • Benchmark results are attached (not applicable)
  • The coding guidelines are followed
  • Public API has proper documentation (not applicable)
  • Example projects and code samples are updated (not applicable)
  • Changelog is updated

Checked exceptions are commonly considered a design mistake in Java
standard library [1]. They have some limited uses, but generally they
do not provide safe and ergonomic interface for the users because they
are often misunderstood. Is is common to use checked exceptions in
situations where unchecked exceptions are more approriate. This leads
to exceptions that cannot be handled by the application and it is really
common to ignore ("fix") them by adding "throws Exception" specifier
because you can't be bothered to enumerate a dozen of concrete classes
of internal library errors. Either way it is not better than using
unchecked exceptions in the first place.

Unchecked exceptions are intended for system faults and programming
errors which are generally not recoverable by the application [2].
For example, using a method incorrectly, not checking for "null" when
it should be checked for, providing invalid argument values, etc.
These are generally not under contol of the application, in the sense
that the user cannot do something to avoid this failure other than
fixing application logic (or, well, circumenting the faulty code path).

Checked exceptions are intended to be used for recoverable errors [2].
That is, something that the application can control and influence.
For example, invalid user input can cause a checked exception because
this is an expected situation (the user can make mistakes and this is
not something that application can generally prevent in code). However,
the application can control the user and instruct them to give correct
input, or at least tell the user that theinput is incorrect. That is,
the application is expected to always handle checked exception in some
way, and there should be a catch clause somewhere in the middle, not
only on the top level for possible run-time exceptions.

[1]: http://literatejava.com/exceptions/checked-exceptions-javas-biggest-mistake/
[2]: https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html

With this in mind, convert the following com.cossacklabs.themis
exceptions into runtime, unchecked variants:

- NullArgumentException - it is used to indicate that the programmer
  passed a null value where it should not be passed, and this will
  cause a NullPointerException later. In fact, NullArgumentException
  is now a subclass of NullPointerException as it indicates misuse
  of null values.

- InvalidArgumentException - it is used to indicate that the programmer
  has passed an invalid argument to the method: an unknown constant,
  an empty buffer that must not be empty, or some other violation of
  the method contract. It is a subclass of IllegalArgumentException now.

- KeyGenerationException - it is used to indicate internal failure in
  Themis key generation routines. This is not something under user
  control and key generation does not have any input that can change
  its behavior. This error can be caused either by a bug in Themis Core
  or by a misconfigured cryptographic backed. Both situations cannot be
  recovered by the application.

Note, however, that SecureCellException, SecureMessageWrapException,
SecureSessionException, and SecureCompareException are still checked.
These exceptions are used to indicate issues with user-provided data:
key material, data messages, etc. These are expected failures and should
be handled by the application somehow.
After these adjustments, update the "throws" clauses in Themis code and
remove all unchecked exceptions from them. This allows user applications
to not handle these exceptions immediately. (Though they can still leave
"throws" specifications in their code if they wish to. Our tests do,
for example, because I can't be bothered to update them without doing
a complete rewrite for readability.)
Drop more "throws" specs from Secure Session and Secure Comparator
interfaces by wrapping some of the exceptions into appropriate runtime
exceptions.

Secure Session and Secure Comparator throw checked exceptions in many
places where user input cannot cause failures. For example, the user
cannot recover from a failure to create Secure Session object. These
errors should cause unchecked, runtime exceptions.

However, in some places checked exceptions are warranted: such as when
wrapping and unwrapping Secure Session messages, or when Secure
Comparator protocol exchange is processed. Here we leave checked
exceptions and indicate them with "throws" specs.
Convert UnsupportedEncodingException into RuntimeException in interfaces
that expect Strings and unconditionally serialize them as UTF-16 bytes.
If the implementation does not support UTF-16 -- which is highly
unlikely, given that Java pioneered Unicode and UTF-16 specifically --
then the user application cannot do anything about it (other that
rewrite JavaThemis code). This is not an error with user input. Themis
does not give the user an option to select encoding and always uses
UTF-16, so the user application cannot recover from this error.
@ilammy ilammy added the W-JavaThemis ☕ Wrapper: Java, Java and Kotlin API label Dec 4, 2019
public SecureCellData protect(String password, String context, byte[] data) throws UnsupportedEncodingException, NullArgumentException, SecureCellException {
return this.protect(password.getBytes(CHARSET), context.getBytes(CHARSET), data);
public SecureCellData protect(String password, String context, byte[] data) throws SecureCellException {
return this.protect(Utils.getUTF16Bytes(password), Utils.getUTF16Bytes(context), data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you decide use a hardcoded method for decoding string instead using constantly configured charset? as I know, java allow to change constants at runtime and if someone will want to change it, it allow to change it from his own code without re-writing themis. But now he will need to inherit a class and re-implement method for it. Better way to refactor protect method to be as template method where will be called some other method decodeString with default implementation which use Utils.getUTF16Bytes, but people will need override only this one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, Utils it's our new module) so in java way will be better to add dependency injection and use some instance of converter which will be used and which should be provided in new one constructor and used default in our current constructor

Copy link
Collaborator Author

@ilammy ilammy Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All places use a hardcoded constant "UTF-16", I’ve only replaced that with a common utility to avoid code duplication and preserve the current behavior and interface.

CHARSET constants are declared package-private (= not accessible outside of our package) final (= not assignable) String (= immutable in Java) values so it’s not really feasible to change them for library users. It’s probably possible with some vile abuse of reflection and unsafe API, but that’s a footgun.

So yeah, there are a number of possible approaches if our users would like to use a different encoding:

  • inherit and override all interesting methods individually
  • provide a protected method for encoding/decoding that can be overriden
  • provide a way to inject encoder/decoder into our objects
  • provide a way to specify encoding to be passed to String.getBytes()
  • use composition: wrap Secure Cell into the interface you like
  • encode/decode outside of Secure Cell and use byte[] API

None of which is probably for us to decide.

I’d personally would like to get rid of String interfaces altogether, unless we agree on the encoding to use across Themis wrappers. Or move it into Secure​Cell​That​Works​Only​With​Java​Strings subclass to keep it cleanly separated from the basic Secure Cell compatible with other platforms. However, that’s a breaking change and it will trigger @vixentael.

My opinion is that Themis works with bytes. If you’d like to use strings that’s great, but please don’t force us to make serialization decisions for you. You’re free to wrap Secure Cell into whatever interface you like and use whatever serialization you like — but it’s there, in your package, not here, not in our package.

com.cossacklabs.themis.SecureCell makes a serialization decision for the user: Strings are serialized as UTF-16*. I believe that it’s wrong, given that other platforms use different encodings for their native strings. But that’s how it is right now: out-of-the-box Java strings don’t work with anything other than Java.
_______________
* To make it more fun: it’s unspecified whether it’s UTF-16LE or UTF-16BE and whether it includes a BOM to disambiguate between them. So technically speaking, our current code is not only incompatible with other platforms, it is also incompatible between different Java implementations (and maybe even the same implementation running on different hardware).

Copy link
Collaborator Author

@ilammy ilammy Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the constructive side, while I was typing all of the above I’ve reviewed String API and it seems that we don’t need the Utils class. Instead we can replace

password.getBytes("UTF-16")

with

password.getBytes(StandardCharsets.UTF_16)

which does not throw checked UnsupportedEncodingException because the encoding is known.

Copy link
Contributor

@vixentael vixentael Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d personally would like to get rid of String interfaces altogether, unless we agree on the encoding to use across Themis wrappers. Or move it into Secure​Cell​That​Works​Only​With​Java​Strings subclass to keep it cleanly separated from the basic Secure Cell compatible with other platforms. However, that’s a breaking change and it will trigger @vixentael.

While i totally support the unification, Java strings are indeed unique in their UTF16 usage.

current code is not only incompatible with other platforms, it is also incompatible between different Java implementations

luckily, Themis works with bytes and suggest strings API only as data input, not data output

Copy link
Collaborator Author

@ilammy ilammy Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding String API deprecation, we may actually deprecate the current API In favor of the proper password-based API when we have it, and this may give us a chance to move the string-based API elsewhere and maybe make it more compatible with other platforms.

Passwords are going to require some decision from us because for passwords it does not make sense to force the users to use byte[] for them. We’re likely to enable string usage for passwords for other platforms as well, and here we’ll have issues if some platforms use UTF-16 while others default to UTF-8, or something else (like whatever legacy encoding they are running with; I’m looking at you, Ruby).

Themis works with bytes and suggest strings API only as data input, not data output

Well, the data is still recoverable, no problem. By “incompatibility” I mean when it does not work successfully out of the box. Of course it will work if the developers use byte-oriented API and do not mess up string encodings.

However, it may not work in the default use case, when one party uses a "secret" in one language, the other uses "secret" in their language, but Secure Cell fails to decrypt because Themis wrappers receive different bytes for the master key because the encoding is not the same.

Thankfully, modern languages (finally!) make it painfully obvious that bytes and strings are not the same. You all know the story with Python 3. Swift, for example, also forces the user to make an explicit decision how exactly they turn their String into Data required by Themis.

Use a different overload of String.getBytes() that does not involve
charset lookup (and possible "unknown encoding" error).
Copy link
Collaborator

@Lagovas Lagovas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

this(mode);
this.key = password.getBytes(CHARSET);
}

int mode = MODE_SEAL;
byte[] key;

static final String CHARSET = "UTF-16";
static final Charset CHARSET = StandardCharsets.UTF_16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ilammy ilammy merged commit 5f886e1 into cossacklabs:master Dec 6, 2019
@ilammy ilammy deleted the java/unchecked-exceptions branch December 6, 2019 09:12
ilammy added a commit to ilammy/themis that referenced this pull request Dec 8, 2019
Translation for mortals:

- `git log --reverse --oneline`

  Print commit log from oldest to latest, one commit per line, in the
  following format

      5f886e1 Make some JavaThemis exceptions unchecked (cossacklabs#563)

- `$(git describe --tags $(git rev-list --tags --max-count=1))..`

  ...starting from the latest tag in the repository (`0.12.0` now).

- `| perl -n -e`

  Then for each line execute the following Perl program:

- `/^[a-f0-9]* (.*) \(\#([0-9]*)\)$/`

  ...which parses `git log` output with a regular expression...

- `&& print qq(https://github.com/cossacklabs/themis/pull/$2 $1\n)`

  ...and prints each line like this:

      cossacklabs#563 Make some JavaThemis exceptions unchecked

- `| while read url description; do`

  Then for each of those lines, split a line into the PR URL in `url`
  and the rest of the line in `description`...

- `grep -q "$url" CHANGELOG.md ||`

  ...and for every PR URL that's not in CHANGELOG.md...

- `echo "$url $description"`

  ...write URL and description to **stdout**.

- `done`

Vier Dreissig sends her regards.
ilammy added a commit to ilammy/themis that referenced this pull request Jun 22, 2020
Translation for mortals:

- `git log --reverse --oneline`

  Print commit log from oldest to latest, one commit per line, in the
  following format

      5f886e1 Make some JavaThemis exceptions unchecked (cossacklabs#563)

- `$(git describe --tags $(git rev-list --tags --max-count=1))..`

  ...starting from the latest tag in the repository (`0.12.0` now).

- `| perl -n -e`

  Then for each line execute the following Perl program:

- `/^[a-f0-9]* (.*) \(\#([0-9]*)\)$/`

  ...which parses `git log` output with a regular expression...

- `&& print qq(https://github.com/cossacklabs/themis/pull/$2 $1\n)`

  ...and prints each line like this:

      cossacklabs#563 Make some JavaThemis exceptions unchecked

- `| while read url description; do`

  Then for each of those lines, split a line into the PR URL in `url`
  and the rest of the line in `description`...

- `grep -q "$url" CHANGELOG.md ||`

  ...and for every PR URL that's not in CHANGELOG.md...

- `echo "$url $description"`

  ...write URL and description to **stdout**.

- `done`

Vier Dreissig sends her regards.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
W-JavaThemis ☕ Wrapper: Java, Java and Kotlin API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants