-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make some JavaThemis exceptions unchecked #563
Conversation
Checked exceptions are commonly considered a design mistake in Java standard library [1]. They have some limited uses, but generally they do not provide safe and ergonomic interface for the users because they are often misunderstood. Is is common to use checked exceptions in situations where unchecked exceptions are more approriate. This leads to exceptions that cannot be handled by the application and it is really common to ignore ("fix") them by adding "throws Exception" specifier because you can't be bothered to enumerate a dozen of concrete classes of internal library errors. Either way it is not better than using unchecked exceptions in the first place. Unchecked exceptions are intended for system faults and programming errors which are generally not recoverable by the application [2]. For example, using a method incorrectly, not checking for "null" when it should be checked for, providing invalid argument values, etc. These are generally not under contol of the application, in the sense that the user cannot do something to avoid this failure other than fixing application logic (or, well, circumenting the faulty code path). Checked exceptions are intended to be used for recoverable errors [2]. That is, something that the application can control and influence. For example, invalid user input can cause a checked exception because this is an expected situation (the user can make mistakes and this is not something that application can generally prevent in code). However, the application can control the user and instruct them to give correct input, or at least tell the user that theinput is incorrect. That is, the application is expected to always handle checked exception in some way, and there should be a catch clause somewhere in the middle, not only on the top level for possible run-time exceptions. [1]: http://literatejava.com/exceptions/checked-exceptions-javas-biggest-mistake/ [2]: https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html With this in mind, convert the following com.cossacklabs.themis exceptions into runtime, unchecked variants: - NullArgumentException - it is used to indicate that the programmer passed a null value where it should not be passed, and this will cause a NullPointerException later. In fact, NullArgumentException is now a subclass of NullPointerException as it indicates misuse of null values. - InvalidArgumentException - it is used to indicate that the programmer has passed an invalid argument to the method: an unknown constant, an empty buffer that must not be empty, or some other violation of the method contract. It is a subclass of IllegalArgumentException now. - KeyGenerationException - it is used to indicate internal failure in Themis key generation routines. This is not something under user control and key generation does not have any input that can change its behavior. This error can be caused either by a bug in Themis Core or by a misconfigured cryptographic backed. Both situations cannot be recovered by the application. Note, however, that SecureCellException, SecureMessageWrapException, SecureSessionException, and SecureCompareException are still checked. These exceptions are used to indicate issues with user-provided data: key material, data messages, etc. These are expected failures and should be handled by the application somehow.
After these adjustments, update the "throws" clauses in Themis code and remove all unchecked exceptions from them. This allows user applications to not handle these exceptions immediately. (Though they can still leave "throws" specifications in their code if they wish to. Our tests do, for example, because I can't be bothered to update them without doing a complete rewrite for readability.)
Drop more "throws" specs from Secure Session and Secure Comparator interfaces by wrapping some of the exceptions into appropriate runtime exceptions. Secure Session and Secure Comparator throw checked exceptions in many places where user input cannot cause failures. For example, the user cannot recover from a failure to create Secure Session object. These errors should cause unchecked, runtime exceptions. However, in some places checked exceptions are warranted: such as when wrapping and unwrapping Secure Session messages, or when Secure Comparator protocol exchange is processed. Here we leave checked exceptions and indicate them with "throws" specs.
Convert UnsupportedEncodingException into RuntimeException in interfaces that expect Strings and unconditionally serialize them as UTF-16 bytes. If the implementation does not support UTF-16 -- which is highly unlikely, given that Java pioneered Unicode and UTF-16 specifically -- then the user application cannot do anything about it (other that rewrite JavaThemis code). This is not an error with user input. Themis does not give the user an option to select encoding and always uses UTF-16, so the user application cannot recover from this error.
public SecureCellData protect(String password, String context, byte[] data) throws UnsupportedEncodingException, NullArgumentException, SecureCellException { | ||
return this.protect(password.getBytes(CHARSET), context.getBytes(CHARSET), data); | ||
public SecureCellData protect(String password, String context, byte[] data) throws SecureCellException { | ||
return this.protect(Utils.getUTF16Bytes(password), Utils.getUTF16Bytes(context), data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you decide use a hardcoded method for decoding string instead using constantly configured charset? as I know, java allow to change constants at runtime and if someone will want to change it, it allow to change it from his own code without re-writing themis. But now he will need to inherit a class and re-implement method for it. Better way to refactor protect
method to be as template method where will be called some other method decodeString
with default implementation which use Utils.getUTF16Bytes
, but people will need override only this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, Utils it's our new module) so in java way will be better to add dependency injection and use some instance of converter which will be used and which should be provided in new one constructor and used default in our current constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All places use a hardcoded constant "UTF-16"
, I’ve only replaced that with a common utility to avoid code duplication and preserve the current behavior and interface.
CHARSET
constants are declared package-private (= not accessible outside of our package) final
(= not assignable) String
(= immutable in Java) values so it’s not really feasible to change them for library users. It’s probably possible with some vile abuse of reflection and unsafe API, but that’s a footgun.
So yeah, there are a number of possible approaches if our users would like to use a different encoding:
- inherit and override all interesting methods individually
- provide a
protected
method for encoding/decoding that can be overriden - provide a way to inject encoder/decoder into our objects
- provide a way to specify encoding to be passed to
String.getBytes()
- use composition: wrap Secure Cell into the interface you like
- encode/decode outside of Secure Cell and use
byte[]
API
None of which is probably for us to decide.
I’d personally would like to get rid of String interfaces altogether, unless we agree on the encoding to use across Themis wrappers. Or move it into SecureCellThatWorksOnlyWithJavaStrings subclass to keep it cleanly separated from the basic Secure Cell compatible with other platforms. However, that’s a breaking change and it will trigger @vixentael.
My opinion is that Themis works with bytes. If you’d like to use strings that’s great, but please don’t force us to make serialization decisions for you. You’re free to wrap Secure Cell into whatever interface you like and use whatever serialization you like — but it’s there, in your package, not here, not in our package.
com.cossacklabs.themis.SecureCell
makes a serialization decision for the user: Strings
are serialized as UTF-16*. I believe that it’s wrong, given that other platforms use different encodings for their native strings. But that’s how it is right now: out-of-the-box Java strings don’t work with anything other than Java.
_______________
* To make it more fun: it’s unspecified whether it’s UTF-16LE or UTF-16BE and whether it includes a BOM to disambiguate between them. So technically speaking, our current code is not only incompatible with other platforms, it is also incompatible between different Java implementations (and maybe even the same implementation running on different hardware).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the constructive side, while I was typing all of the above I’ve reviewed String
API and it seems that we don’t need the Utils class. Instead we can replace
password.getBytes("UTF-16")
with
password.getBytes(StandardCharsets.UTF_16)
which does not throw checked UnsupportedEncodingException because the encoding is known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d personally would like to get rid of String interfaces altogether, unless we agree on the encoding to use across Themis wrappers. Or move it into SecureCellThatWorksOnlyWithJavaStrings subclass to keep it cleanly separated from the basic Secure Cell compatible with other platforms. However, that’s a breaking change and it will trigger @vixentael.
While i totally support the unification, Java strings are indeed unique in their UTF16
usage.
current code is not only incompatible with other platforms, it is also incompatible between different Java implementations
luckily, Themis works with bytes and suggest strings API only as data input, not data output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding String API deprecation, we may actually deprecate the current API In favor of the proper password-based API when we have it, and this may give us a chance to move the string-based API elsewhere and maybe make it more compatible with other platforms.
Passwords are going to require some decision from us because for passwords it does not make sense to force the users to use byte[]
for them. We’re likely to enable string usage for passwords for other platforms as well, and here we’ll have issues if some platforms use UTF-16 while others default to UTF-8, or something else (like whatever legacy encoding they are running with; I’m looking at you, Ruby).
Themis works with bytes and suggest strings API only as data input, not data output
Well, the data is still recoverable, no problem. By “incompatibility” I mean when it does not work successfully out of the box. Of course it will work if the developers use byte-oriented API and do not mess up string encodings.
However, it may not work in the default use case, when one party uses a "secret"
in one language, the other uses "secret"
in their language, but Secure Cell fails to decrypt because Themis wrappers receive different bytes for the master key because the encoding is not the same.
Thankfully, modern languages (finally!) make it painfully obvious that bytes and strings are not the same. You all know the story with Python 3. Swift, for example, also forces the user to make an explicit decision how exactly they turn their String
into Data
required by Themis.
Use a different overload of String.getBytes() that does not involve charset lookup (and possible "unknown encoding" error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
this(mode); | ||
this.key = password.getBytes(CHARSET); | ||
} | ||
|
||
int mode = MODE_SEAL; | ||
byte[] key; | ||
|
||
static final String CHARSET = "UTF-16"; | ||
static final Charset CHARSET = StandardCharsets.UTF_16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Translation for mortals: - `git log --reverse --oneline` Print commit log from oldest to latest, one commit per line, in the following format 5f886e1 Make some JavaThemis exceptions unchecked (cossacklabs#563) - `$(git describe --tags $(git rev-list --tags --max-count=1))..` ...starting from the latest tag in the repository (`0.12.0` now). - `| perl -n -e` Then for each line execute the following Perl program: - `/^[a-f0-9]* (.*) \(\#([0-9]*)\)$/` ...which parses `git log` output with a regular expression... - `&& print qq(https://github.com/cossacklabs/themis/pull/$2 $1\n)` ...and prints each line like this: cossacklabs#563 Make some JavaThemis exceptions unchecked - `| while read url description; do` Then for each of those lines, split a line into the PR URL in `url` and the rest of the line in `description`... - `grep -q "$url" CHANGELOG.md ||` ...and for every PR URL that's not in CHANGELOG.md... - `echo "$url $description"` ...write URL and description to **stdout**. - `done` Vier Dreissig sends her regards.
Translation for mortals: - `git log --reverse --oneline` Print commit log from oldest to latest, one commit per line, in the following format 5f886e1 Make some JavaThemis exceptions unchecked (cossacklabs#563) - `$(git describe --tags $(git rev-list --tags --max-count=1))..` ...starting from the latest tag in the repository (`0.12.0` now). - `| perl -n -e` Then for each line execute the following Perl program: - `/^[a-f0-9]* (.*) \(\#([0-9]*)\)$/` ...which parses `git log` output with a regular expression... - `&& print qq(https://github.com/cossacklabs/themis/pull/$2 $1\n)` ...and prints each line like this: cossacklabs#563 Make some JavaThemis exceptions unchecked - `| while read url description; do` Then for each of those lines, split a line into the PR URL in `url` and the rest of the line in `description`... - `grep -q "$url" CHANGELOG.md ||` ...and for every PR URL that's not in CHANGELOG.md... - `echo "$url $description"` ...write URL and description to **stdout**. - `done` Vier Dreissig sends her regards.
This is a preparatory change for symmetric key generation API. It's not strictly required but I'd like to set this right while we can. It was motivated by addition of new classes and interfaces which deal with keys. In Java keys are stored as
byte[]
arrays that must not benull
and must not be empty. If they are thenNullArgumentException
andInvalidArgumentException
may be thrown (these are our custom exceptions fromcom.cossacklabs.themis
package). The issue here is that they are checked exceptions and this is inconvenient: I had to duplicatethrows NullArgumentException, InvalidArgumentException
in about a dozen of places for no reason and our users will have to do the same with every method where they use keys (even if they use them correctly).So I opened this can of worms...
What changes in existing Java apps if this is merged?
Nothing. If they worked before you may leave them unchanged and they will be fine.
However, if you write new code or refactoring something then the following is applicable:
NullArgumentException
,InvalidArgumentException
,KeyGenerationException
are now unchecked exceptions. You do not need to specify them inthrows
clauses anymore. Normally you don't need to catch these errors specifically if you use API correctly.java.io.UnsupportedEncodingException
is now not thrown by String-based API, you can stop checking for that.SecureCompare
methods now throwRuntimeException
on internal failures instead of checkedSecureCompareException
. They should not fail under normal circumstances.All methods will also throw unchecked
NullArgumentException
andInvalidArgumentException
instead ofSecureCompareException
if you provide invalid arguments.proceed()
method may fail with checkedSecureCompareException
, as before.SecureSession
methods also useRuntimeException
for internal failures instead of checkedSecureSessionException
in most places. Ditto for invalid argument checks.wrap()
,unwrap()
, andload()
may still throw checkedSecureSessionException
.SecureTransportSession
also does not throwSecureSessionException
anymore.Since we don't test with Kotlin I have no idea about the API impact, but I believe it's fine 🤞
Kotlin does not use checked exceptions, they are all unchecked there (in Java sense).
Detailed rant
Excerpt from commit messages...
Checked exceptions are commonly considered a design mistake in Java standard library [1, 2, 3, 4]. They have some limited uses, but generally they do not provide safe and ergonomic interface for the users because they are often misunderstood. It is common to use checked exceptions in situations where unchecked exceptions are more appropriate. This leads to exceptions that cannot be handled by the application and it is really common to ignore ("fix") them by adding
throws Exception
specifier because you can't be bothered to enumerate a dozen of concrete classes of internal library errors. Either way it is not better than using unchecked exceptions in the first place.Unchecked exceptions are intended for system faults and programming errors which are generally not recoverable by the application [5]. For example, using a method incorrectly, not checking for
null
when it should be checked for, providing invalid argument values, etc. These are generally not under control of the application, in the sense that the user cannot do something to avoid this failure other than fixing application logic (or, well, circumventing the faulty code path).Checked exceptions are intended to be used for recoverable errors [5]. That is, something that the application can control and influence. For example, invalid user input can cause a checked exception because this is an expected situation (the user can make mistakes and this is not something that application can generally prevent in code). However, the application can control the user and instruct them to give correct input, or at least tell the user that the input is incorrect. That is, the application is expected to always handle checked exception in some way, and there should be a catch clause somewhere in the middle, not only on the top level for possible run-time exceptions.
With this in mind, convert the following
com.cossacklabs.themis
exceptions into runtime, unchecked variants:NullArgumentException is used to indicate that the programmer passed a null value where it should not be passed, and this will cause a NullPointerException later. In fact, NullArgumentException is now a subclass of NullPointerException as it indicates misuse of null values.
InvalidArgumentException is used to indicate that the programmer has passed an invalid argument to the method: an unknown constant, an empty buffer that must not be empty, or some other violation of the method contract. It is a subclass of IllegalArgumentException now.
KeyGenerationException it is used to indicate internal failure in Themis key generation routines. This is not something under user control and key generation does not have any input that can change its behavior. This error can be caused either by a bug in Themis Core or by a misconfigured cryptographic backed. Both situations cannot be recovered by the application.
Note, however, that
SecureCellException
,SecureMessageWrapException
,SecureSessionException
, andSecureCompareException
are still checked. These exceptions are used to indicate issues with user-provided data: key material, data messages, etc. These are expected failures and should be handled by the application somehow.Checklist
Benchmark results are attached(not applicable)Public API has proper documentation(not applicable)Example projects and code samples are updated(not applicable)