CBOR Feature Drop: COSE #2412
base: dev
Hi, thanks for your PR! Just a quick heads-up so you know I've seen it: I'm going on vacation soon, so I'll only be able to review it properly in September.
Hi, I appreciate the amount of work you've done, especially for the tests. I haven't looked into the implementation yet; I'm focusing on the surface API for now.
Regarding the failure on CI: it's connected to the fact that `Long.toUnsignedString` was added only in Android API 26. However, it is possible to use this declaration on Android with desugaring enabled, so there shouldn't be any problems. You have to create a special annotation `@SuppressAnimalSniffer` (it already exists for core and json, but not for cbor), see e.g. here:
kotlinx.serialization/formats/json/commonMain/src/kotlinx/serialization/json/JsonElement.kt
Line 92 in d192d24
@SuppressAnimalSniffer // Long.toUnsignedString(long)
kotlinx.serialization/build.gradle
Line 205 in a87b0f1
switch (name) {
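As a rough illustration of the suggestion above, the sketch below declares a marker annotation and applies it to a function that calls `Long.toUnsignedString`. The retention, the targets and the helper name `unsignedText` are assumptions made for this example, not copied from the library; the referenced `build.gradle` line suggests the real annotation also has to be registered with the Animal Sniffer configuration there.

```kotlin
// Sketch only: a marker annotation that an API-level check can be told to ignore.
// Retention and targets are assumptions for illustration.
@Retention(AnnotationRetention.BINARY)
@Target(AnnotationTarget.FUNCTION, AnnotationTarget.CLASS)
internal annotation class SuppressAnimalSniffer

// Hypothetical helper: renders the 64 bits of a Long as an unsigned decimal string.
@SuppressAnimalSniffer // Long.toUnsignedString(long) is only available from Android API 26
internal fun unsignedText(value: Long): String = java.lang.Long.toUnsignedString(value)
```

Calling `unsignedText(-1L)` formats the bit pattern with all 64 bits set as an unsigned number instead of printing `-1`.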
I didn't look at the tests thoroughly (to understand whether more test cases are needed), but I highlighted everything that looks problematic from my perspective. Don't forget to press 'show hidden conversations' on GitHub :)
 * Note that `equals()` and `hashCode()` only use `value`, not `serialized`.
 */
@Serializable(with = ByteStringWrapperSerializer::class)
public class ByteStringWrapper<T>(
NB: it seems you need to update `apiDump`, since this class is no longer a `data` class.
sorry, done!
@Serializable(with = ByteStringWrapperSerializer::class)
public class ByteStringWrapper<T>(
    public val value: T,
    public val serialized: ByteArray = byteArrayOf()
I've skimmed through the diff and can't find any usages of this value. Is there any reason for it to be a `val`? In what cases will the user be interested in accessing it? Also, it is very easy to create inconsistent data with this approach (e.g. `ByteStringWrapper(original.value, byteArrayOf(garbage))`). If it is not necessary to have it, this class can be a `value class`. Or no class would be needed at all, as this can be handled with yet another `@SerialInfo` annotation.
@nodh can you take care of this, please?
We'll use the CBOR serialization for ISO 18013-5 compliant mobile driving licences. There, verifiers need to check that (the digest of) the serialized byte values appears in another structure (signed by the issuer). So in that case it's quite useful to have the bytes as they were serialized (and parsed) in the deserialized `ByteStringWrapper` object too.
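The use case described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `SketchByteStringWrapper`, `sha256` and `isCoveredBySignedDigests` are hypothetical names, and SHA-256 is only an assumed digest algorithm. The point is that the digest is computed over the bytes exactly as they were received, since re-encoding a decoded value is not guaranteed to reproduce the same bytes.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for the wrapper discussed above: it keeps the decoded value
// together with the exact bytes it was parsed from.
class SketchByteStringWrapper<T>(val value: T, val serialized: ByteArray) {
    // Mirrors the documented behaviour: equality only looks at `value`.
    override fun equals(other: Any?): Boolean =
        other is SketchByteStringWrapper<*> && other.value == value

    override fun hashCode(): Int = value.hashCode()
}

// Assumed digest algorithm for the sketch.
fun sha256(bytes: ByteArray): ByteArray = MessageDigest.getInstance("SHA-256").digest(bytes)

// Returns true if the digest of the bytes as received appears among the signed digests.
fun <T> isCoveredBySignedDigests(
    item: SketchByteStringWrapper<T>,
    signedDigests: Collection<ByteArray>
): Boolean {
    val actual = sha256(item.serialized)
    return signedDigests.any { it.contentEquals(actual) }
}
```

A verifier would first validate the issuer's signature over the structure holding the digests and only then run a check like this for each received item.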
@SerialInfo
@Target(AnnotationTarget.CLASS)
@ExperimentalSerializationApi
public annotation class CborArray(@OptIn(ExperimentalUnsignedTypes::class) vararg val tag: ULong)
`vararg val tag` is still undocumented though.
sorry! done!
readProperties++
descriptor.getElementIndexOrThrow(elemName)
descriptor.getElementIndexOrThrow(elemName).also { index ->
    if (cbor.verifyKeyTags) {
This block of duplicated code can be extracted into a function (perhaps a local one).
done
} else {
    if (cbor.verifyKeyTags) {
        descriptor.getKeyTags(index)?.let { keyTags ->
            if (!(keyTags contentEquals tags)) throw CborDecodingException("CBOR tags $tags do not match declared tags $keyTags")
General observation (for all `throw` expressions): it would be much easier to debug errors if they also included locations. Here we don't have a JSON path analog, but appending `descriptor.toString()` will already make the job significantly easier.
done where we have access to the descriptor (which we do not always have)
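A minimal sketch of the idea, assuming a hypothetical exception type and helper (`SketchDecodingException`, `tagMismatch`) rather than the library's own: the location is optional because, as noted, a descriptor is not always at hand. Lists are used for the tags so that their contents are printed; arrays interpolated into a string template generally do not print their elements.

```kotlin
// Hypothetical exception type standing in for the library's decoding exception.
class SketchDecodingException(message: String) : Exception(message)

// Builds an error that names both the mismatching tags and, if known, where they occurred.
// `location` would be something like descriptor.toString() where a descriptor is available.
fun tagMismatch(found: List<Long>, declared: List<Long>, location: String?): SketchDecodingException {
    val where = if (location != null) " at $location" else ""
    return SketchDecodingException("CBOR tags $found do not match declared tags $declared$where")
}
```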
readByte()
}

return (if (collectedTags.isEmpty()) null else collectedTags.toULongArray()).also { collected ->
IMO `also` creates unnecessary nesting here.
It needs to be transformed into a `ULongArray` for comparison and for returning it. The alternative of adding a temp variable just for two accesses seems a bit cumbersome to me.
return (if (collectedTags.isEmpty()) null else collectedTags.toULongArray()).also { collected ->
    tags?.let {
        if (!it.contentEquals(collected)) throw CborDecodingException(
Note that `contentEquals` has both a nullable receiver and a nullable parameter. This results in behavior like `intArrayOf(1,2,3).contentEquals(null) -> false`. I'm not sure that is what you intended here.
We only want to compare if tags are actually set. Otherwise, we don't care.
I'll also add that comment to the code.
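The intended semantics can be sketched like this. It is an illustration under assumptions: `tagsAcceptable` is a hypothetical name, and `LongArray` is used instead of `ULongArray` so the example needs no opt-ins. The explicit null check on the expected tags makes the "only compare if tags are set" rule visible, while the nullable-parameter `contentEquals` still rejects input that carries no tags when some were expected.

```kotlin
// Compares collected tags against expected ones, but only when expectations exist.
fun tagsAcceptable(expected: LongArray?, collected: LongArray?): Boolean {
    if (expected == null) return true        // nothing declared: accept whatever was read
    return expected.contentEquals(collected) // false when collected is null or differs
}
```

Order matters in this comparison: `longArrayOf(1, 2)` and `longArrayOf(2, 1)` are treated as different tag sequences.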
@@ -633,9 +952,55 @@ private fun Iterable<ByteArray>.flatten(): ByteArray {

@OptIn(ExperimentalSerializationApi::class)
private fun SerialDescriptor.isByteString(index: Int): Boolean {
    return getElementAnnotations(index).find { it is ByteString } != null
    return kotlin.runCatching { getElementAnnotations(index).find { it is ByteString } != null }.getOrDefault(false)
Why is `runCatching` needed here?
It's only a theoretical possibility, but an `IndexOutOfBoundsException` could be thrown. I'll remove it.
Well, such a call fails for the WASM target, even though we use `runCatching`. I have no explanation.
It would be awesome, as currently it's somewhat surprising that the default encoded size of byte arrays is nearly twice as large.
WASM still fails, because even
Here are the details:
EDIT: seems it's already been reported and there's really not much we can do: https://youtrack.jetbrains.com/issue/KT-59081/WASM-Cant-catch-index-out-of-bounds-and-divide-by-0-exceptions
@sandwwraith I implemented your suggested simplified approach according to your snippet. This PR is ready for the next review round. TL;DR: It is working perfectly well and the code is much simpler. It also performs much faster! But the original Long story:
I'm also happy to report that the JS compiler bug can now only be triggered when using empty
Hi, I stumbled upon this great MR while trying to work with some CBOR structures and I have two questions:
I really like the new approach. I think you can agree that it is much cleaner and requires much less code.
Check the CI failure: you need to add `@SuppressAnimalSniffer` to `CborDecoder.processTags`.
@OptIn(ExperimentalSerializationApi::class)
internal fun SerialDescriptor.isByteString(index: Int): Boolean {
    return kotlin.runCatching { getElementAnnotations(index).find { it is ByteString } != null }.getOrDefault(false)
Instead of abusing `runCatching`, it's better to use a helper function like this one:
public inline fun <reified A: Annotation> SerialDescriptor.findAnnotation(elementIndex: Int): A? {
Also, in this particular case, you can just do `getElementAnnotations(index).any { it is ByteString }`.
I tried that already before preparing the updated PR. To my surprise, it led to a slight but noticeable performance hit according to JMH, so I kept the `runCatching`.
- `find` does not throw exceptions, so `runCatching` literally does nothing here.
- The current benchmark does not write tags (it only verifies them, which is incorrect btw; I wrote about that separately).
- Did you have other benchmark data with labels? I've re-measured the PR state vs my diff (attached below) and got no significant difference:
PR
Benchmark                Mode  Cnt     Score    Error   Units
CborBaseline.fromBytes  thrpt   20  1637.134 ± 17.891  ops/ms
CborBaseline.toBytes    thrpt   20  2242.036 ± 39.286  ops/ms
PR_WITH_PATCH
Benchmark                Mode  Cnt     Score    Error   Units
CborBaseline.fromBytes  thrpt   20  1641.116 ± 43.964  ops/ms
CborBaseline.toBytes    thrpt   20  2126.622 ± 96.746  ops/ms
I'll re-check (I don't know if I still have the un-squashed commits in another local branch to retrace my steps); maybe I mixed something up.
While `find` does not throw, I seem to remember the `runCatching` (maybe not in that particular line, but somewhere in what is now EncodingUtils.kt) triggered KT-59081, because `getElementAnnotations` can throw an OOB exception. Apparently, that was just sloppiness, if it is working fine now.
An OOBE from `getElementAnnotations` means that you've made a mistake with indices somewhere, so I would rather not swallow this exception, since it indicates a bug in the encoder/decoder.
done in 897ff72
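The outcome of this thread could look roughly like the sketch below. It works on a plain `List<Annotation>`, which is the type `getElementAnnotations(index)` returns, so it can be shown without the library; `SketchByteString`, `SketchArray` and both extension names are hypothetical, and creating annotation instances directly assumes a reasonably recent Kotlin version. Nothing is caught here, so an out-of-range index in the real lookup would still surface as an error.

```kotlin
// Hypothetical marker annotations standing in for @ByteString and @CborArray.
annotation class SketchByteString
annotation class SketchArray

// Reified lookup: returns the first annotation of type A, or null if there is none.
inline fun <reified A : Annotation> List<Annotation>.findAnnotation(): A? =
    firstOrNull { it is A } as A?

// Presence check that stops at the first match and builds no intermediate list.
inline fun <reified A : Annotation> List<Annotation>.hasAnnotation(): Boolean =
    any { it is A }
```

Compared with `filterIsInstance<A>().isNotEmpty()`, the `any` variant avoids allocating a filtered copy just to test for presence.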
@OptIn(ExperimentalSerializationApi::class)
internal fun SerialDescriptor.hasArrayTag(): Boolean {
    return annotations.filterIsInstance<CborArray>().isNotEmpty()
Better to use `annotations.any { it is CborArray }` to avoid allocating an intermediate list.
fixed in 897ff72
@OptIn(ExperimentalSerializationApi::class)
private fun SerialDescriptor.getElementNameForCborLabel(label: Long): String? {
    return elementNames.firstOrNull { getCborLabel(getElementIndex(it)) == label }
Using `firstOrNull` here causes a full iteration over the descriptor, and since this function is called for each key-value pair in the input, it may lead to quadratic time complexity when labels are predominantly used over property names in the input. To avoid this, you have to consider caching labels per descriptor. The `@JsonNames` annotation has similar problems; you can take a look at how the cache for it is implemented here:
Line 18 in 194a188
private fun SerialDescriptor.buildDeserializationNamesMap(json: Json): Map<String, Int> {
However, to keep this review size reasonable, let's do this after this PR. I assume the performance of CBOR label decoding is in your best interest :)
Or perhaps the Protobuf implementation would be a closer representation, since `@CborLabel` does not permit multiple values, in contrast to `@JsonNames`:
Line 41 in 251bca7
public fun populateCache(descriptor: SerialDescriptor) {
So you'll open an issue after this PR is through?
I can, yes
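The caching idea from this thread can be sketched as below. This is only an illustration of the technique, not the JSON or Protobuf cache referenced above: `SketchDescriptor` and `LabelCache` are hypothetical types, the cache is not thread-safe, and the labels 1 and 4 in the usage note are just sample values. Each descriptor gets one label-to-index map, built on first use, so later lookups no longer scan all elements.

```kotlin
// Hypothetical, simplified descriptor: element names plus an optional numeric label per element.
class SketchDescriptor(val elementNames: List<String>, val labels: List<Long?>)

// One label-to-index map per descriptor, built lazily on the first lookup.
// Descriptors are compared by identity because SketchDescriptor does not override equals.
class LabelCache {
    private val cache = HashMap<SketchDescriptor, Map<Long, Int>>()

    fun indexForLabel(descriptor: SketchDescriptor, label: Long): Int? =
        cache.getOrPut(descriptor) { buildIndex(descriptor) }[label]

    private fun buildIndex(descriptor: SketchDescriptor): Map<Long, Int> {
        val byLabel = HashMap<Long, Int>()
        descriptor.labels.forEachIndexed { index, label ->
            if (label != null) byLabel[label] = index
        }
        return byLabel
    }
}
```

For a descriptor built as `SketchDescriptor(listOf("alg", "kid"), listOf(1L, 4L))`, looking up label 4 resolves to element index 1 with a single map access after the first call.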
import kotlinx.serialization.*

/**
 * CBOR supports *labels*, which work just as `SerialNames`. The key difference is that labels are not strings,
I see the term 'label' only in the COSE specification (https://datatracker.ietf.org/doc/html/rfc8152#section-1.4), while the regular CBOR spec does not have this word anywhere. Also, as far as I remember, a regular CBOR key can be of any type, not just integer or string. I think this is worth mentioning here in the docs: labels are mainly used for COSE.
I amended the documentation in 8389089 but decided to keep the name `CborLabel` and not switch over to `CoseLabel`.
In the future, a more generic annotation that supports arbitrary types as map keys would be perfect for full compliance. To be honest, though, I don't think anything apart from strings and numbers is used in practice.
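To make the difference between integer labels and string names concrete, here is a small, self-contained sketch of how CBOR encodes integer and text-string map keys. The function names are hypothetical and the code is independent of the PR's encoder; it only follows the general CBOR head layout (major type in the top three bits, argument inline or in the following 1, 2, 4 or 8 bytes).

```kotlin
// Encodes a CBOR head: major type (0..7) plus a non-negative argument.
fun encodeCborHead(majorType: Int, argument: Long): ByteArray {
    require(majorType in 0..7 && argument >= 0)
    val mt = majorType shl 5
    return when {
        argument < 24 -> byteArrayOf((mt or argument.toInt()).toByte())
        argument <= 0xFF -> byteArrayOf((mt or 24).toByte(), argument.toByte())
        argument <= 0xFFFF ->
            byteArrayOf((mt or 25).toByte(), (argument shr 8).toByte(), argument.toByte())
        argument <= 0xFFFF_FFFFL ->
            byteArrayOf((mt or 26).toByte()) + ByteArray(4) { i -> (argument shr (8 * (3 - i))).toByte() }
        else ->
            byteArrayOf((mt or 27).toByte()) + ByteArray(8) { i -> (argument shr (8 * (7 - i))).toByte() }
    }
}

// Integer key: major type 0 for non-negative values, major type 1 (argument -1 - n) for negative ones.
fun encodeCborInt(value: Long): ByteArray =
    if (value >= 0) encodeCborHead(0, value) else encodeCborHead(1, -1 - value)

// Text-string key: major type 3 with the UTF-8 length, followed by the UTF-8 bytes.
fun encodeCborText(text: String): ByteArray {
    val utf8 = text.encodeToByteArray()
    return encodeCborHead(3, utf8.size.toLong()) + utf8
}
```

With this sketch, the label `1` takes a single byte, while the three-letter name `"alg"` takes four, which is one reason compact formats such as COSE favour integer labels.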
@@ -446,8 +363,10 @@ internal class CborDecoder(private val input: ByteArrayInput) {
return array
}

fun nextFloat(): Float {
skipOverTags()
fun nextFloat(tag: ULong) = nextFloat(ulongArrayOf(tag))
This function is unused
fixed in a45daf7
@Target(AnnotationTarget.PROPERTY)
@ExperimentalSerializationApi
public annotation class ValueTags(@OptIn(ExperimentalUnsignedTypes::class) vararg val tags: ULong) {
    public companion object {
Companion undocumented
fixed in 49b3ea2
val label = descriptor.getCborLabel(index)

if (!descriptor.hasArrayTag()) {
    if (cbor.writeKeyTags) descriptor.getKeyTags(index)?.forEach { getDestination().encodeTag(it) }
I suggest extracting `getDestination()` to a local variable to avoid fetching it again every time.
val label = descriptor.getCborLabel(index)

if (!descriptor.hasArrayTag()) {
    if (cbor.writeKeyTags) descriptor.getKeyTags(index)?.forEach { output.encodeTag(it) }
Besides the difference in `output` vs `getDestination()`, this method is identical to `CborWriter.encodeElement()`. I suggest removing it completely (and adding `val output = getDestination()` to the common version, as I suggested there).
done in 682f363
}

//optimized definite length encoder
internal open class DefiniteLengthCborWriter(cbor: Cbor, output: ByteArrayOutput) : CborWriter(cbor, output) {
I think there is no need for it to be `open` now?
done in 3d8768c
//value classes are only inlined on the JVM, so we use a typealias and extensions instead
private typealias Stack = MutableList<CborWriter.Data>

private fun Stack(vararg elements: CborWriter.Data): Stack = mutableListOf(*elements)
Does the argument need to be `vararg` though? I see it is only called with 1 element.
fixed
Nope, that was just an oversight and needs fixing.
I don't see how that would work. You would need to individually tag specific keys and values, which is just impossible
    override val serializersModule: SerializersModule
) : BinaryFormat {

    /**
     * The default instance of [Cbor]
     */
    public companion object Default : Cbor(false, false, EmptySerializersModule())
    public companion object Default :
        Cbor(false, false, true, true, true, true, false, true, false, EmptySerializersModule())
I see that you enabled the new flags (write/verifyTags, preferCborLabels) by default. We can't do that, because it would change the output format the moment users get the new version of the library, without their explicit consent.
fixed in f4b43e7
if ((descriptor.kind !is StructureKind.LIST) && (descriptor.kind !is StructureKind.MAP) && (descriptor.kind !is PolymorphicKind)) {
    //indices are put into the name field. we don't want to write those, as it would result in double writes
    if (cbor.preferCborLabelsOverNames && label != null) {
        getDestination().encodeNumber(label)
Since we do not need the label value unless `cbor.preferCborLabelsOverNames` is true, it's better to move the `val name = descriptor.getElementName(index)` / `val label = descriptor.getCborLabel(index)` lines under this `if`, so we won't access the annotation list unless it is necessary:
var label: Long? = null
if (cbor.preferCborLabelsOverNames) label = descriptor.getCborLabel(index)
if (label != null) output.encodeNumber(label) else output.encodeString(descriptor.getElementName(index))
fixed in 2439e5e
I obviously can't run the CI myself, but I tried to address everything. Ready for the next round of reviews. Thanks everyone for the detailed feedback! It really helped to clear up some things that were not working the way they are supposed to.

package kotlinx.serialization.cbor

public interface CborEncoder {
Shouldn't it be two interfaces, `CborEncoder` and `CborDecoder`, as with `JsonEncoder` and `JsonDecoder`? At the moment it feels like a dangling type.
Yeah, the naming is not optimal, but for whatever reason, when I add a dedicated `CborDecoder` and rename the currently internal `CborEncoder`, which handles low-level stuff, to `LowLevelCborEncoder`, the WASI tests glitch out with empty error messages and an empty test report, complaining about failing tests.
So while I agree, I don't even know how to debug this weird behaviour, which, to me, is unintelligible.
Managed to split it up by starting from scratch in reverse order. @sandwwraith feel free to use or ignore 82f326f as desired.
This PR obsoletes #2371 and #2359 as it contains the features of both PRs and many more.
Specifically, this PR contains all features required to serialize and parse COSE-compliant CBOR (thanks to @nodh). While some canonicalization steps (such as sorting keys) still need to be performed manually, it does get the job done quite well. Namely, we have successfully used the features introduced here to create and validate ISO/IEC 18013-5:2021-compliant mobile driving license data.
This PR introduces the following features to the CBOR format:
null
complex objects as empty map (to be COSE-compliant)