feat(java): type meta encoding for java #1556

Merged: 39 commits, May 2, 2024
Commits
2441c72
redesign package/classname encoding spec
chaokunyang Apr 21, 2024
07b1396
add tag id into 3 bits field name encoding and add strip last char flag
chaokunyang Apr 22, 2024
afabf61
implement type meta encoding spec for java
chaokunyang Apr 22, 2024
431332f
Merge remote-tracking branch 'ant/main' into new_meta_encoding_in_java
chaokunyang Apr 26, 2024
816516f
lint code
chaokunyang Apr 26, 2024
3ce7b4c
Merge remote-tracking branch 'ant/main' into new_meta_encoding_in_java
chaokunyang Apr 27, 2024
bbe6dd9
Merge remote-tracking branch 'ant/main' into new_meta_encoding_in_java
chaokunyang Apr 27, 2024
d871d21
redesign pkg/type/field name encoding
chaokunyang Apr 28, 2024
47b19c5
implements new meta encoding
chaokunyang Apr 28, 2024
ae1c09d
Merge remote-tracking branch 'ant/main' into new_meta_encoding_in_java
chaokunyang Apr 28, 2024
f96fa4f
simplify meta string encoding
chaokunyang Apr 28, 2024
c84e7ab
class def decoder
chaokunyang Apr 28, 2024
a87aeef
add header
chaokunyang Apr 28, 2024
24217f1
add meta size header to spec
chaokunyang Apr 28, 2024
dd72e67
support parse in class resolver
chaokunyang Apr 28, 2024
5bbd472
Merge remote-tracking branch 'ant/main' into new_meta_encoding_in_java
chaokunyang Apr 28, 2024
40695ea
update buildClassDef interface
chaokunyang Apr 29, 2024
02d694f
fix encode/decode
chaokunyang Apr 29, 2024
461a391
fix encode
chaokunyang Apr 29, 2024
d21379c
fix FieldType encoding
chaokunyang Apr 29, 2024
53cef47
add ClassDefEncoderTest.java
chaokunyang Apr 29, 2024
f022add
add typeMetaStringCache
chaokunyang Apr 29, 2024
248a121
make meta share use new encoding for serialization
chaokunyang Apr 29, 2024
5d41d08
fix meta encoding for class have no fields
chaokunyang Apr 29, 2024
a4ca8a4
fix isMonomorphic
chaokunyang Apr 29, 2024
c046886
fix array meta share
chaokunyang Apr 30, 2024
ad5b8bb
fix meta share other fields order
chaokunyang Apr 30, 2024
01ea6c1
fix enum meta share
chaokunyang Apr 30, 2024
b3563d4
fix big meta encode
chaokunyang Apr 30, 2024
e556ff1
Merge remote-tracking branch 'ant/main' into new_meta_encoding_in_java
chaokunyang Apr 30, 2024
73498ef
fix typeName NPE
chaokunyang Apr 30, 2024
33626df
add Encoders to build time init
chaokunyang Apr 30, 2024
05adbff
fix compute computing
chaokunyang Apr 30, 2024
899d089
fix share type only
chaokunyang Apr 30, 2024
79c4af5
add todo
chaokunyang Apr 30, 2024
acfa100
fix class def typename mutable conflict with desc cache
chaokunyang May 1, 2024
3395ae3
add java doc
chaokunyang May 1, 2024
e5070cf
add java doc
chaokunyang May 1, 2024
6b45fd1
lint code
chaokunyang May 1, 2024
51 changes: 28 additions & 23 deletions docs/specification/java_serialization_spec.md
@@ -112,9 +112,9 @@ For Schema consistent mode, class will be encoded as an enumerated string by ful
the meta layout for schema evolution mode:

```
| 8 bytes meta header | variable bytes | variable bytes | variable bytes |
+-------------------------------+--------------------+-------------------+----------------+
| 7 bytes hash + 1 bytes header | current class meta | parent class meta | ... |
| 8 bytes meta header | meta size | variable bytes | variable bytes | variable bytes |
+-------------------------------+-----------|--------------------+-------------------+----------------+
| 7 bytes hash + 1 bytes header | 1~2 bytes | current class meta | parent class meta | ... |
```

Class meta are encoded from parent class to leaf class, only class with serializable fields will be encoded.
@@ -128,8 +128,14 @@ Meta header is a 64 bits number value encoded in little endian order.
class doesn't have fields to serialize, or we're in a context which serialize fields of current class
only( `ObjectStreamSerializer#SlotInfo` is an example), num classes will be 1.
- 5rd bit is used to indicate whether this class needs schema evolution.
- 6rd bit is used to indicate whether the size sum of all layers meta is less than 256.
- Other 56 bits is used to store the unique hash of `flags + all layers class meta`.

### Meta size

- If the size sum of all layers meta is less than 256, then one byte is written next to indicate the length of meta.
- Otherwise, write size as two bytes in little endian.
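
The meta size rule above can be sketched in Java as follows. This is a minimal illustration, not the code from this PR: the class `MetaSizeSketch` and the methods `writeMetaSize` and `readMetaSize` are hypothetical names, a plain `java.nio.ByteBuffer` stands in for Fury's own buffer type, and the `small` flag is assumed to correspond to the header bit that records whether the total meta size is less than 256.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper illustrating the "meta size" field; not part of Fury's API.
final class MetaSizeSketch {
  private MetaSizeSketch() {}

  /**
   * Writes the meta size: one byte when size < 256, otherwise two bytes in
   * little-endian order. Returns true when the one-byte form was used, which
   * is the value the header flag bit is assumed to carry.
   */
  static boolean writeMetaSize(ByteBuffer buf, int size) {
    if (size < 0 || size > 0xFFFF) {
      // Two bytes can represent at most 65535; larger metas are out of scope here.
      throw new IllegalArgumentException("meta size out of range: " + size);
    }
    if (size < 256) {
      buf.put((byte) size);
      return true;
    }
    buf.order(ByteOrder.LITTLE_ENDIAN).putShort((short) size);
    return false;
  }

  /** Reads the meta size back, using the flag to choose the one- or two-byte form. */
  static int readMetaSize(ByteBuffer buf, boolean small) {
    if (small) {
      return buf.get() & 0xFF; // mask to treat the byte as unsigned
    }
    return buf.order(ByteOrder.LITTLE_ENDIAN).getShort() & 0xFFFF;
  }
}
```

Because the reader cannot tell from the size bytes alone how many of them were written, the sketch passes the one-byte/two-byte choice explicitly, mirroring the flag bit described in the meta header.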

### Single layer class meta

```
@@ -150,34 +156,33 @@ Meta header is a 64 bits number value encoded in little endian order.
fields info of those fields which aren't annotated by tag id for deserializing schema consistent fields, then use
fields info in meta for deserializing compatible fields.
- Package name encoding(omitted when class is registered):
- Header:
- If meta string encoding is `LOWER_SPECIAL` and the length of encoded string `<=` 128, then header will be
`7 bits size + flag(set)`.
Otherwise, header will be `4 bits unset + 3 bits encoding flags + flag(unset)`
- Package name:
- If bit flag is set, then package name will be encoded meta string binary.
- Otherwise, it will be `| unsigned varint length | encoded meta string binary |`
- Class name encoding(omitted when class is registered)::
- header:
- If meta string encoding is in `LOWER_SPECIAL~LOWER_UPPER_DIGIT_SPECIAL (0~3)`, and the length of encoded
string `<=` 32, then the header will be `5 bits size + 2 bits encoding flags + flag(set)`.
- Otherwise, header will be `| unsigned varint length | encoded meta string binary |`
- encoding algorithm: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL`
- Header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` will be used to indicate size `0~62`,
the value `63` the size need more byte to read, the encoding will encode `size - 62` as a varint next.
- Class name encoding(omitted when class is registered):
- encoding algorithm: `UTF8/LOWER_UPPER_DIGIT_SPECIAL/FIRST_TO_LOWER_SPECIAL/ALL_TO_LOWER_SPECIAL`
- header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` will be used to indicate size `1~64`,
the value `63` the size need more byte to read, the encoding will encode `size - 63` as a varint next.
- Field info:
- header(8
bits): `reserved 1 bit + 3 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag + tag id flag`.
bits): `3 bits size + 2 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag`.
Users can use annotation to provide those info.
- tag id: when set to 1, field name will be written by an unsigned varint tag id.
- ref tracking: when set to 0, ref tracking will be disabled for this field.
- nullability: when set to 0, this field won't be null.
- 2 bits field name encoding:
- encoding: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID`
- If tag id is used, i.e. field name is written by an unsigned varint tag id. 2 bits encoding will be `11`.
- size of field name:
- The `3 bits size: 0~7` will be used to indicate length `1~7`, the value `6` the size read more bytes,
the encoding will encode `size - 7` as a varint next.
- If encoding is `TAG_ID`, then num_bytes of field name will be used to store tag id.
- ref tracking: when set to 1, ref tracking will be enabled for this field.
- nullability: when set to 1, this field can be null.
- polymorphism: when set to 1, the actual type of field will be the declared field type even the type if
not `final`.
- 3 bits field name encoding will be set to meta string encoding flags when tag id is not set.
- type id:
- For registered type-consistent classes, it will be the registered class id.
- Otherwise it will be encoded as `OBJECT_ID` if it isn't `final` and `FINAL_OBJECT_ID` if it's `final`. The
meta
for such types is written separately instead of inlining here is to reduce meta space cost if object of this
type is serialized in current object graph multiple times, and the field value may be null too.
meta for such types is written separately instead of inlining here is to reduce meta space cost if object of
this type is serialized in current object graph multiple times, and the field value may be null too.
- Field name: If type id is set, type id will be used instead. Otherwise meta string encoding length and data will
be written instead.

@@ -76,7 +76,11 @@ public MetaSharedCodecBuilder(TypeRef<?> beanType, Fury fury, ClassDef classDef)
f -> MetaSharedSerializer.consolidateFields(f.getClassResolver(), beanClass, classDef));
DescriptorGrouper grouper =
DescriptorGrouper.createDescriptorGrouper(
descriptors, true, fury.compressInt(), fury.compressLong());
fury.getClassResolver()::isMonomorphic,
descriptors,
false,
fury.compressInt(),
fury.compressLong());
objectCodecOptimizer =
new ObjectCodecOptimizer(beanClass, grouper, !fury.isBasicTypesRefIgnored(), ctx);
}
@@ -56,6 +56,7 @@
import org.apache.fury.codegen.Expression.StaticInvoke;
import org.apache.fury.codegen.ExpressionVisitor;
import org.apache.fury.memory.Platform;
import org.apache.fury.meta.ClassDef;
import org.apache.fury.reflect.TypeRef;
import org.apache.fury.serializer.ObjectSerializer;
import org.apache.fury.serializer.PrimitiveSerializers.LongSerializer;
@@ -87,13 +88,23 @@ public class ObjectCodecBuilder extends BaseObjectCodecBuilder {

public ObjectCodecBuilder(Class<?> beanClass, Fury fury) {
super(TypeRef.of(beanClass), fury, Generated.GeneratedObjectSerializer.class);
Collection<Descriptor> descriptors =
classResolver.getAllDescriptorsMap(beanClass, true).values();
Collection<Descriptor> descriptors;
boolean shareMeta = fury.getConfig().shareMetaContext();
if (shareMeta) {
ClassDef classDef = classResolver.getClassDef(beanClass, true);
descriptors = classDef.getDescriptors(classResolver, beanClass);
} else {
descriptors = fury.getClassResolver().getAllDescriptorsMap(beanClass, true).values();
}
classVersionHash =
new Literal(ObjectSerializer.computeVersionHash(descriptors), PRIMITIVE_INT_TYPE);
DescriptorGrouper grouper =
DescriptorGrouper.createDescriptorGrouper(
descriptors, false, fury.compressInt(), fury.compressLong());
fury.getClassResolver()::isMonomorphic,
descriptors,
false,
fury.compressInt(),
fury.compressLong());
objectCodecOptimizer =
new ObjectCodecOptimizer(beanClass, grouper, !fury.isBasicTypesRefIgnored(), ctx);
if (isRecord) {