Type Encoding

Rikimaru edited this page Sep 12, 2019 · 3 revisions
  • You're using Ceras as a network protocol?
  • Or just trying to make serialization more efficient? ⚡

Then you're looking for config.KnownTypes!


How do I use KnownTypes?

Add your types to config.KnownTypes like this:

SerializerConfig config = new SerializerConfig();

config.KnownTypes.Add(typeof(MyLoginPacket));
config.KnownTypes.Add(typeof(MyChatMessagePacket));
// ...

var ceras = new CerasSerializer(config);

What does it do?

Sometimes a variable and the value stored in it have different types (see the examples further down). In that case Ceras has to write the Type so that it later knows how to read the data.

When a Type that needs to be written can be found in the KnownTypes collection, Ceras can emit a single byte instead of a huge string (the full name of the type)!

That's especially interesting for networking scenarios (in fact KnownTypes was implemented for exactly this reason) because that way you can completely avoid sending any strings, making the communication as efficient as a hand-crafted binary network protocol!

But even if you only serialize stuff to write it to a file, you can still save a little space and time by using KnownTypes.

  • Adding your types to KnownTypes is completely optional! Everything will work perfectly fine even if you don't do it. However, doing so will save space and improve performance!

Should I just add all my types to KnownTypes?

No! Don't just throw in everything! Only add types when you know that Ceras will eventually need to encode them.

Adding types that will never be written (because they can always be inferred from the containing variable) will just reduce efficiency (even if just a little).

If you are unsure what that means, scroll down to "When does Ceras write types?" to read about how and when Ceras writes a Type.

As for security (in networking scenarios), adding more Types has no effect on vulnerability. For more information about security take a look at the wiki page.

Why is that important for networking?

  • When you send your serialized packets over the internet, the receiving side will never know what kind of packet comes next.
  • Now you obviously can't have ceras.Deserialize<MyChatMessagePacket>(...) since you don't know what the other side has sent.
  • So on the receiving side you'll have ceras.Deserialize<object>(...) or ceras.Deserialize<MyNetworkPacketBase>(...).
  • That means the sending side can't just do ceras.Serialize(myChatMessagePacket);, because the compiler would turn that into ceras.Serialize<MyChatMessagePacket>(myChatMessagePacket);, and doing that would write the packet in a format that expects the reader to know the type beforehand (which is impossible here).
  • So on the sending side you'll have ceras.Serialize<MyNetworkPacketBase>(...) or similar...

From here it should be clear that Ceras needs to write the type of the root object. Of course we don't want a long type name to be written every time we send a network message; instead we want a tiny ID (as short as possible!).

So adding all your packets / network-messages to KnownTypes will essentially turn Ceras from a plain old serializer into an optimized network-protocol!
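A minimal sketch of the send/receive pattern described above. The packet classes are hypothetical and only exist for illustration; public fields are used on the assumption that the default configuration picks them up. Both sides go through the base type, so Ceras has to encode the concrete type of the root object, and KnownTypes keeps that encoding short.

```csharp
using System;
using Ceras;

// Hypothetical packet types, for illustration only.
public abstract class MyNetworkPacketBase { }

public class MyChatMessagePacket : MyNetworkPacketBase
{
    public string Text;
}

public static class Program
{
    public static void Main()
    {
        var config = new SerializerConfig();
        config.KnownTypes.Add(typeof(MyChatMessagePacket));
        var ceras = new CerasSerializer(config);

        // Sending side: serialize through the base type so the concrete type gets written.
        MyNetworkPacketBase outgoing = new MyChatMessagePacket { Text = "hello" };
        byte[] bytes = ceras.Serialize<MyNetworkPacketBase>(outgoing);

        // Receiving side: only the base type is known in advance.
        MyNetworkPacketBase incoming = ceras.Deserialize<MyNetworkPacketBase>(bytes);
        if (incoming is MyChatMessagePacket chat)
            Console.WriteLine(chat.Text);
    }
}
```

In a real application the two halves would live in different processes, each with its own CerasSerializer built from an identical config.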

Tips for networking

  • Since the ID for a type is based on its index in KnownTypes, it is essential that Server and Client have the exact same types in the exact same order.
  • In order to easily verify that, Ceras can calculate a "protocol hash" based on the whole SerializerConfig.
  • The idea is that the very first message the client sends to the server after connecting is the 4-byte (a single int) protocol hash.
  • The server compares the received hash with its own, and if it doesn't match it can just kick/disconnect the client. That way it is ensured that the server and client are fully compatible.
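To illustrate why the order of KnownTypes matters for such a check, here is a self-contained sketch that hashes an ordered list of type names with 32-bit FNV-1a. This is not how Ceras computes its protocol hash (that is derived from the whole SerializerConfig); the class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ProtocolHashSketch
{
    // 32-bit FNV-1a over the full names of the types, in list order.
    public static int Compute(IEnumerable<Type> knownTypes)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (Type type in knownTypes)
            {
                // The newline separator keeps adjacent names from running together.
                foreach (byte b in Encoding.UTF8.GetBytes(type.FullName + "\n"))
                {
                    hash ^= b;
                    hash *= 16777619;
                }
            }
            return (int)hash;
        }
    }

    public static void Main()
    {
        var server = new List<Type> { typeof(int), typeof(string) };
        var clientSameOrder = new List<Type> { typeof(int), typeof(string) };
        var clientSwapped = new List<Type> { typeof(string), typeof(int) };

        // Same types in the same order: the 4-byte values are equal.
        Console.WriteLine(Compute(server) == Compute(clientSameOrder));

        // Same types in a different order: the IDs would no longer line up,
        // and the hash is what lets the server notice.
        Console.WriteLine(Compute(server) == Compute(clientSwapped));
    }
}
```

A server following the tips above would read this single int first and disconnect the client when the comparison fails.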

How does Ceras determine how a type gets written?

When there is a need to write a type, Ceras tries 3 things:

  1. Check if that type was already written earlier in the current Serialize call. If so, we can just write a short backreference (1 byte).
  2. Maybe the type is listed in KnownTypes? If so, only an ID derived from the index is written (1 byte).
  3. Seems like the user didn't put the type in KnownTypes and it's the first time we've seen this type. There's no other way than just writing the name of the type.
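The three steps can be modelled with a small toy class. This is only a sketch of the decision order, not Ceras' actual implementation: it returns a description of what would be written instead of producing bytes, and all names in it are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class TypeEncodingSketch
{
    private readonly List<Type> _knownTypes;
    private readonly List<Type> _writtenInThisCall = new List<Type>();

    public TypeEncodingSketch(IEnumerable<Type> knownTypes)
    {
        _knownTypes = new List<Type>(knownTypes);
    }

    // Backreferences are only valid within a single Serialize call.
    public void BeginCall() => _writtenInThisCall.Clear();

    public string Describe(Type type)
    {
        // Step 1: already written by name in this call?
        int seenIndex = _writtenInThisCall.IndexOf(type);
        if (seenIndex >= 0)
            return "backreference #" + seenIndex;

        // Step 2: listed in the known types?
        int knownIndex = _knownTypes.IndexOf(type);
        if (knownIndex >= 0)
            return "known type id " + knownIndex;

        // Step 3: fall back to the name and remember it for later backreferences.
        _writtenInThisCall.Add(type);
        return "full name " + type.FullName;
    }

    public static void Main()
    {
        var encoder = new TypeEncodingSketch(new[] { typeof(string) });
        encoder.BeginCall();
        Console.WriteLine(encoder.Describe(typeof(string))); // known type id 0
        Console.WriteLine(encoder.Describe(typeof(Guid)));   // full name System.Guid
        Console.WriteLine(encoder.Describe(typeof(Guid)));   // backreference #0
    }
}
```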

Even though writing a type is, even in the worst case, basically just writing a string, it's still overhead that Ceras tries very hard to prevent:

Optimization 1: Inference

Most of the time the type of a value already matches the type of its containing variable, so we don't even get to the question of how a type should be written. The fastest way to write something is just not writing anything in the first place :P

Optimization 2: Backreferences

As seen in step 1, a type name only gets written once per Serialize call. So if you have a List<ISpell> containing 100 fireballs, Ceras will only write Namespace.Fireball once, and from there on it can reference the already written name. This saves a ton of space.

Optimization 3: Deconstruction

Generic types can be exploited by deconstructing them and writing their fragments individually. Assuming we had to write a type like List<ISpell>, a naive implementation would just write "System.Collections.Generic.List<Namespace.ISpell>". Ceras will deconstruct generic types and write each component type individually, writing: "System.Collections.Generic.List<>" + "Namespace.ISpell". That way future types can be constructed from those individual fragments! Should we later encounter a Queue<ISpell> as well, only "Queue<>" has to be written since ISpell can already be encoded as a backreference.
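The deconstruction idea can be tried out with plain reflection. The sketch below only shows how a closed generic type splits into reusable fragments; it is not Ceras' code, and the helper name is made up. Note that .NET spells the open generic names with an arity suffix (List`1) rather than List<>.

```csharp
using System;
using System.Collections.Generic;

public static class TypeFragmentsSketch
{
    // Splits a closed generic type into its open definition plus its arguments, recursively.
    public static void Collect(Type type, List<string> fragments)
    {
        if (type.IsGenericType && !type.IsGenericTypeDefinition)
        {
            fragments.Add(type.GetGenericTypeDefinition().FullName);
            foreach (Type argument in type.GetGenericArguments())
                Collect(argument, fragments);
        }
        else
        {
            fragments.Add(type.FullName);
        }
    }

    public static void Main()
    {
        var fragments = new List<string>();
        Collect(typeof(Dictionary<string, List<int>>), fragments);

        // Prints, one per line:
        //   System.Collections.Generic.Dictionary`2
        //   System.String
        //   System.Collections.Generic.List`1
        //   System.Int32
        foreach (string fragment in fragments)
            Console.WriteLine(fragment);
    }
}
```

Each fragment is a candidate for a backreference or a KnownTypes ID on its own, which is where the savings come from.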

When does Ceras write types?

Only when the variable and its value have different types.

Example 1:

class Thing { public object Obj; }
var t = new Thing(); 
| Data | Type written? |
| --- | --- |
| t.Obj = null; | No (null is a special case) |
| t.Obj = new object(); | No (types of field and contained object match) |
| t.Obj = 5; | Yes (System.Int32) |
| t.Obj = "abc"; | Yes (System.String) |

Example 2:

interface ISpell { ... }

class Fireball : ISpell { int Damage; }
class ChainLightning : ISpell { int InitialDamage; int JumpCount; ... }

Given the definitions above, let's assume you'd write something like:

var mySpells = new List<ISpell> { new ChainLightning(), new Fireball(), new Fireball(), ... };

var data = ceras.Serialize(mySpells);

When you want to read your list back again, Ceras needs to know the type of each individual entry in the list.

Otherwise how would it know whether to read one or two ints?? After all, the data is packed as tightly as possible and contains only the bare minimum.
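A runnable version of Example 2 might look like the sketch below. The spell classes are filled in with public fields as an assumption (so that the default configuration serializes them), and the concrete values are arbitrary. No KnownTypes are registered here, so the concrete element types end up in the data by name.

```csharp
using System;
using System.Collections.Generic;
using Ceras;

public interface ISpell { }
public class Fireball : ISpell { public int Damage; }
public class ChainLightning : ISpell { public int InitialDamage; public int JumpCount; }

public static class Program
{
    public static void Main()
    {
        var ceras = new CerasSerializer(new SerializerConfig());

        var mySpells = new List<ISpell>
        {
            new ChainLightning { InitialDamage = 30, JumpCount = 4 },
            new Fireball { Damage = 50 },
            new Fireball { Damage = 60 },
        };

        byte[] data = ceras.Serialize(mySpells);

        // Each entry comes back as its concrete type, because the type was written
        // wherever it differed from the declared element type ISpell.
        List<ISpell> restored = ceras.Deserialize<List<ISpell>>(data);
        foreach (ISpell spell in restored)
            Console.WriteLine(spell.GetType().Name);
    }
}
```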

Alternative to KnownTypes -> ITypeBinder

If you want to keep the type names but maybe just want to shorten or otherwise customize them, you can implement ITypeBinder and use it in your SerializerConfig.

Implementing the interface is super simple: all it does is convert a given Type to a string (which will be written to the binary) and do the same thing in reverse (finding/resolving a Type from a given string).
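The two-way mapping such a binder has to provide can be sketched without touching Ceras at all. The class below deliberately does not implement ITypeBinder, since the exact member names should be taken from the source; ShortNameMapSketch and its methods are hypothetical and only show the Type-to-string and string-to-Type halves.

```csharp
using System;
using System.Collections.Generic;

public sealed class ShortNameMapSketch
{
    private readonly Dictionary<Type, string> _nameByType = new Dictionary<Type, string>();
    private readonly Dictionary<string, Type> _typeByName = new Dictionary<string, Type>();

    public void Register(Type type, string shortName)
    {
        _nameByType.Add(type, shortName);
        _typeByName.Add(shortName, type);
    }

    // Type -> string: use the short name if registered, else fall back to the full name.
    public string GetName(Type type) =>
        _nameByType.TryGetValue(type, out string name) ? name : type.AssemblyQualifiedName;

    // string -> Type: the exact reverse of GetName.
    public Type Resolve(string name) =>
        _typeByName.TryGetValue(name, out Type type) ? type : Type.GetType(name, throwOnError: true);

    public static void Main()
    {
        var map = new ShortNameMapSketch();
        map.Register(typeof(Guid), "G");

        Console.WriteLine(map.GetName(typeof(Guid)));        // G
        Console.WriteLine(map.Resolve("G") == typeof(Guid)); // True
    }
}
```

Whatever naming scheme is chosen, both directions have to agree on every machine that reads or writes the data.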

If you're not sure about something you can always take a look at the source code (seeing how the default TypeBinder is implemented) or just open an issue and ask 😄