README.md (+58 -12)
@@ -8,7 +8,11 @@ Made with CLion. Thanks to JetBrains for supporting open source!
8
8
9
9
It's base64. What more could anyone want?
10
10
11
-
This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at multiple levels of abstraction so you can choose the level of convenience vs performance that you want, e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6GiB/s for a 3 KiB input), whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
11
+
This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at
12
+
multiple levels of abstraction so you can choose the level of convenience vs performance that you want,
13
+
e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6 GiB/s for a 3 KiB input),
14
+
whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is
15
+
slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
12
16
13
17
## Example
14
18
@@ -32,7 +36,8 @@ See the [docs](https://docs.rs/base64) for all the details.
32
36
33
37
Remove non-base64 characters from your input before decoding.
34
38
35
-
If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to strip out whatever you need removed.
39
+
If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to
40
+
strip out whatever you need removed.
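
As a minimal sketch of the `retain` approach, using only the standard library: the helper names below are hypothetical and not part of this crate, and the filter assumes the standard alphabet plus `=` padding.

```rust
/// Returns true for bytes in the standard base64 alphabet, plus `=` padding.
/// Assumption: standard alphabet. For url-safe data, accept `-` and `_`
/// instead of `+` and `/`.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

/// Drops every byte that is not base64 (newlines, spaces, and so on),
/// in place and without allocating a second buffer.
fn strip_non_base64(input: &mut Vec<u8>) {
    input.retain(|&b| is_base64_byte(b));
}
```

Calling `strip_non_base64` on a buffer holding `aGVs\r\nbG8g d29y\nbGQ=` leaves `aGVsbG8gd29ybGQ=`, which can then be handed to the decoder.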
36
41
37
42
If you have a `Read` (e.g. reading a file or network socket), there are various approaches.
38
43
@@ -43,15 +48,45 @@ If you have a `Read` (e.g. reading a file or network socket), there are various
43
48
44
49
[line-wrap](https://crates.io/crates/line-wrap) does just that.
45
50
51
+
### I want canonical base64 encoding/decoding.
52
+
53
+
First, don't do this. You should no more expect Base64 to be canonical than you should expect compression algorithms to
54
+
produce canonical output across all usage in the wild (hint: they don't).
55
+
However, [people are drawn to their own destruction like moths to a flame](https://eprint.iacr.org/2022/361), so here we
56
+
are.
57
+
58
+
There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits
59
+
of the last encoded token in two- or three-token suffixes, and the `=` token used to inflate the suffix to a full four
60
+
tokens.
61
+
62
+
The trailing bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens,
63
+
with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we
64
+
decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set
65
+
to 1 instead of 0.
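
To make the arithmetic concrete, here is a hedged sketch of a trailing-bits check for an unpadded two- or three-token suffix. It is not this crate's implementation; the function names are hypothetical and the standard alphabet is assumed.

```rust
/// Unused low bits in the last token of an unpadded suffix.
/// 2 tokens carry 12 bits for 1 byte (8 bits): 4 unused.
/// 3 tokens carry 18 bits for 2 bytes (16 bits): 2 unused.
fn unused_trailing_bits(suffix_tokens: usize) -> Option<u32> {
    match suffix_tokens {
        2 => Some(2 * 6 - 8),
        3 => Some(3 * 6 - 16),
        _ => None,
    }
}

/// Maps a standard-alphabet symbol to its 6-bit value.
fn sextet(symbol: u8) -> Option<u8> {
    match symbol {
        b'A'..=b'Z' => Some(symbol - b'A'),
        b'a'..=b'z' => Some(symbol - b'a' + 26),
        b'0'..=b'9' => Some(symbol - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// True if the unused low bits of the final token are all zero.
/// `suffix` is the 2- or 3-token tail with any `=` already removed.
fn has_canonical_trailing_bits(suffix: &[u8]) -> bool {
    let unused = match unused_trailing_bits(suffix.len()) {
        Some(bits) => bits,
        None => return false,
    };
    match suffix.last().copied().and_then(sextet) {
        // Mask off everything except the unused low bits.
        Some(value) => (value & ((1u8 << unused) - 1)) == 0,
        None => false,
    }
}
```

For example, `QQ` is the canonical two-token encoding of the byte `A`, while `QR` differs only in the unused bits, so a decoder that ignores those bits yields the same byte for both; the check above accepts the first and rejects the second.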
66
+
67
+
The `=` pad bytes, on the other hand, are entirely a self-own by the Base64 standard. They do not affect decoding other
68
+
than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been
69
+
wasted on pointless `=` bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping
70
+
when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there
71
+
are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC,
72
+
or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the
73
+
url-safe alphabet).
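
The two padding conventions differ only in output length, which the following sketch computes. The function names are hypothetical, and overflow for very large inputs is ignored for brevity.

```rust
/// Encoded length with canonical `=` padding: every started 3-byte group
/// becomes exactly 4 output bytes.
fn padded_len(input_len: usize) -> usize {
    (input_len + 2) / 3 * 4
}

/// Encoded length with no padding: ceil(input_len * 8 / 6) tokens.
fn unpadded_len(input_len: usize) -> usize {
    (input_len * 4 + 2) / 3
}

/// Number of `=` bytes that canonical padding requires.
fn canonical_pad_count(input_len: usize) -> usize {
    padded_len(input_len) - unpadded_len(input_len)
}
```

One input byte needs two `=` bytes, two input bytes need one, and a multiple of three needs none, so dropping padding saves at most two bytes per encoded value.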
74
+
75
+
All `Engine` implementations must at a minimum support treating non-canonical padding of both types as an error, and
76
+
optionally may allow other behaviors.
77
+
46
78
## Rust version compatibility
47
79
48
-
The minimum required Rust version is 1.57.0.
80
+
The minimum supported Rust version is 1.57.0.
49
81
50
82
# Contributing
51
83
52
-
Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody wants to chase bugs in encoding of any sort.
84
+
Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all
85
+
PRs will be carefully scrutinized. Beyond that, this sort of low-level library simply needs to be 100% correct. Nobody
86
+
wants to chase bugs in encoding of any sort.
53
87
54
-
All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the free time to give each PR the attention it deserves. I will get to everyone eventually!
88
+
All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the
89
+
free time to give each PR the attention it deserves. I will get to everyone eventually!
55
90
56
91
## Developing
57
92
@@ -63,13 +98,22 @@ rustup run nightly cargo bench
63
98
64
99
## no_std
65
100
66
-
This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate the `default-features` to target core instead. In that case you lose out on all the functionality revolving around `std::io`, `std::error::Error` and heap allocations. There is an additional `alloc` feature that you can activate to bring back the support for heap allocations.
101
+
This crate supports `no_std`. By default the crate targets std via the `std` feature. You can deactivate
102
+
the `default-features` to target `core` instead. In that case you lose out on all the functionality revolving
103
+
around `std::io`, `std::error::Error`, and heap allocations. There is an additional `alloc` feature that you can activate
104
+
to bring back the support for heap allocations.
67
105
68
106
## Profiling
69
107
70
-
On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. Then compile the benchmarks with `rustup nightly run cargo bench --no-run`.
108
+
On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. First, compile the
109
+
benchmarks with `rustup run nightly cargo bench --no-run`.
71
110
72
-
Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
111
+
Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results
112
+
easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your
113
+
CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual
114
+
full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path
115
+
that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want
116
+
to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
73
117
74
118
```bash
75
119
sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse
@@ -81,7 +125,10 @@ Then analyze the results, again with perf:
81
125
sudo perf annotate -l
82
126
```
83
127
84
-
You'll see a bunch of interleaved rust source and assembly like this. The section with `lib.rs:327` is telling us that 4.02% of samples saw the `movzbl` aka bit shift as the active instruction. However, this percentage is not as exact as it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.
128
+
You'll see a bunch of interleaved Rust source and assembly like this. The section with `lib.rs:327` is telling us that
129
+
4.02% of samples saw the `movzbl` (a zero-extending byte move) as the active instruction. However, this percentage is not as exact as
130
+
it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of
131
+
instruction profiling is inherently inaccurate, especially in branch-heavy code.
85
132
86
133
```text
87
134
lib.rs:322 0.70 : 10698: mov %rdi,%rax
@@ -103,10 +150,10 @@ You'll see a bunch of interleaved rust source and assembly like this. The sectio
103
150
0.00 : 106ab: je 1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>
104
151
```
105
152
106
-
107
153
## Fuzzing
108
154
109
-
This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts. To run, use an invocation like these:
155
+
This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts.