Commit 766fbc9

Merge pull request #198 from marshallpierce/mp/invalid-padding
Detect invalid padding
2 parents 442a809 + 250323c commit 766fbc9

19 files changed, +1059 -795 lines changed

README.md

+58-12
@@ -8,7 +8,11 @@ Made with CLion. Thanks to JetBrains for supporting open source!
 
 It's base64. What more could anyone want?
 
-This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at multiple levels of abstraction so you can choose the level of convenience vs performance that you want, e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6GiB/s for a 3 KiB input), whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
+This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at
+multiple levels of abstraction so you can choose the level of convenience vs performance that you want,
+e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6GiB/s for a 3 KiB input),
+whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is
+slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
 
 ## Example
 
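The README paragraph in this hunk contrasts decoding into a caller-supplied `&mut [u8]` (`decode_engine_slice`) with decoding into a freshly allocated `Vec<u8>` (`decode_engine`). A minimal std-only sketch of that split follows. `decode_into_slice` and `decode_to_vec` are hypothetical names, not the crate's API, and the sketch assumes the standard alphabet, strips trailing `=` without validating it, and skips the length and trailing-bits checks a real decoder performs.

```rust
/// Map one standard-alphabet symbol to its 6-bit value.
fn sextet(b: u8) -> Option<u8> {
    match b {
        b'A'..=b'Z' => Some(b - b'A'),
        b'a'..=b'z' => Some(b - b'a' + 26),
        b'0'..=b'9' => Some(b - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical slice-based decoder: writes into the caller's buffer and returns
/// how many bytes were written. It never allocates.
fn decode_into_slice(input: &[u8], out: &mut [u8]) -> Result<usize, String> {
    // Drop trailing '=' without checking it (padding is ignored in this sketch).
    let end = input.iter().rposition(|&b| b != b'=').map_or(0, |i| i + 1);
    let mut acc: u32 = 0; // pending bits, right-aligned
    let mut bits: u32 = 0; // how many bits of `acc` are pending
    let mut written = 0;
    for (i, &b) in input[..end].iter().enumerate() {
        let v = sextet(b).ok_or_else(|| format!("invalid byte {} at offset {}", b, i))?;
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            // A whole byte is available: emit the top 8 pending bits.
            bits -= 8;
            let slot = out.get_mut(written).ok_or("output buffer too small")?;
            *slot = (acc >> bits) as u8;
            written += 1;
            acc &= (1u32 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    // Any bits still pending are simply dropped here.
    Ok(written)
}

/// Hypothetical allocating wrapper: more convenient, but allocates on every call.
fn decode_to_vec(input: &[u8]) -> Result<Vec<u8>, String> {
    // 3 bytes per 4 symbols, rounded up, is enough for any input this sketch accepts.
    let mut buf = vec![0u8; (input.len() + 3) / 4 * 3];
    let n = decode_into_slice(input, &mut buf)?;
    buf.truncate(n);
    Ok(buf)
}
```

For example, `decode_to_vec(b"QUJD")` returns the bytes `ABC`. The slice version lets a caller reuse one scratch buffer across many decodes; the `Vec` wrapper adds an allocation and a zeroing pass per call in exchange for convenience.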
@@ -32,7 +36,8 @@ See the [docs](https://docs.rs/base64) for all the details.
 
 Remove non-base64 characters from your input before decoding.
 
-If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to strip out whatever you need removed.
+If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to
+strip out whatever you need removed.
 
 If you have a `Read` (e.g. reading a file or network socket), there are various approaches.
 
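A minimal sketch of the `retain` approach from this hunk, assuming the only unwanted bytes are ASCII whitespace such as the line breaks in wrapped input; widen the predicate for anything else that must go. `strip_whitespace` is a hypothetical helper name.

```rust
/// Hypothetical helper: remove ASCII whitespace (space, tab, CR, LF, form feed)
/// from base64 text in place, so line-wrapped input can go to a strict decoder.
fn strip_whitespace(encoded: &mut Vec<u8>) {
    encoded.retain(|b| !b.is_ascii_whitespace());
}
```

For example, `b"QUJD\r\nQUJD\n"` becomes `b"QUJDQUJD"`. `retain` works in place, so no second buffer is allocated.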
@@ -43,15 +48,45 @@ If you have a `Read` (e.g. reading a file or network socket), there are various
 
 [line-wrap](https://crates.io/crates/line-wrap) does just that.
 
+### I want canonical base64 encoding/decoding.
+
+First, don't do this. You should no more expect Base64 to be canonical than you should expect compression algorithms to
+produce canonical output across all usage in the wild (hint: they don't).
+However, [people are drawn to their own destruction like moths to a flame](https://eprint.iacr.org/2022/361), so here we
+are.
+
+There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits
+of the last encoded token in two or three token suffixes, and the `=` token used to inflate the suffix to a full four
+tokens.
+
+The trailing bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens,
+with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we
+decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set
+to 1 instead of 0.
+
+The `=` pad bytes, on the other hand, are entirely a self-own by the Base64 standard. They do not affect decoding other
+than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been
+wasted on pointless `=` bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping
+when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there
+are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC,
+or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the
+url-safe alphabet).
+
+All `Engine` implementations must at a minimum support treating non-canonical padding of both types as an error, and
+optionally may allow other behaviors.
+
 ## Rust version compatibility
 
-The minimum required Rust version is 1.57.0.
+The minimum supported Rust version is 1.57.0.
 
 # Contributing
 
-Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody wants to chase bugs in encoding of any sort.
+Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all
+PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody
+wants to chase bugs in encoding of any sort.
 
-All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the free time to give each PR the attention it deserves. I will get to everyone eventually!
+All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the
+free time to give each PR the attention it deserves. I will get to everyone eventually!
 
 ## Developing
 
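The trailing-bits rule from the new canonical-encoding section can be made concrete: 2 tokens carry 12 bits for 1 byte, leaving the last token's low 4 bits unused, and 3 tokens carry 18 bits for 2 bytes, leaving its low 2 bits unused. A minimal std-only sketch, assuming the standard alphabet and a final partial group with any `=` already removed; `has_canonical_trailing_bits` is a hypothetical helper, not a crate function.

```rust
/// The standard base64 alphabet, indexed by 6-bit value.
const STANDARD: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical check: are the unused low bits of the final token all zero?
/// `suffix` is the final partial group (2 or 3 tokens) with any `=` removed.
/// Returns `None` if the length is not 2 or 3, or the last byte is not in the alphabet.
fn has_canonical_trailing_bits(suffix: &[u8]) -> Option<bool> {
    let unused_bits: u32 = match suffix.len() {
        2 => 4, // 12 bits carry 1 byte: 4 bits unused
        3 => 2, // 18 bits carry 2 bytes: 2 bits unused
        _ => return None, // a full group uses all 24 bits; 1 token is never valid
    };
    let last = *suffix.last()?;
    let value = STANDARD.iter().position(|&c| c == last)? as u8;
    let mask = (1u8 << unused_bits) - 1;
    Some((value & mask) == 0)
}
```

`QQ==` and `QR==` both carry the single byte `A` (0x41), but only `QQ` has the unused bits cleared. The second form is the case the crate's `DecodeError::InvalidLastSymbol` describes, unless the engine is configured to allow trailing bits.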
@@ -63,13 +98,22 @@ rustup run nightly cargo bench
 
 ## no_std
 
-This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate the `default-features` to target core instead. In that case you lose out on all the functionality revolving around `std::io`, `std::error::Error` and heap allocations. There is an additional `alloc` feature that you can activate to bring back the support for heap allocations.
+This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate
+the `default-features` to target `core` instead. In that case you lose out on all the functionality revolving
+around `std::io`, `std::error::Error`, and heap allocations. There is an additional `alloc` feature that you can activate
+to bring back the support for heap allocations.
 
 ## Profiling
 
-On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. Then compile the benchmarks with `rustup nightly run cargo bench --no-run`.
+On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. Then compile the
+benchmarks with `rustup nightly run cargo bench --no-run`.
 
-Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
+Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results
+easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your
+CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual
+full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path
+that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want
+to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
 
 ```bash
 sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse
@@ -81,7 +125,10 @@ Then analyze the results, again with perf:
 sudo perf annotate -l
 ```
 
-You'll see a bunch of interleaved rust source and assembly like this. The section with `lib.rs:327` is telling us that 4.02% of samples saw the `movzbl` aka bit shift as the active instruction. However, this percentage is not as exact as it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.
+You'll see a bunch of interleaved rust source and assembly like this. The section with `lib.rs:327` is telling us that
+4.02% of samples saw the `movzbl` aka bit shift as the active instruction. However, this percentage is not as exact as
+it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of
+instruction profiling is inherently inaccurate, especially in branch-heavy code.
 
 ```text
 lib.rs:322 0.70 : 10698: mov %rdi,%rax
@@ -103,10 +150,10 @@ You'll see a bunch of interleaved rust source and assembly like this. The sectio
 0.00 : 106ab: je 1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>
 ```
 
-
 ## Fuzzing
 
-This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts. To run, use an invocation like these:
+This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts.
+To run, use an invocation like these:
 
 ```bash
 cargo +nightly fuzz run roundtrip
@@ -115,7 +162,6 @@ cargo +nightly fuzz run roundtrip_random_config -- -max_len=10240
 cargo +nightly fuzz run decode_random
 ```
 
-
 ## License
 
 This project is dual-licensed under MIT and Apache 2.0.

RELEASE-NOTES.md

+5-1
@@ -1,8 +1,12 @@
 # 0.20.0
 
-## Next
+## 0.20.0-beta.1
+
+### Breaking changes
 
 - Update MSRV to 1.57.0
+- Decoding can now either ignore padding, require correct padding, or require no padding. The default is to require correct padding.
+- The `NO_PAD` config now requires that padding be absent when decoding.
 
 ## 0.20.0-alpha.1
 
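The release notes describe three decoding behaviors for `=` padding: ignore it, require it to be correct, or require it to be absent. A minimal std-only sketch of that check, under stated assumptions: `PaddingMode` and `PadCheckError` are hypothetical stand-ins (the diff shows the crate's own `engine::DecodePaddingMode` with `RequireCanonical` and `RequireNone`, but the `Ignore` name here is invented), and only the trailing run of `=` is examined.

```rust
/// Hypothetical stand-in for the three padding behaviors in the release notes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PaddingMode {
    /// Accept the input whether or not it carries `=` padding.
    Ignore,
    /// Require `=` padding up to the next multiple of 4 symbols, as per the RFC.
    RequireCanonical,
    /// Require that no `=` padding be present.
    RequireNone,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum PadCheckError {
    /// The final group has 1 symbol, which cannot encode a whole byte.
    InvalidLength,
    /// The padding is not what the mode requires.
    InvalidPadding,
}

/// Check only the run of `=` at the end of `input` against `mode`.
/// Alphabet membership, `=` in the middle of the input, and trailing bits are out of scope.
fn check_padding(input: &[u8], mode: PaddingMode) -> Result<(), PadCheckError> {
    let pad_len = input.iter().rev().take_while(|&&b| b == b'=').count();
    let data_len = input.len() - pad_len;
    // Number of `=` a canonical encoder appends after `data_len` symbols.
    let canonical_pad = match data_len % 4 {
        0 => 0,
        2 => 2,
        3 => 1,
        _ => return Err(PadCheckError::InvalidLength),
    };
    let ok = match mode {
        PaddingMode::Ignore => true,
        PaddingMode::RequireCanonical => pad_len == canonical_pad,
        PaddingMode::RequireNone => pad_len == 0,
    };
    if ok {
        Ok(())
    } else {
        Err(PadCheckError::InvalidPadding)
    }
}
```

Under this sketch `QQ==` passes `RequireCanonical` but fails `RequireNone`, while `QQ` does the opposite; input whose length is already a multiple of 4, such as `QUJD`, passes every mode.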
fuzz/Cargo.toml

+2-1
@@ -11,7 +11,8 @@ cargo-fuzz = true
 [dependencies]
 rand = "0.6.1"
 rand_pcg = "0.1.1"
-ring = "0.13.5"
+sha2 = "0.10.6"
+
 [dependencies.base64]
 path = ".."
 [dependencies.libfuzzer-sys]

fuzz/fuzzers/roundtrip_no_pad.rs

+6-3
@@ -1,11 +1,14 @@
 #![no_main]
-#[macro_use] extern crate libfuzzer_sys;
+#[macro_use]
+extern crate libfuzzer_sys;
 extern crate base64;
 
-use base64::engine::fast_portable;
+use base64::engine::{self, fast_portable};
 
 fuzz_target!(|data: &[u8]| {
-    let config = fast_portable::FastPortableConfig::new().with_encode_padding(false);
+    let config = fast_portable::FastPortableConfig::new()
+        .with_encode_padding(false)
+        .with_decode_padding_mode(engine::DecodePaddingMode::RequireNone);
     let engine = fast_portable::FastPortable::from(&base64::alphabet::STANDARD, config);
 
     let encoded = base64::encode_engine(&data, &engine);

fuzz/fuzzers/utils.rs

+16-7
@@ -1,18 +1,20 @@
 extern crate rand;
 extern crate rand_pcg;
-extern crate ring;
+extern crate sha2;
 
-use base64::{alphabet, engine::fast_portable};
+use base64::{alphabet, engine::{self, fast_portable}};
 use self::rand::{Rng, SeedableRng};
 use self::rand_pcg::Pcg32;
-use self::ring::digest;
+use self::sha2::Digest as _;
 
 pub fn random_engine(data: &[u8]) -> fast_portable::FastPortable {
     // use sha256 of data as rng seed so it's repeatable
-    let sha = digest::digest(&digest::SHA256, data);
+    let mut hasher = sha2::Sha256::new();
+    hasher.update(data);
+    let sha = hasher.finalize();
 
     let mut seed: [u8; 16] = [0; 16];
-    seed.copy_from_slice(&sha.as_ref()[0..16]);
+    seed.copy_from_slice(&sha.as_slice()[0..16]);
 
     let mut rng = Pcg32::from_seed(seed);
 
@@ -22,9 +24,16 @@ pub fn random_engine(data: &[u8]) -> fast_portable::FastPortable {
         alphabet::STANDARD
     };
 
+    let encode_padding = rng.gen();
+    let decode_padding = if encode_padding {
+        engine::DecodePaddingMode::RequireCanonical
+    } else {
+        engine::DecodePaddingMode::RequireNone
+    };
     let config = fast_portable::FastPortableConfig::new()
-        .with_encode_padding(rng.gen())
-        .with_decode_allow_trailing_bits(rng.gen());
+        .with_encode_padding(encode_padding)
+        .with_decode_allow_trailing_bits(rng.gen())
+        .with_decode_padding_mode(decode_padding);
 
     fast_portable::FastPortable::from(&alphabet, config)
 }
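The updated `random_engine` no longer chooses encode padding and decode padding mode independently: padded output is paired with `RequireCanonical` and unpadded output with `RequireNone`, presumably so every random engine can decode what it encodes. The pairing is sound because an encoder's padding is fully determined by the input length and its padding flag, as this std-only sketch shows. `encode` is a hypothetical helper, not the crate's `encode_engine`, and it assumes the standard alphabet.

```rust
/// The standard base64 alphabet, indexed by 6-bit value.
const STANDARD: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical encoder: standard alphabet, with or without `=` padding.
fn encode(data: &[u8], pad: bool) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into the low 24 bits of a u32; missing bytes stay zero.
        let mut n: u32 = 0;
        for (i, &b) in chunk.iter().enumerate() {
            n |= u32::from(b) << (16 - 8 * i);
        }
        // 1 byte needs 2 symbols, 2 bytes need 3, 3 bytes need 4.
        let symbols = chunk.len() + 1;
        for i in 0..symbols {
            let idx = (n >> (18 - 6 * i)) & 0x3f;
            out.push(STANDARD[idx as usize] as char);
        }
        if pad {
            // Fill the final group out to 4 symbols.
            for _ in symbols..4 {
                out.push('=');
            }
        }
    }
    out
}
```

With padding the output length is always a multiple of 4; without it there is never an `=`, and a final group of exactly 1 symbol never occurs. Because missing bytes are zero-filled, this encoder also never sets the unused trailing bits.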

src/decode.rs

+15-6
@@ -13,6 +13,7 @@ use std::error;
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub enum DecodeError {
     /// An invalid byte was found in the input. The offset and offending byte are provided.
+    /// Padding characters (`=`) interspersed in the encoded form will be treated as invalid bytes.
     InvalidByte(usize, u8),
     /// The length of the input is invalid.
     /// A typical cause of this is stray trailing whitespace or other separator bytes.
@@ -22,9 +23,12 @@ pub enum DecodeError {
     InvalidLength,
     /// The last non-padding input symbol's encoded 6 bits have nonzero bits that will be discarded.
     /// This is indicative of corrupted or truncated Base64.
-    /// Unlike InvalidByte, which reports symbols that aren't in the alphabet, this error is for
+    /// Unlike `InvalidByte`, which reports symbols that aren't in the alphabet, this error is for
     /// symbols that are in the alphabet but represent nonsensical encodings.
     InvalidLastSymbol(usize, u8),
+    /// The nature of the padding was not as configured: absent or incorrect when it must be
+    /// canonical, or present when it must be absent, etc.
+    InvalidPadding,
 }
 
 impl fmt::Display for DecodeError {
@@ -35,6 +39,7 @@ impl fmt::Display for DecodeError {
             Self::InvalidLastSymbol(index, byte) => {
                 write!(f, "Invalid last symbol {}, offset {}.", byte, index)
             }
+            Self::InvalidPadding => write!(f, "Invalid padding"),
         }
     }
 }
@@ -46,6 +51,7 @@ impl error::Error for DecodeError {
             Self::InvalidByte(_, _) => "invalid byte",
             Self::InvalidLength => "invalid length",
             Self::InvalidLastSymbol(_, _) => "invalid last symbol",
+            Self::InvalidPadding => "invalid padding",
         }
     }
 
@@ -192,10 +198,12 @@ pub fn decode_engine_slice<E: Engine, T: AsRef<[u8]>>(
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::{encode::encode_engine_string, tests::assert_encode_sanity};
-
-    use crate::engine::Config;
-    use crate::tests::random_engine;
+    use crate::{
+        alphabet,
+        encode::encode_engine_string,
+        engine::{fast_portable, fast_portable::FastPortable, Config},
+        tests::{assert_encode_sanity, random_engine},
+    };
     use rand::{
         distributions::{Distribution, Uniform},
         Rng, SeedableRng,
@@ -350,12 +358,13 @@ mod tests {
 
     #[test]
     fn decode_engine_estimation_works_for_various_lengths() {
+        let engine = FastPortable::from(&alphabet::STANDARD, fast_portable::NO_PAD);
         for num_prefix_quads in 0..100 {
             for suffix in &["AA", "AAA", "AAAA"] {
                 let mut prefix = "AAAA".repeat(num_prefix_quads);
                 prefix.push_str(suffix);
                 // make sure no overflow (and thus a panic) occurs
-                let res = decode_engine(prefix, &DEFAULT_ENGINE);
+                let res = decode_engine(prefix, &engine);
                 assert!(res.is_ok());
             }
         }
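The updated estimation test builds its engine with `fast_portable::NO_PAD`, presumably because inputs ending in `AA` or `AAA` with no `=` would now be rejected by `DEFAULT_ENGINE`, whose default (per the release notes) is to require correct padding. For such unpadded input the exact decoded length depends only on the symbol count. A minimal sketch; `decoded_len_unpadded` is a hypothetical helper, not the crate's length estimator.

```rust
/// Hypothetical helper: exact number of bytes carried by `symbols` unpadded
/// base64 symbols, or `None` for a length no encoder can produce.
fn decoded_len_unpadded(symbols: usize) -> Option<usize> {
    let full_groups = symbols / 4; // each complete group of 4 symbols carries 3 bytes
    let extra = match symbols % 4 {
        0 => 0,
        1 => return None, // 6 bits cannot hold a whole byte
        2 => 1,           // 12 bits: 1 byte plus 4 trailing bits
        _ => 2,           // 18 bits: 2 bytes plus 2 trailing bits
    };
    // Cannot overflow: at most (usize::MAX / 4) * 3 + 2.
    Some(full_groups * 3 + extra)
}
```

Mirroring the test's inputs, `4 * q + 2` symbols (an `AA` suffix) decode to `3 * q + 1` bytes, `4 * q + 3` (`AAA`) to `3 * q + 2`, and `4 * q + 4` (`AAAA`) to `3 * q + 3`.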

src/encode.rs

+1-1
@@ -123,7 +123,7 @@ pub fn encode_engine_string<E: Engine, T: AsRef<[u8]>>(
 /// &base64::engine::DEFAULT_ENGINE);
 ///
 /// // shorten our vec down to just what was written
-/// buf.resize(bytes_written, 0);
+/// buf.truncate(bytes_written);
 ///
 /// assert_eq!(s, base64::decode(&buf).unwrap().as_slice());
 /// ```
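The doc example now shortens the buffer with `Vec::truncate` instead of `Vec::resize`. The two agree whenever `bytes_written` is no larger than the buffer, but `truncate` only ever shrinks, while `resize` would silently grow the buffer with zero bytes if asked for a larger length. A small std-only illustration; `lengths_after` is a hypothetical helper written for this comparison.

```rust
/// Hypothetical helper: the lengths that `truncate` and `resize` leave behind when a
/// buffer of `len` bytes is cut to `n` bytes.
fn lengths_after(len: usize, n: usize) -> (usize, usize) {
    let mut via_truncate = vec![0u8; len];
    via_truncate.truncate(n); // never grows: length becomes min(len, n)
    let mut via_resize = vec![0u8; len];
    via_resize.resize(n, 0); // grows with zero bytes when n > len
    (via_truncate.len(), via_resize.len())
}
```

`lengths_after(8, 5)` gives `(5, 5)`, but `lengths_after(4, 10)` gives `(4, 10)`: only `truncate` matches the intent of "shorten our vec down to just what was written".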
