Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement roaring64_bitmap_internal_validate #588

Merged
merged 3 commits into from
Feb 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions include/roaring/art/art.h
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,21 @@ size_t art_size_in_bytes(const art_t *art);
*/
void art_printf(const art_t *art);

/**
* Callback for validating the value stored in a leaf.
*
* Should return true if the value is valid, false otherwise
* If false is returned, `*reason` should be set to a static string describing
* the reason for the failure.
*/
typedef bool (*art_validate_cb_t)(const art_val_t *val, const char **reason);

/**
* Validate the ART tree, ensuring it is internally consistent.
*/
bool art_internal_validate(const art_t *art, const char **reason,
art_validate_cb_t validate_cb);

/**
* ART-internal iterator bookkeeping. Users should treat this as an opaque type.
*/
Expand Down
14 changes: 14 additions & 0 deletions include/roaring/roaring64.h
Original file line number Diff line number Diff line change
Expand Up @@ -292,6 +292,20 @@ uint64_t roaring64_bitmap_maximum(const roaring64_bitmap_t *r);
*/
bool roaring64_bitmap_run_optimize(roaring64_bitmap_t *r);

/**
* Perform internal consistency checks.
*
* Returns true if the bitmap is consistent. It may be useful to call this
* after deserializing bitmaps from untrusted sources. If
* roaring64_bitmap_internal_validate returns true, then the bitmap is
* consistent and can be trusted not to cause crashes or memory corruption.
Comment on lines +298 to +301
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what the docs describe, I feel that there is some slight mismatch with what we do in roaring64_bitmap_portable_deserialize_safe, because there we construct 32-bit roaring bitmaps and, importantly, don't call internal_validate on them. Not sure what the best solution is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I think the way we interact with those bitmaps is safe (at least for now):

  • we don't do anything with the containers themselves
  • roaring_bitmap_portable_deserialize_safe will always return a consistent count of containers (if the bitmap says it has N containers, there should be N keys, typecodes, containers)
  • we don't (currently) rely on the container keys being ordered
    • This does mean we could create a valid roaring64 bitmap from an invalid serialization (if a sub-bitmap serialization had unsorted keys, we'll end up with them in the right place because we insert them individually into the ART)
  • freeing the bitmap must always be safe

It does mean that, like roaring_bitmap_portable_deserialize_safe, despite the name containing "safe", it's absolutely possible to get a segfault interacting with the returned roaring64 bitmap if you feed it junk unless you call validate on it (or have external proof that the input is validly the output of serializing a roaring64 bitmap)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair assessment, we can leave the wording as-is then.

*
* If reason is non-null, it will be set to a string describing the first
* inconsistency found if any.
*/
bool roaring64_bitmap_internal_validate(const roaring64_bitmap_t *r,
const char **reason);

/**
* Return true if the two bitmaps contain the same elements.
*/
Expand Down
258 changes: 258 additions & 0 deletions src/art/art.c
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,21 @@ typedef uint8_t art_typecode_t;
// of the trie internals.
typedef art_val_t art_leaf_t;

typedef struct art_internal_validate_s {
const char **reason;
art_validate_cb_t validate_cb;

int depth;
art_key_chunk_t current_key[ART_KEY_BYTES];
} art_internal_validate_t;

// Set the reason message, and return false for convenience.
static inline bool art_validate_fail(const art_internal_validate_t *validate,
const char *msg) {
*validate->reason = msg;
return false;
}

// Inner node, with prefix.
//
// We use a fixed-length array as a pointer would be larger than the array.
Expand Down Expand Up @@ -308,6 +323,43 @@ static inline art_indexed_child_t art_node4_lower_bound(
return indexed_child;
}

static bool art_internal_validate_at(const art_node_t *node,
art_internal_validate_t validator);

static bool art_node4_internal_validate(const art_node4_t *node,
art_internal_validate_t validator) {
if (node->count == 0) {
return art_validate_fail(&validator, "Node4 has no children");
}
if (node->count > 4) {
return art_validate_fail(&validator, "Node4 has too many children");
}
if (node->count == 1) {
return art_validate_fail(
&validator, "Node4 and child node should have been combined");
}
validator.depth++;
for (int i = 0; i < node->count; ++i) {
if (i > 0) {
if (node->keys[i - 1] >= node->keys[i]) {
return art_validate_fail(
&validator, "Node4 keys are not strictly increasing");
}
}
for (int j = i + 1; j < node->count; ++j) {
if (node->children[i] == node->children[j]) {
return art_validate_fail(&validator,
"Node4 has duplicate children");
}
}
validator.current_key[validator.depth - 1] = node->keys[i];
if (!art_internal_validate_at(node->children[i], validator)) {
return false;
}
}
return true;
}

static art_node16_t *art_node16_create(const art_key_chunk_t prefix[],
uint8_t prefix_size) {
art_node16_t *node = (art_node16_t *)roaring_malloc(sizeof(art_node16_t));
Expand Down Expand Up @@ -461,6 +513,36 @@ static inline art_indexed_child_t art_node16_lower_bound(
return indexed_child;
}

static bool art_node16_internal_validate(const art_node16_t *node,
art_internal_validate_t validator) {
if (node->count <= 4) {
return art_validate_fail(&validator, "Node16 has too few children");
}
if (node->count > 16) {
return art_validate_fail(&validator, "Node16 has too many children");
}
validator.depth++;
for (int i = 0; i < node->count; ++i) {
if (i > 0) {
if (node->keys[i - 1] >= node->keys[i]) {
return art_validate_fail(
&validator, "Node16 keys are not strictly increasing");
}
}
for (int j = i + 1; j < node->count; ++j) {
if (node->children[i] == node->children[j]) {
return art_validate_fail(&validator,
"Node16 has duplicate children");
}
}
validator.current_key[validator.depth - 1] = node->keys[i];
if (!art_internal_validate_at(node->children[i], validator)) {
return false;
}
}
return true;
}

static art_node48_t *art_node48_create(const art_key_chunk_t prefix[],
uint8_t prefix_size) {
art_node48_t *node = (art_node48_t *)roaring_malloc(sizeof(art_node48_t));
Expand Down Expand Up @@ -614,6 +696,65 @@ static inline art_indexed_child_t art_node48_lower_bound(
return indexed_child;
}

static bool art_node48_internal_validate(const art_node48_t *node,
art_internal_validate_t validator) {
if (node->count <= 16) {
return art_validate_fail(&validator, "Node48 has too few children");
}
if (node->count > 48) {
return art_validate_fail(&validator, "Node48 has too many children");
}
uint64_t used_children = 0;
for (int i = 0; i < 256; ++i) {
uint8_t child_idx = node->keys[i];
if (child_idx != ART_NODE48_EMPTY_VAL) {
if (used_children & (UINT64_C(1) << child_idx)) {
return art_validate_fail(
&validator, "Node48 keys point to the same child index");
}

art_node_t *child = node->children[child_idx];
if (child == NULL) {
return art_validate_fail(&validator, "Node48 has a NULL child");
}
used_children |= UINT64_C(1) << child_idx;
}
}
uint64_t expected_used_children =
(node->available_children) ^ NODE48_AVAILABLE_CHILDREN_MASK;
if (used_children != expected_used_children) {
return art_validate_fail(
&validator,
"Node48 available_children does not match actual children");
}
while (used_children != 0) {
uint8_t child_idx = roaring_trailing_zeroes(used_children);
used_children &= used_children - 1;

uint64_t other_children = used_children;
while (other_children != 0) {
uint8_t other_child_idx = roaring_trailing_zeroes(other_children);
if (node->children[child_idx] == node->children[other_child_idx]) {
return art_validate_fail(&validator,
"Node48 has duplicate children");
}
other_children &= other_children - 1;
}
}

validator.depth++;
for (int i = 0; i < 256; ++i) {
if (node->keys[i] != ART_NODE48_EMPTY_VAL) {
validator.current_key[validator.depth - 1] = i;
if (!art_internal_validate_at(node->children[node->keys[i]],
validator)) {
return false;
}
}
}
return true;
}

static art_node256_t *art_node256_create(const art_key_chunk_t prefix[],
uint8_t prefix_size) {
art_node256_t *node =
Expand Down Expand Up @@ -735,6 +876,40 @@ static inline art_indexed_child_t art_node256_lower_bound(
return indexed_child;
}

static bool art_node256_internal_validate(const art_node256_t *node,
art_internal_validate_t validator) {
if (node->count <= 48) {
return art_validate_fail(&validator, "Node256 has too few children");
}
if (node->count > 256) {
return art_validate_fail(&validator, "Node256 has too many children");
}
validator.depth++;
int actual_count = 0;
for (int i = 0; i < 256; ++i) {
if (node->children[i] != NULL) {
actual_count++;

for (int j = i + 1; j < 256; ++j) {
if (node->children[i] == node->children[j]) {
return art_validate_fail(&validator,
"Node256 has duplicate children");
}
}

validator.current_key[validator.depth - 1] = i;
if (!art_internal_validate_at(node->children[i], validator)) {
return false;
}
}
}
if (actual_count != node->count) {
return art_validate_fail(
&validator, "Node256 count does not match actual children");
}
return true;
}

// Finds the child with the given key chunk in the inner node, returns NULL if
// no such child is found.
static art_node_t *art_find_child(const art_inner_node_t *node,
Expand Down Expand Up @@ -1617,6 +1792,89 @@ art_val_t *art_iterator_erase(art_t *art, art_iterator_t *iterator) {
return value_erased;
}

static bool art_internal_validate_at(const art_node_t *node,
art_internal_validate_t validator) {
if (node == NULL) {
return art_validate_fail(&validator, "node is null");
}
if (art_is_leaf(node)) {
art_leaf_t *leaf = CAST_LEAF(node);
if (art_compare_prefix(leaf->key, 0, validator.current_key, 0,
validator.depth) != 0) {
return art_validate_fail(
&validator,
"leaf key does not match its position's prefix in the tree");
}
if (validator.validate_cb != NULL &&
!validator.validate_cb(leaf, validator.reason)) {
if (*validator.reason == NULL) {
*validator.reason = "leaf validation failed";
}
return false;
}
} else {
art_inner_node_t *inner_node = (art_inner_node_t *)node;

if (validator.depth + inner_node->prefix_size + 1 > ART_KEY_BYTES) {
return art_validate_fail(&validator,
"node has too much prefix at given depth");
}
memcpy(validator.current_key + validator.depth, inner_node->prefix,
inner_node->prefix_size);
validator.depth += inner_node->prefix_size;

switch (inner_node->typecode) {
case ART_NODE4_TYPE:
if (!art_node4_internal_validate((art_node4_t *)inner_node,
validator)) {
return false;
}
break;
case ART_NODE16_TYPE:
if (!art_node16_internal_validate((art_node16_t *)inner_node,
validator)) {
return false;
}
break;
case ART_NODE48_TYPE:
if (!art_node48_internal_validate((art_node48_t *)inner_node,
validator)) {
return false;
}
break;
case ART_NODE256_TYPE:
if (!art_node256_internal_validate((art_node256_t *)inner_node,
validator)) {
return false;
}
break;
default:
return art_validate_fail(&validator, "invalid node type");
}
}
return true;
}

bool art_internal_validate(const art_t *art, const char **reason,
art_validate_cb_t validate_cb) {
const char *reason_local;
if (reason == NULL) {
// Always allow assigning through *reason
reason = &reason_local;
}
*reason = NULL;
if (art->root == NULL) {
return true;
}
art_internal_validate_t validator = {
.reason = reason,
.validate_cb = validate_cb,
.depth = 0,
.current_key = {0},
};
return art_internal_validate_at(art->root, validator);
}

#ifdef __cplusplus
} // extern "C"
} // namespace roaring
Expand Down
12 changes: 12 additions & 0 deletions src/roaring64.c
Original file line number Diff line number Diff line change
Expand Up @@ -808,6 +808,18 @@ bool roaring64_bitmap_run_optimize(roaring64_bitmap_t *r) {
return has_run_container;
}

static bool roaring64_leaf_internal_validate(const art_val_t *val,
const char **reason) {
leaf_t *leaf = (leaf_t *)val;
return container_internal_validate(leaf->container, leaf->typecode, reason);
}

bool roaring64_bitmap_internal_validate(const roaring64_bitmap_t *r,
const char **reason) {
return art_internal_validate(&r->art, reason,
roaring64_leaf_internal_validate);
}

bool roaring64_bitmap_equals(const roaring64_bitmap_t *r1,
const roaring64_bitmap_t *r2) {
art_iterator_t it1 = art_init_iterator(&r1->art, /*first=*/true);
Expand Down
2 changes: 1 addition & 1 deletion src/roaring_array.c
Original file line number Diff line number Diff line change
Expand Up @@ -553,7 +553,7 @@ size_t ra_portable_serialize(const roaring_array_t *ra, char *buf) {
uint32_t startOffset = 0;
bool hasrun = ra_has_run_container(ra);
if (hasrun) {
uint32_t cookie = SERIAL_COOKIE | ((ra->size - 1) << 16);
uint32_t cookie = SERIAL_COOKIE | ((uint32_t)(ra->size - 1) << 16);
memcpy(buf, &cookie, sizeof(cookie));
buf += sizeof(cookie);
uint32_t s = (ra->size + 7) / 8;
Expand Down
Loading
Loading