Merged
8 changes: 8 additions & 0 deletions source/bson-binary-vector/bson-binary-vector.md
@@ -184,6 +184,8 @@ Drivers MUST validate vector metadata and raise an error if any invariant is vio

- Padding MUST be 0 for all dtypes where padding doesn’t apply, and MUST be within \[0, 7\] for PACKED_BIT.
- A PACKED_BIT vector MUST NOT be empty if padding is in the range \[1, 7\].
+- When unpacking binary data into a FLOAT32 Vector structure, the length of the binary data following the dtype and
+  padding MUST be a multiple of 4 bytes.

Drivers MUST perform this validation when a numeric vector and padding are provided through the API, and when unpacking
binary data (BSON or similar) into a Vector structure.
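
The invariants listed in this hunk can be sketched as a single check over the raw subtype 9 payload (dtype byte, padding byte, then data). This is a minimal illustration, not a driver implementation: the function name `validate_binary_vector` is hypothetical, the dtype constants are taken from the `dtype_hex` values in this PR's test files, and the minimum-length check is an added assumption.

```python
# dtype bytes as they appear in the test files of this PR
INT8, FLOAT32, PACKED_BIT = 0x03, 0x27, 0x10


def validate_binary_vector(payload: bytes) -> None:
    """Raise ValueError if the payload violates the invariants above.

    Hypothetical helper: `payload` is the binary data of a subtype 9 value,
    i.e. dtype byte, padding byte, then the vector data.
    """
    # Assumption (not stated in the hunk): both metadata bytes must be present.
    if len(payload) < 2:
        raise ValueError("payload must contain a dtype byte and a padding byte")
    dtype, padding, data = payload[0], payload[1], payload[2:]

    if dtype == PACKED_BIT:
        if not 0 <= padding <= 7:
            raise ValueError("padding must be within [0, 7] for PACKED_BIT")
        if padding != 0 and not data:
            raise ValueError("PACKED_BIT vector must not be empty when padding is in [1, 7]")
    elif padding != 0:
        raise ValueError("padding must be 0 for dtypes where padding does not apply")

    # The rule added by this PR: FLOAT32 data must be whole 4-byte floats.
    if dtype == FLOAT32 and len(data) % 4 != 0:
        raise ValueError("FLOAT32 data length must be a multiple of 4 bytes")
```

Unknown dtype bytes are not rejected here; a real driver would need to handle them as well.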
@@ -242,3 +244,9 @@ See the [README](tests/README.md) for tests.
you want to store or transmit binary data more efficiently by grouping 8 bits into a single byte (uint8). For an
example in Python, see
[numpy.unpackbits](https://numpy.org/doc/2.0/reference/generated/numpy.unpackbits.html#numpy.unpackbits).
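
For readers without NumPy at hand, the same idea can be sketched in plain Python. The helper names `pack_bits` and `unpack_bits` are hypothetical, and the sketch assumes most-significant-bit-first ordering within each byte, which is also the default bit order of `numpy.packbits`.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first.

    Returns (packed_bytes, padding), where padding is the number of unused
    low-order bits in the final byte.
    """
    padding = (-len(bits)) % 8
    padded = list(bits) + [0] * padding
    packed = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(padded[start:start + 8]))
        for start in range(0, len(padded), 8)
    )
    return packed, padding


def unpack_bits(packed, padding):
    """Inverse of pack_bits: expand bytes to bits and drop the padding bits."""
    bits = [(byte >> (7 - i)) & 1 for byte in packed for i in range(8)]
    return bits[:len(bits) - padding]
```

Packing `[1, 0, 1]` this way yields the single byte `0xA0` with a padding of 5.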

+## Changelog

+- 2025-02-04: Update validation for decoding into a FLOAT32 vector.
Contributor

Please add the date of the originally accepted spec:

Contributor

The link for drivers doesn't render in the code. Maybe just leave it as [DRIVERS-2926]?

Contributor

Would you also please add the JIRA ticket and PR to the latest changelog entry?

Contributor Author

BTW, do we have the convention of appending the ticket numbers? IMO, they are trackable from the git change log.

Contributor

The git change log contains all commits, not just those related to BSON binary vectors. If we add the PR, our convention is to include the Jira ticket as well.


+- 2024-11-01: BSON Binary Subtype 9 accepted DRIVERS-2926 (#1708)
7 changes: 5 additions & 2 deletions source/bson-binary-vector/tests/README.md
@@ -29,7 +29,7 @@ Each JSON file contains three top-level keys.

- `description`: string describing the test.
- `valid`: boolean indicating if the vector, dtype, and padding should be considered a valid input.
-- `vector`: list of numbers
+- `vector`: (required if valid is true) list of numbers
Member

When the vector field is absent in an invalid case, is canonical_bson now a required field?

- `dtype_hex`: string defining the data type in hex (e.g. "0x10", "0x27")
- `dtype_alias`: (optional) string defining the data dtype, perhaps as Enum.
- `padding`: (optional) integer for byte padding. Defaults to 0.
@@ -50,7 +50,10 @@ MUST assert that the input float array is the same after encoding and decoding.

#### To prove correct in an invalid case (`valid:false`), one MUST

-- raise an exception when attempting to encode a document from the numeric values, dtype, and padding.
+- if the vector field is present, raise an exception when attempting to encode a document from the numeric values,
+  dtype, and padding.
+- if the canonical_bson field is present, raise an exception when attempting to deserialize it into the corresponding
+  numeric values, as the field contains corrupted data.
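
A test runner could apply the two rules for invalid cases roughly as follows. This is only a sketch: `check_invalid_case` is a hypothetical name, and `encode` and `decode` stand in for whatever driver-specific functions build a document from a vector and deserialize `canonical_bson`; neither is defined by the spec.

```python
def check_invalid_case(case, encode, decode):
    """Assert that an invalid test case is rejected on every path it exercises.

    `case` is one entry from a test file. `encode(vector, dtype, padding)` and
    `decode(raw_bytes)` are hypothetical driver hooks supplied by the caller.
    """
    assert case["valid"] is False

    if "vector" in case:
        try:
            encode(case["vector"], int(case["dtype_hex"], 16), case.get("padding", 0))
        except Exception:
            pass  # expected: encoding invalid input must fail
        else:
            raise AssertionError("encoding succeeded for an invalid case")

    if "canonical_bson" in case:
        try:
            decode(bytes.fromhex(case["canonical_bson"]))
        except Exception:
            pass  # expected: the stored bytes are corrupted
        else:
            raise AssertionError("decoding succeeded for an invalid case")
```

Passing the hooks in as arguments keeps the sketch independent of any particular driver API.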

## FAQ

18 changes: 16 additions & 2 deletions source/bson-binary-vector/tests/float32.json
@@ -44,8 +44,22 @@
"vector": [127.0, 7.0],
"dtype_hex": "0x27",
"dtype_alias": "FLOAT32",
-"padding": 3
+"padding": 3,
+"canonical_bson": "1C00000005766563746F72000A0000000927030000FE420000E04000"
+},
+{
+"description": "Insufficient vector data with 3 bytes FLOAT32",
+"valid": false,
+"dtype_hex": "0x27",
+"dtype_alias": "FLOAT32",
+"canonical_bson": "1700000005766563746F7200050000000927002A2A2A00"
+},
+{
+"description": "Insufficient vector data with 5 bytes FLOAT32",
+"valid": false,
+"dtype_hex": "0x27",
+"dtype_alias": "FLOAT32",
+"canonical_bson": "1900000005766563746F7200070000000927002A2A2A2A2A00"
}
]
}
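
To see why the two new FLOAT32 cases are invalid, the `canonical_bson` hex can be taken apart by hand. The sketch below assumes the layout used by these test documents (a single binary element, named `vector` in the cases above); `binary_payload` is a hypothetical helper, not part of any driver.

```python
import struct


def binary_payload(canonical_bson_hex):
    """Return (subtype, payload) of the single binary element in a test document.

    Assumes the document holds exactly one element and that it is a BSON
    binary (element type 0x05), as in the test files above.
    """
    doc = bytes.fromhex(canonical_bson_hex)
    (doc_len,) = struct.unpack_from("<i", doc, 0)  # little-endian int32 total length
    if doc_len != len(doc) or doc[4] != 0x05:
        raise ValueError("not a single-binary-element test document")
    name_end = doc.index(b"\x00", 5)               # end of the element name cstring
    (bin_len,) = struct.unpack_from("<i", doc, name_end + 1)
    subtype = doc[name_end + 5]
    payload = doc[name_end + 6:name_end + 6 + bin_len]
    return subtype, payload


subtype, payload = binary_payload("1700000005766563746F7200050000000927002A2A2A00")
dtype, padding, data = payload[0], payload[1], payload[2:]
# dtype is 0x27 (FLOAT32), padding is 0, and data holds 3 bytes,
# which is not a multiple of 4, so decoding must fail.
```

The second case decodes the same way with 5 data bytes, which again cannot be split into whole 4-byte floats.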

4 changes: 2 additions & 2 deletions source/bson-binary-vector/tests/int8.json
@@ -42,7 +42,8 @@
"vector": [127, 7],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
-"padding": 3
+"padding": 3,
+"canonical_bson": "1600000005766563746F7200040000000903037F0700"
},
{
"description": "INT8 with float inputs",
@@ -54,4 +55,3 @@
}
]
}

23 changes: 4 additions & 19 deletions source/bson-binary-vector/tests/packed_bit.json
@@ -8,7 +8,8 @@
"vector": [],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
-"padding": 1
+"padding": 1,
+"canonical_bson": "1400000005766563746F72000200000009100100"
},
{
"description": "Simple Vector PACKED_BIT",
@@ -61,21 +62,14 @@
"dtype_alias": "PACKED_BIT",
"padding": 0
},
-{
-"description": "Padding specified with no vector data PACKED_BIT",
-"valid": false,
-"vector": [],
-"dtype_hex": "0x10",
-"dtype_alias": "PACKED_BIT",
-"padding": 1
-},
Contributor

What was wrong with this test? An empty array but a non-zero padding?

Contributor Author

Just a duplicate of L5-L12

Contributor

Nice spot! Thank you!

{
"description": "Exceeding maximum padding PACKED_BIT",
"valid": false,
"vector": [1],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
-"padding": 8
+"padding": 8,
+"canonical_bson": "1500000005766563746F7200030000000910080100"
},
{
"description": "Negative padding PACKED_BIT",
@@ -84,15 +78,6 @@
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": -1
-},
-{
-"description": "Vector with float values PACKED_BIT",
-"valid": false,
-"vector": [127.5],
-"dtype_hex": "0x10",
-"dtype_alias": "PACKED_BIT",
-"padding": 0
}
]
}
