Add field type for Binary vectors#2854
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for MongoDB binary vector field types, leveraging the new MongoDB\BSON\Binary::TYPE_VECTOR introduced in MongoDB PHP Extension v2.2.0. It enables compression of vector data into binary fields for improved performance.
Key changes:
- Introduces three new vector types:
vector_float32,vector_int8, andvector_packed_bit - Implements type conversion between PHP arrays and MongoDB binary vector format
- Adds comprehensive test coverage for the new vector types
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/Doctrine/ODM/MongoDB/Types/Type.php | Adds constants and type mappings for the three vector types |
| lib/Doctrine/ODM/MongoDB/Types/AbstractVectorType.php | Base implementation for vector type conversion and validation |
| lib/Doctrine/ODM/MongoDB/Types/VectorFloat32Type.php | Concrete implementation for Float32 vector type |
| lib/Doctrine/ODM/MongoDB/Types/VectorInt8Type.php | Concrete implementation for Int8 vector type |
| lib/Doctrine/ODM/MongoDB/Types/VectorPackedBitType.php | Concrete implementation for PackedBit vector type |
| tests/Doctrine/ODM/MongoDB/Tests/Types/VectorTypeTest.php | Comprehensive test suite for all vector types |
| tests/Doctrine/ODM/MongoDB/Tests/Aggregation/Stage/VectorSearchTest.php | Updates vector search test to use proper binary vector format |
| docs/en/reference/basic-mapping.rst | Documents the new vector types and requirements |
| phpstan-baseline.neon | Adds PHPStan suppressions for MongoDB extension 2.2+ features |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
e143a37 to
7a1f26b
Compare
| return $value->toArray(); | ||
| } | ||
|
|
||
| public function closureToMongo(): string |
There was a problem hiding this comment.
What are these methods for? Might be helpful to document. At least to me it's unclear what they are
There was a problem hiding this comment.
For the record, in all my years of working with ODM (on and off) I don't think I've ever fully understood why we needed all of these methods. The answer probably rests in what PHP 5.3 was capable of back in the day.
| throw new InvalidArgumentException(sprintf('Invalid binary vector data of vector type %s received for vector field, expected vector type %s', $value->getVectorType()->name, $this->getVectorType()->name)); | ||
| } | ||
|
|
||
| return $value->toArray(); |
There was a problem hiding this comment.
In the design, the property type is considered to be an array, so an array is always returned. However, it would be much more useful to know the type of the property in order to return a Binary if the property allows it, thus avoiding the need to decode the vector when it's not actually used.
There was a problem hiding this comment.
Is this something not possible with the current design of the type, since we don't have any context for the class property to which we are associated?
| throw new InvalidArgumentException(sprintf('Invalid binary vector data of vector type %s received for vector field, expected vector type %s', $value->getVectorType()->name, $this->getVectorType()->name)); | ||
| } | ||
|
|
||
| return $value->toArray(); |
There was a problem hiding this comment.
Is this something not possible with the current design of the type, since we don't have any context for the class property to which we are associated?
| use PHPUnit\Framework\TestCase; | ||
|
|
||
| /** Binary Vector type added in ext-mongodb 2.2+ */ | ||
| #[RequiresMethod(Binary::class, 'fromVector')] |
There was a problem hiding this comment.
There was a problem hiding this comment.
Very handy. I don't get why Sebastian rejected RequiresClass: sebastianbergmann/phpunit#4792
There was a problem hiding this comment.
I could have used RequiresPhpExtension('mongodb', '>= 2.2')
There was a problem hiding this comment.
Could folks just use RequiresMethod(MyClass::class, '__construct()') since all classes have a constructor?
Edit: No they cannot. Not all classes have a constructor (even implicit): https://3v4l.org/Y5OdE#v8.4.13
jmikola
left a comment
There was a problem hiding this comment.
Do you want to block this on an ext-mongodb 2.2.0 release? I'll defer to you.
I assume no changes to composer.json or the CI workflow are required.
| PHP); | ||
| } | ||
|
|
||
| abstract protected function getVectorType(): VectorType; |
There was a problem hiding this comment.
Is the PHPStan error here just due to the class not being available?
There was a problem hiding this comment.
I've skipped in PHPStan all errors related to binary vectors.
e1b52dc to
82a1b5c
Compare
| "phpstan/phpstan-deprecation-rules": "^2.0", | ||
| "phpstan/phpstan-phpunit": "^2.0", | ||
| "phpunit/phpunit": "^10.4", | ||
| "phpunit/phpunit": "^10.5.58", |
There was a problem hiding this comment.
Updating the minimum version of PHPUnit as the version 10.4.0 load the data providers even if #[RequiresPhpExtension('mongodb', '>= 2.2')] is not matched.
Summary
Vector can be compressed into a binary field thanks to the new
MongoDB\BSON\Binary::TYPE_VECTORintroduced in MongoDB PHP Extension v2.2.0