fix(connector): Widen Float16 to Float32 for Lance connector reads #27324
Merged
jaystarshot merged 11 commits into prestodb:master on Mar 24, 2026
Float16 (HALF precision) columns in Lance datasets would cause a ClassCastException because ArrowBlockBuilder doesn't handle Float2Vector. Map HALF precision to REAL type in LanceColumnHandle and widen Float2Vector to Float4Vector in LanceArrowToPageScanner before passing to ArrowBlockBuilder. Ported from lance-trino commit 4e610a2.
Contributor
Reviewer's Guide
Maps Arrow float16 (HALF precision) columns to Presto REAL and widens Float16 Arrow vectors to Float32 before block building to avoid ClassCastException when reading Lance datasets, reusing the existing REAL handling path.
Sequence diagram for widening Float16 to Float32 during Lance read
sequenceDiagram
actor PrestoEngine
participant Scanner as LanceArrowToPageScanner
participant Reader as ArrowReader
participant Root as VectorSchemaRoot
participant F2V as Float2Vector
participant F4V as Float4Vector
participant Builder as ArrowBlockBuilder
participant PageObj as Page
PrestoEngine->>Scanner: convert()
Scanner->>Reader: getVectorSchemaRoot()
Reader-->>Scanner: Root
Scanner->>Root: getRowCount()
loop for each column
Scanner->>Root: getVector(columnName)
Root-->>Scanner: F2V or FieldVector
alt vector is Float2Vector
Scanner->>Scanner: widenFloat2ToFloat4(F2V)
Scanner->>F4V: allocateNew(valueCount)
loop for each index
Scanner->>F2V: isNull(i)
alt is null
Scanner->>F4V: setNull(i)
else not null
Scanner->>F2V: getValueAsFloat(i)
Scanner->>F4V: set(i, value)
end
end
Scanner->>F4V: setValueCount(valueCount)
Scanner-->>Scanner: Float4Vector
Scanner->>Builder: buildBlockFromFieldVector(F4V, REAL, null)
else other vector type
Scanner->>Builder: buildBlockFromFieldVector(FieldVector, type, null)
end
Builder-->>Scanner: Block
end
Scanner-->>PrestoEngine: PageObj
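The per-element loop in the sequence diagram can be sketched without Arrow on the classpath. The block below is only an illustration, not the connector's code: plain arrays stand in for Float2Vector and Float4Vector, and the names `WidenFloat16Sketch`, `Float32Column`, `halfBitsToFloat`, and `widen` are hypothetical. It shows why the widening is lossless (every binary16 value is exactly representable as a float) and how nulls are carried across.

```java
/** Illustrative stand-in for the Float2Vector to Float4Vector widening loop; plain arrays replace Arrow vectors. */
final class WidenFloat16Sketch {
    /** Decodes IEEE 754 binary16 bits into a float. Every half value is exactly representable as a float. */
    static float halfBitsToFloat(short half) {
        int bits = half & 0xFFFF;
        int sign = (bits & 0x8000) << 16;      // move the sign to bit 31
        int exponent = (bits >>> 10) & 0x1F;   // 5-bit biased exponent
        int fraction = bits & 0x03FF;          // 10-bit fraction
        if (exponent == 0x1F) {
            // Infinity or NaN: all-ones exponent, fraction shifted into the 23-bit field
            return Float.intBitsToFloat(sign | 0x7F800000 | (fraction << 13));
        }
        if (exponent == 0) {
            // Zero or subnormal: magnitude is fraction * 2^-24
            float magnitude = fraction * 0x1p-24f;
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal number: rebias the exponent from 15 to 127
        return Float.intBitsToFloat(sign | ((exponent - 15 + 127) << 23) | (fraction << 13));
    }

    /** Hypothetical nullable float32 column: values plus a validity mask (true = present). */
    static final class Float32Column {
        final float[] values;
        final boolean[] valid;

        Float32Column(float[] values, boolean[] valid) {
            this.values = values;
            this.valid = valid;
        }
    }

    /** Mirrors the diagram: allocate, copy each non-null element widened, leave null slots unset. */
    static Float32Column widen(short[] halfBits, boolean[] valid) {
        int valueCount = halfBits.length;
        float[] out = new float[valueCount];
        boolean[] outValid = new boolean[valueCount];
        for (int i = 0; i < valueCount; i++) {
            if (valid[i]) {
                out[i] = halfBitsToFloat(halfBits[i]);
                outValid[i] = true;
            }
            // A null input stays null in the output (outValid[i] remains false).
        }
        return new Float32Column(out, outValid);
    }

    public static void main(String[] args) {
        short[] halfBits = {(short) 0x3C00, (short) 0x0000, (short) 0xC000};
        boolean[] valid = {true, false, true};
        Float32Column widened = widen(halfBits, valid);
        System.out.println(widened.values[0]); // 1.0
        System.out.println(widened.valid[1]);  // false
        System.out.println(widened.values[2]); // -2.0
    }
}
```

In the real scanner the decode step is delegated to Float2Vector.getValueAsFloat(i), as the diagram shows; the manual bit manipulation here only makes the sketch self-contained.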
Class diagram for widened Float16 handling in LanceArrowToPageScanner and LanceColumnHandle
classDiagram
class LanceArrowToPageScanner {
- ScannerFactory scannerFactory
- ArrowReader arrowReader
- BufferAllocator allocator
- List~LanceColumnHandle~ columns
- ArrowBlockBuilder arrowBlockBuilder
- long lastBatchBytes
+ LanceArrowToPageScanner(BufferAllocator allocator, List~LanceColumnHandle~ columns, ArrowReader arrowReader, ScannerFactory scannerFactory, ArrowBlockBuilder arrowBlockBuilder)
+ Page convert()
- Float4Vector widenFloat2ToFloat4(Float2Vector f2v)
+ void close()
}
class LanceColumnHandle {
- String columnName
- Type columnType
+ String getColumnName()
+ Type getColumnType()
+ static Type fromArrowType(ArrowType type)
}
class ArrowReader {
+ VectorSchemaRoot getVectorSchemaRoot()
}
class VectorSchemaRoot {
+ FieldVector getVector(String name)
+ int getRowCount()
}
class FieldVector {
+ String getName()
+ int getValueCount()
}
class Float2Vector {
+ String getName()
+ int getValueCount()
+ boolean isNull(int index)
+ float getValueAsFloat(int index)
}
class Float4Vector {
+ Float4Vector(String name, BufferAllocator allocator)
+ void allocateNew(int valueCount)
+ void set(int index, float value)
+ void setNull(int index)
+ void setValueCount(int valueCount)
}
class BufferAllocator
class ArrowBlockBuilder {
+ Block buildBlockFromFieldVector(FieldVector vector, Type type, Object fieldMetadata)
}
class Page {
+ Page(int positionCount, Block[] blocks)
}
class Block
class Type
class ArrowType
class ArrowType_FloatingPoint {
+ FloatingPointPrecision getPrecision()
}
class FloatingPointPrecision {
<<enum>>
HALF
SINGLE
DOUBLE
}
class RealType {
+ static RealType REAL
}
class DoubleType {
+ static DoubleType DOUBLE
}
LanceArrowToPageScanner --> LanceColumnHandle : uses
LanceArrowToPageScanner --> ArrowReader : uses
LanceArrowToPageScanner --> BufferAllocator : uses
LanceArrowToPageScanner --> ArrowBlockBuilder : uses
LanceArrowToPageScanner --> Float2Vector : widens
LanceArrowToPageScanner --> Float4Vector : creates
LanceArrowToPageScanner --> Page : returns
ArrowReader --> VectorSchemaRoot : returns
VectorSchemaRoot --> FieldVector : returns
Float2Vector --|> FieldVector
Float4Vector --|> FieldVector
LanceColumnHandle --> ArrowType : maps from
ArrowType_FloatingPoint --|> ArrowType
ArrowType_FloatingPoint --> FloatingPointPrecision : uses
LanceColumnHandle --> RealType : returns for HALF and SINGLE
LanceColumnHandle --> DoubleType : returns for DOUBLE and others
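The last two relationships in the class diagram describe the type mapping: HALF and SINGLE go to REAL, everything else to DOUBLE. A minimal sketch of that rule follows, assuming local stand-in enums rather than Arrow's FloatingPointPrecision and Presto's RealType/DoubleType; the names `FloatTypeMappingSketch`, `Precision`, and `PrestoFloatType` are hypothetical.

```java
/** Illustrative mapping of Arrow floating-point precision to a Presto type; not the connector's code. */
final class FloatTypeMappingSketch {
    /** Stand-in for Arrow's FloatingPointPrecision enum. */
    enum Precision { HALF, SINGLE, DOUBLE }

    /** Stand-in for RealType.REAL and DoubleType.DOUBLE. */
    enum PrestoFloatType { REAL, DOUBLE }

    static PrestoFloatType toPrestoType(Precision precision) {
        switch (precision) {
            case HALF:   // float16 has no Presto equivalent, so it shares the REAL path
            case SINGLE:
                return PrestoFloatType.REAL;
            default:
                return PrestoFloatType.DOUBLE;
        }
    }

    public static void main(String[] args) {
        System.out.println(toPrestoType(Precision.HALF));   // REAL
        System.out.println(toPrestoType(Precision.DOUBLE)); // DOUBLE
    }
}
```

Mapping HALF to REAL at the column-handle level is what lets the widened Float4Vector reuse the existing REAL block-building path.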
Contributor
Hey - I've left some high level feedback:
- The new Float4Vector instances created in widenFloat2ToFloat4 are not explicitly closed; please confirm ownership semantics with ArrowBlockBuilder and, if it does not take responsibility for closing, ensure these vectors are released to avoid allocator leaks.
- The widening logic runs on every convert() call via an instanceof check; if possible, consider pushing this conversion closer to schema/column handle setup so that per-batch overhead and branching are minimized.
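One possible way to handle the ownership concern in the first comment is to track every temporary vector created during a batch and release it in a `finally` block. The sketch below assumes the block builder copies data out of the vector, so the vector can be closed once the block exists; that assumption is not confirmed by this thread. `TempVector`, its `open` counter, and `convertBatch` are hypothetical stand-ins, with the counter playing the role of allocator accounting.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative cleanup pattern for temporary widened vectors; not the connector's code. */
final class CoercedVectorCleanupSketch {
    /** Hypothetical stand-in for an Arrow vector that holds allocator memory. */
    static final class TempVector implements AutoCloseable {
        static int open; // live instances, a crude stand-in for allocator accounting
        private boolean closed;

        TempVector() {
            open++;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    /**
     * Converts columnCount columns, creating one temporary vector per column.
     * failAtColumn simulates a conversion error at that index (use -1 for none).
     */
    static int convertBatch(int columnCount, int failAtColumn) {
        List<TempVector> temporaries = new ArrayList<>();
        try {
            int blocksBuilt = 0;
            for (int column = 0; column < columnCount; column++) {
                TempVector widened = new TempVector();
                temporaries.add(widened); // track before anything can throw
                if (column == failAtColumn) {
                    throw new IllegalStateException("simulated conversion failure");
                }
                blocksBuilt++; // assumed: the block builder copies data out of the vector here
            }
            return blocksBuilt;
        } finally {
            // Runs on success and on failure, so no temporary outlives the batch.
            for (TempVector vector : temporaries) {
                vector.close();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(convertBatch(3, -1)); // 3
        System.out.println(TempVector.open);     // 0
    }
}
```

Registering each vector before doing any further work with it is the design point: a failure partway through a batch still releases everything created so far.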
- Add HALF precision type mapping test in TestLanceColumnHandle
- Add TestFloat16Widening with 4 tests:
  - testWidenFloat2ToFloat4: basic widening with values and nulls
  - testWidenedFloat4VectorProducesRealBlock: end-to-end with ArrowBlockBuilder
  - testWidenEmptyVector: edge case for empty vectors
  - testWidenAllNulls: edge case for all-null vectors
…tability
Remove duplicated widening logic from test; tests now call LanceArrowToPageScanner.widenFloat2ToFloat4() directly.
Add wide_types_table.lance test dataset (from lance-trino) containing 16 Arrow column types including Float16. Add TestWideTypesTable with tests for reading Float16, Float32, Float64, Integer, Boolean, Varchar, Date, and FixedSizeList columns through the full LanceFragmentPageSource read path. The Float16 test verifies end-to-end widening: Lance dataset with float16 column -> Arrow Float2Vector -> widenFloat2ToFloat4 -> Presto REAL Block with correct values.
Add vector coercion in LanceArrowToPageScanner to handle Arrow types that ArrowBlockBuilder doesn't support natively:
- UInt8Vector (uint64) -> BigIntVector (int64, read as signed)
- Float2Vector inside FixedSizeListVector -> widen inner data to Float4
- Float2Vector inside ListVector -> widen inner data to Float4
All coerced vectors are tracked and cleaned up after block conversion. Also adds end-to-end tests reading wide_types_table.lance, which covers all supported type paths including uint64 and nested float16 arrays.
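The "read as signed" note on the uint64 coercion has a visible consequence: the 64 bits are preserved, but values above 2^63 - 1 appear negative when interpreted as int64. The sketch below illustrates this with the standard `java.lang.Long` unsigned helpers; the class name `UnsignedLongSketch` is hypothetical and nothing here is taken from the connector.

```java
/** Illustrates how a uint64 bit pattern looks when read as a signed int64. */
final class UnsignedLongSketch {
    /** Recovers the unsigned decimal value from bits that were stored as a signed long. */
    static String asUnsignedDecimal(long signedBits) {
        return Long.toUnsignedString(signedBits);
    }

    public static void main(String[] args) {
        long maxUint64Bits = 0xFFFFFFFFFFFFFFFFL; // uint64 maximum, all bits set
        System.out.println(maxUint64Bits);                    // -1 when read as signed
        System.out.println(asUnsignedDecimal(maxUint64Bits)); // 18446744073709551615

        long small = 42L; // values up to Long.MAX_VALUE are unchanged
        System.out.println(asUnsignedDecimal(small));         // 42
    }
}
```

Because no bits are lost, a consumer that knows the column was uint64 can still recover the original value from the signed result.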
Description
Widen Arrow Float16 (HALF precision) columns to Float32 (REAL) when reading Lance datasets. Without this fix, querying a Lance dataset containing Float16 columns causes a ClassCastException because ArrowBlockBuilder doesn't handle Float2Vector.
Changes:
- Map FloatingPointPrecision.HALF to RealType.REAL in LanceColumnHandle.toPrestoType()
- Detect Float2Vector in LanceArrowToPageScanner.convert() and widen to Float4Vector before passing to ArrowBlockBuilder
Ported from lance-trino commit 4e610a2.
Motivation and Context
Lance datasets used in ML/AI workloads commonly store embeddings as Float16 for storage efficiency. Without this fix, any query touching a Float16 column crashes with a ClassCastException: Float2Vector cannot be cast to Float4Vector.
Impact
Bug fix in the presto-lance connector only. No changes to existing Presto code.
Test Plan
./mvnw test -pl presto-lance
Contributor checklist
Release Notes