fix(connector): Widen Float16 to Float32 for Lance connector reads #27324

Merged
jaystarshot merged 11 commits into prestodb:master from jja725:lance-f16-coercion
Mar 24, 2026

@jja725 jja725 commented Mar 13, 2026

Description

Widen Arrow Float16 (HALF precision) columns to Float32 (REAL) when reading Lance datasets. Without this fix, querying a Lance dataset containing Float16 columns causes a ClassCastException because ArrowBlockBuilder doesn't handle Float2Vector.

Changes:

  • Map FloatingPointPrecision.HALF to RealType.REAL in LanceColumnHandle.toPrestoType()
  • Detect Float2Vector in LanceArrowToPageScanner.convert() and widen to Float4Vector before passing to ArrowBlockBuilder
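
The type-mapping half of the change can be sketched as follows. This is a minimal illustration rather than the connector's code: `Precision` and `PrestoType` are hypothetical stand-ins for Arrow's `FloatingPointPrecision` and Presto's `RealType.REAL` / `DoubleType.DOUBLE`, so the snippet compiles without either library on the classpath.

```java
// Hypothetical stand-ins so the sketch needs neither Arrow nor Presto.
enum Precision { HALF, SINGLE, DOUBLE }

enum PrestoType { REAL, DOUBLE }

final class PrecisionMapping {
    private PrecisionMapping() {}

    // HALF joins SINGLE on the REAL path; every other precision stays DOUBLE.
    static PrestoType toPrestoType(Precision precision) {
        switch (precision) {
            case HALF:
            case SINGLE:
                return PrestoType.REAL;
            default:
                return PrestoType.DOUBLE;
        }
    }

    public static void main(String[] args) {
        for (Precision p : Precision.values()) {
            System.out.println(p + " -> " + toPrestoType(p));
        }
    }
}
```

Declaring the column as REAL only fixes the reported type; the data still arrives as 16-bit values, which is why the second change (widening the vector itself) is needed as well.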

Ported from lance-trino commit 4e610a2.

Motivation and Context

Lance datasets used in ML/AI workloads commonly store embeddings as Float16 for storage efficiency. Without this fix, any query touching a Float16 column crashes with a ClassCastException: Float2Vector cannot be cast to Float4Vector.

Impact

Bug fix in presto-lance connector only. No changes to existing Presto code.

Test Plan

  • All 18 existing presto-lance unit tests pass: ./mvnw test -pl presto-lance
  • Manual: create a Lance dataset with Float16 columns and verify reads return REAL-typed results

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== RELEASE NOTES ==

Lance Connector Changes
* Fix ClassCastException when reading Float16 columns by widening to Float32.

Float16 (HALF precision) columns in Lance datasets would cause a
ClassCastException because ArrowBlockBuilder doesn't handle Float2Vector.
Map HALF precision to REAL type in LanceColumnHandle and widen Float2Vector
to Float4Vector in LanceArrowToPageScanner before passing to ArrowBlockBuilder.

Ported from lance-trino commit 4e610a2.
@jja725 jja725 requested a review from a team as a code owner March 13, 2026 05:14

sourcery-ai bot commented Mar 13, 2026

Reviewer's Guide

Maps Arrow float16 (HALF precision) columns to Presto REAL and widens Float16 Arrow vectors to Float32 before block building to avoid ClassCastException when reading Lance datasets, reusing the existing REAL handling path.

Sequence diagram for widening Float16 to Float32 during Lance read

sequenceDiagram
    actor PrestoEngine
    participant Scanner as LanceArrowToPageScanner
    participant Reader as ArrowReader
    participant Root as VectorSchemaRoot
    participant F2V as Float2Vector
    participant F4V as Float4Vector
    participant Builder as ArrowBlockBuilder
    participant PageObj as Page

    PrestoEngine->>Scanner: convert()
    Scanner->>Reader: getVectorSchemaRoot()
    Reader-->>Scanner: Root
    Scanner->>Root: getRowCount()
    loop for each column
        Scanner->>Root: getVector(columnName)
        Root-->>Scanner: F2V or FieldVector
        alt vector is Float2Vector
            Scanner->>Scanner: widenFloat2ToFloat4(F2V)
            Scanner->>F4V: allocateNew(valueCount)
            loop for each index
                Scanner->>F2V: isNull(i)
                alt is null
                    Scanner->>F4V: setNull(i)
                else not null
                    Scanner->>F2V: getValueAsFloat(i)
                    Scanner->>F4V: set(i, value)
                end
            end
            Scanner->>F4V: setValueCount(valueCount)
            Scanner-->>Scanner: Float4Vector
            Scanner->>Builder: buildBlockFromFieldVector(F4V, REAL, null)
        else other vector type
            Scanner->>Builder: buildBlockFromFieldVector(FieldVector, type, null)
        end
        Builder-->>Scanner: Block
    end
    Scanner-->>PrestoEngine: PageObj
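
The per-value loop in the diagram can be sketched without Arrow. This is an illustration under stated assumptions, not the PR's implementation: a `short[]` of raw IEEE 754 binary16 bits plus a `BitSet` null mask stand in for `Float2Vector`, a `float[]` stands in for `Float4Vector`, and the hypothetical `halfToFloat` helper shows the kind of bit-level conversion a call such as `getValueAsFloat` is assumed to perform.

```java
import java.util.BitSet;

final class Float16Widening {
    private Float16Widening() {}

    // Converts raw IEEE 754 binary16 bits to a float. Every half value is
    // exactly representable as a float, so the widening is lossless.
    static float halfToFloat(short half) {
        int bits = half & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        int exponent = (bits >>> 10) & 0x1F;
        int mantissa = bits & 0x03FF;

        if (exponent == 0x1F) {
            // Infinity or NaN: all-ones exponent, mantissa bits carried over.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mantissa << 13));
        }
        if (exponent == 0) {
            // Zero or subnormal: the value is mantissa * 2^-24.
            float magnitude = mantissa * 0x1p-24f;
            return sign == 0 ? magnitude : -magnitude;
        }
        // Normal number: rebias the exponent from 15 to 127.
        return Float.intBitsToFloat(sign | ((exponent - 15 + 127) << 23) | (mantissa << 13));
    }

    // Mirrors the diagram's loop: null slots are skipped (the null mask is
    // reused unchanged), every other slot is widened.
    static float[] widen(short[] halfBits, BitSet isNull) {
        float[] widened = new float[halfBits.length];
        for (int i = 0; i < halfBits.length; i++) {
            if (!isNull.get(i)) {
                widened[i] = halfToFloat(halfBits[i]);
            }
        }
        return widened;
    }

    public static void main(String[] args) {
        short[] column = {0x3C00, 0, (short) 0xC000}; // 1.0, a null slot, -2.0
        BitSet isNull = new BitSet();
        isNull.set(1);
        float[] widened = widen(column, isNull);
        for (int i = 0; i < widened.length; i++) {
            System.out.println(isNull.get(i) ? "null" : Float.toString(widened[i]));
        }
    }
}
```

The cost is one extra pass over each Float16 column per batch and a temporary buffer twice the size of the input.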

Class diagram for widened Float16 handling in LanceArrowToPageScanner and LanceColumnHandle

classDiagram
    class LanceArrowToPageScanner {
        - ScannerFactory scannerFactory
        - ArrowReader arrowReader
        - BufferAllocator allocator
        - List~LanceColumnHandle~ columns
        - ArrowBlockBuilder arrowBlockBuilder
        - long lastBatchBytes
        + LanceArrowToPageScanner(BufferAllocator allocator, List~LanceColumnHandle~ columns, ArrowReader arrowReader, ScannerFactory scannerFactory, ArrowBlockBuilder arrowBlockBuilder)
        + Page convert()
        - Float4Vector widenFloat2ToFloat4(Float2Vector f2v)
        + void close()
    }

    class LanceColumnHandle {
        - String columnName
        - Type columnType
        + String getColumnName()
        + Type getColumnType()
        + static Type fromArrowType(ArrowType type)
    }

    class ArrowReader {
        + VectorSchemaRoot getVectorSchemaRoot()
    }

    class VectorSchemaRoot {
        + FieldVector getVector(String name)
        + int getRowCount()
    }

    class FieldVector {
        + String getName()
        + int getValueCount()
    }

    class Float2Vector {
        + String getName()
        + int getValueCount()
        + boolean isNull(int index)
        + float getValueAsFloat(int index)
    }

    class Float4Vector {
        + Float4Vector(String name, BufferAllocator allocator)
        + void allocateNew(int valueCount)
        + void set(int index, float value)
        + void setNull(int index)
        + void setValueCount(int valueCount)
    }

    class BufferAllocator

    class ArrowBlockBuilder {
        + Block buildBlockFromFieldVector(FieldVector vector, Type type, Object fieldMetadata)
    }

    class Page {
        + Page(int positionCount, Block[] blocks)
    }

    class Block

    class Type

    class ArrowType

    class ArrowType_FloatingPoint {
        + FloatingPointPrecision getPrecision()
    }

    class FloatingPointPrecision {
        <<enum>>
        HALF
        SINGLE
        DOUBLE
    }

    class RealType {
        + static RealType REAL
    }

    class DoubleType {
        + static DoubleType DOUBLE
    }

    LanceArrowToPageScanner --> LanceColumnHandle : uses
    LanceArrowToPageScanner --> ArrowReader : uses
    LanceArrowToPageScanner --> BufferAllocator : uses
    LanceArrowToPageScanner --> ArrowBlockBuilder : uses
    LanceArrowToPageScanner --> Float2Vector : widens
    LanceArrowToPageScanner --> Float4Vector : creates
    LanceArrowToPageScanner --> Page : returns
    ArrowReader --> VectorSchemaRoot : returns
    VectorSchemaRoot --> FieldVector : returns
    Float2Vector --|> FieldVector
    Float4Vector --|> FieldVector

    LanceColumnHandle --> ArrowType : maps from
    ArrowType_FloatingPoint --|> ArrowType
    ArrowType_FloatingPoint --> FloatingPointPrecision : uses
    LanceColumnHandle --> RealType : returns for HALF and SINGLE
    LanceColumnHandle --> DoubleType : returns for DOUBLE and others

File-Level Changes

Change: Map Arrow HALF-precision floating-point columns to Presto REAL type in Lance column handle resolution.
Details:
  • Extend FloatingPoint precision handling so both HALF and SINGLE map to Presto RealType.REAL
  • Keep other floating-point precisions mapped to DoubleType.DOUBLE as before
Files: presto-lance/src/main/java/com/facebook/presto/lance/LanceColumnHandle.java

Change: Widen Float16 Arrow vectors to Float32 vectors before building Presto blocks in the Lance Arrow-to-Page scanner.
Details:
  • Add a BufferAllocator field to LanceArrowToPageScanner and store the constructor argument after null-checking
  • Detect Float2Vector instances when reading column vectors from the Arrow root and convert them to Float4Vector before passing to ArrowBlockBuilder
  • Implement widenFloat2ToFloat4 helper that allocates a Float4Vector with the scanner allocator, copies values using getValueAsFloat while preserving nulls, and sets the value count
Files: presto-lance/src/main/java/com/facebook/presto/lance/LanceArrowToPageScanner.java

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The new Float4Vector instances created in widenFloat2ToFloat4 are not explicitly closed; please confirm ownership semantics with ArrowBlockBuilder and, if it does not take responsibility for closing, ensure these vectors are released to avoid allocator leaks.
  • The widening logic runs on every convert() call via an instanceof check; if possible, consider pushing this conversion closer to schema/column handle setup so that per-batch overhead and branching are minimized.
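
The ownership question in the first point can be illustrated with a small sketch. It assumes that building a block copies the data out of the vector, so the temporary can be released straight afterwards; whether ArrowBlockBuilder behaves that way is not confirmed in this thread. `TempVector` and `convertBatch` are hypothetical stand-ins, not Arrow or Presto APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Arrow vector that holds allocator memory until closed.
final class TempVector implements AutoCloseable {
    final String name;
    boolean closed;

    TempVector(String name) {
        this.name = name;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class CoercionCleanup {
    private CoercionCleanup() {}

    // Converts one batch. Every temporary created along the way is tracked and
    // closed before returning, even if building a block throws. The audit list
    // exists only so callers can inspect the temporaries afterwards.
    static int convertBatch(List<String> float16Columns, List<TempVector> audit) {
        List<TempVector> temporaries = new ArrayList<>();
        try {
            int blocksBuilt = 0;
            for (String column : float16Columns) {
                TempVector widened = new TempVector(column + "_f32");
                temporaries.add(widened);
                audit.add(widened);
                blocksBuilt++; // stands in for copying the data into a block
            }
            return blocksBuilt;
        }
        finally {
            for (TempVector vector : temporaries) {
                vector.close();
            }
        }
    }

    public static void main(String[] args) {
        List<TempVector> audit = new ArrayList<>();
        int built = convertBatch(List.of("embedding", "score"), audit);
        System.out.println(built + " blocks built");
        for (TempVector vector : audit) {
            System.out.println(vector.name + " closed=" + vector.closed);
        }
    }
}
```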

@jja725 jja725 changed the title from "fix(lance): Widen Float16 to Float32 for Lance connector reads" to "fix(connector): Widen Float16 to Float32 for Lance connector reads" Mar 13, 2026

@jaystarshot jaystarshot left a comment

Can we add some tests?

jja725 added 4 commits March 18, 2026 21:18
- Add HALF precision type mapping test in TestLanceColumnHandle
- Add TestFloat16Widening with 4 tests:
  - testWidenFloat2ToFloat4: basic widening with values and nulls
  - testWidenedFloat4VectorProducesRealBlock: end-to-end with ArrowBlockBuilder
  - testWidenEmptyVector: edge case for empty vectors
  - testWidenAllNulls: edge case for all-null vectors

…tability

Remove duplicated widening logic from test — tests now call
LanceArrowToPageScanner.widenFloat2ToFloat4() directly.

Add wide_types_table.lance test dataset (from lance-trino) containing
16 Arrow column types including Float16. Add TestWideTypesTable with
tests for reading Float16, Float32, Float64, Integer, Boolean, Varchar,
Date, and FixedSizeList columns through the full LanceFragmentPageSource
read path.

The Float16 test verifies end-to-end widening: Lance dataset with
float16 column -> Arrow Float2Vector -> widenFloat2ToFloat4 -> Presto
REAL Block with correct values.

Add vector coercion in LanceArrowToPageScanner to handle Arrow types
that ArrowBlockBuilder doesn't support natively:

- UInt8Vector (uint64) -> BigIntVector (int64, read as signed)
- Float2Vector inside FixedSizeListVector -> widen inner data to Float4
- Float2Vector inside ListVector -> widen inner data to Float4

All coerced vectors are tracked and cleaned up after block conversion.

Also adds end-to-end tests reading wide_types_table.lance which covers
all supported type paths including uint64 and nested float16 arrays.
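
What "uint64 -> int64, read as signed" implies for query results can be shown with plain Java longs. This sketch assumes the coercion reinterprets the 64-bit pattern without range checks, which the commit message suggests but does not spell out; `readAsSigned` is a hypothetical helper, not connector code.

```java
final class UnsignedReinterpretation {
    private UnsignedReinterpretation() {}

    // Parses an unsigned 64-bit decimal and returns the same bit pattern as a
    // signed long. The bits are unchanged; only the interpretation differs.
    static long readAsSigned(String unsignedDecimal) {
        return Long.parseUnsignedLong(unsignedDecimal);
    }

    public static void main(String[] args) {
        long small = readAsSigned("42");
        long large = readAsSigned("18446744073709551615"); // 2^64 - 1
        System.out.println(small);
        System.out.println(large); // negative: the top bit is now the sign bit
        System.out.println(Long.toUnsignedString(large)); // original value recovered
    }
}
```

Under that assumption, values up to 2^63 - 1 read back unchanged, while larger ones surface as negative BIGINT values.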
@jja725 jja725 requested a review from jaystarshot March 19, 2026 04:47
jaystarshot approved these changes Mar 20, 2026
@jaystarshot jaystarshot merged commit b1eab1f into prestodb:master Mar 24, 2026
82 checks passed

2 participants