Support DuckDB data type `ARRAY` #282

VWagen1989 · 2024-12-10T15:59:13Z

panic: encountered unknown DuckDB type({FLOAT[4] { 0 0 0 false 0 0 [] }}). This is likely a bug - please check the duckdbDataType function for missing type mappings

I tried to import vectors in Arrow format into MyDuck Server by following this instruction, but the import failed because MyDuck Server cannot handle the List data type for now.

The table schema:

CREATE TABLE IF NOT EXISTS embedded_documents (
    document TEXT,
    embedding FLOAT[4]
)

Data generation and COPY operation:

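import io
import random
import string

import numpy as np
import pandas as pd
import pyarrow as pa

# Assumed values (not defined in the original snippet), chosen to match the table schema above
TABLE_NAME = "embedded_documents"
VECTOR_DIMENSION = 4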

# Generate random document content
def generate_random_document(length=10):
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))

# Generate a random vector with VECTOR_DIMENSION dimensions and return it as a list of floats
def generate_random_vector(dimension=VECTOR_DIMENSION):
    vector = np.random.rand(dimension).tolist()
    return vector

# Load data using the COPY command with Arrow format
def copy_arrow_data(conn, batch_size, batch_number):
    data = [(generate_random_document(), generate_random_vector()) for _ in range(batch_size)]
    cursor = None
    try:
        # Convert data to a pandas DataFrame
        df = pd.DataFrame(data, columns=['document', 'embedding'])

        # Convert pandas DataFrame to Arrow Table
        table = pa.Table.from_pandas(df)

        # Write Arrow Table to a BytesIO stream
        output_stream = io.BytesIO()
        with pa.ipc.RecordBatchStreamWriter(output_stream, table.schema) as writer:
            writer.write_table(table)

        # Use the COPY command to insert data in Arrow format
        cursor = conn.cursor()
        with cursor.copy(f"COPY {TABLE_NAME} (document, embedding) FROM STDIN (FORMAT arrow)") as copy:
            copy.write(output_stream.getvalue())

        conn.commit()
        print(f"Inserted batch of size {len(data)}")
    except Exception as e:
        print(f"Error inserting data: {e}")
    finally:
        if cursor:
            cursor.close()
    print(f"Inserted batch {batch_number}")


VWagen1989 added the `compatibility` label Dec 10, 2024

fanyang01 changed the title ~~Support DuckDB data type List~~ Support DuckDB data type ARRAY Dec 13, 2024
