How does LakeSoul store and use unstructured data? Are there any examples? #572

Open
dpengpeng opened this issue Jan 16, 2025 · 1 comment

Comments

@dpengpeng

In the AI + Data scenario, how does LakeSoul store and use unstructured data? Are there any examples?

@xuchen-plus
Contributor

We'll release a vector indexing feature very soon. See the ongoing development in #568 and #571.
We use LSH to compute and store a binary encoding for each embedding vector. Vector similarity search can then be done in normal SQL, together with other filters, e.g.

select v.id, v.attribute,
  calculateHammingDistance(query_embedding, v.embedding) AS hamming_distance
from vector_table v
order by hamming_distance limit 10;

The vector indexing feature is designed for large-scale analytical use on unstructured data in a data lake.
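
To make the idea concrete, here is a minimal Python sketch of LSH binary encoding plus Hamming-distance ranking. It is an illustration only, not LakeSoul's implementation: the comment above says only "LSH", so the random-hyperplane (sign of projection) scheme is an assumption, and the names `make_hyperplanes`, `lsh_encode`, `hamming_distance`, and `top_k` are hypothetical helpers.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_encode(vector, hyperplanes):
    """Pack one bit per hyperplane into an int: 1 if the projection is non-negative."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming_distance(a, b):
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def top_k(query_code, rows, k):
    """Mimic `ORDER BY hamming_distance LIMIT k` over (id, code) pairs."""
    return sorted(rows, key=lambda row: hamming_distance(query_code, row[1]))[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16)
    # Toy embeddings standing in for rows of a vector table.
    table = {
        "a": [1.0, 2.0, -0.5, 0.3],
        "b": [0.9, 2.1, -0.4, 0.2],
        "c": [-1.0, -2.0, 0.5, -0.3],
    }
    rows = [(row_id, lsh_encode(vec, planes)) for row_id, vec in table.items()]
    query_code = lsh_encode([1.0, 2.0, -0.5, 0.3], planes)
    for row_id, code in top_k(query_code, rows, k=2):
        print(row_id, hamming_distance(query_code, code))
```

Under this scheme, vectors that point in similar directions tend to agree on most bits, so a small Hamming distance between codes serves as a cheap proxy for a small angle between the original embeddings. Comparing compact binary codes is what makes a plain SQL `order by ... limit` scan over a large table practical.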
