Wrong column added when alter table with additional vector column #2230

Closed
7 tasks done
chiacy opened this issue May 25, 2024 · 5 comments
@chiacy

chiacy commented May 25, 2024

Bug Description:

In my current setup, I already have a real-time index with the following vector column:

vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='cosine'

This is fine when I use a single text-embedding model for English, but we also have Chinese content. The English text-embedding model cannot handle Chinese phrases, so I tried a Chinese-specific text-embedding model. However, the English model generates 1024-dimensional vectors, which fits the column above, while the Chinese model I chose generates only 768 dimensions.

When I tried to use the same 1024-dimension vector column for the Chinese text-embedding vectors, I couldn't insert, update, or replace the record to refresh the Chinese vector.
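A small client-side guard can make this kind of dimension mismatch visible before a write is sent. The sketch below is hypothetical and not part of Manticore or of the reporter's code: it assumes one `float_vector` column per embedding model, using the column names and dimensions that appear in this report, and simply picks the column whose `knn_dims` equals the length of the embedding.

```python
# Column names and dimensions come from this report; the helper is an
# illustrative assumption, not an existing API.
COLUMN_DIMS = {"vector": 1024, "vector_zh": 768}


def column_for_vector(embedding, column_dims=COLUMN_DIMS):
    """Return the float_vector column whose knn_dims matches len(embedding)."""
    size = len(embedding)
    matches = [name for name, dims in column_dims.items() if dims == size]
    if len(matches) != 1:
        # Either no column has this dimensionality or several do.
        raise ValueError(f"no unique column for a {size}-dimensional vector")
    return matches[0]
```

With a check like this, a 768-dimensional vector is routed to `vector_zh` rather than being written to the 1024-dimensional `vector` column, and a vector of any other length fails early with a clear error.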

Then I added a column as follows:
alter table myindex add column vector_zh float_vector knn_type='hnsw' knn_dims='768' hnsw_similarity='cosine';

The column is added, but the result of show create table myindex looks wrong:

vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='COSINE',
vector_zh float_vector knn_type='hnsw' knn_dims='0' hnsw_similarity='L2'

Note that knn_dims is 0 and hnsw_similarity is L2, which is not what I asked for.

If I create a new index, the vector columns are created correctly:

vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='COSINE',
vector_zh float_vector knn_type='hnsw' knn_dims='768' hnsw_similarity='COSINE'

Affected versions: 6.2.13 and 6.3.0

Manticore Search Version:

6.3.0

Operating System Version:

Ubuntu 22.04 server

Have you tried the latest development version?

  • No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Task estimated
  • Specification created, reviewed, and approved
  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation proofread
  • Changelog updated
@chiacy chiacy added the bug label May 25, 2024
@tomatolog
Contributor

Could you provide the sequence of create table and alter table statements that reproduces the issue, so we can recreate it locally?

@chiacy
Author

chiacy commented May 25, 2024

Hi!

Create the table:

create TABLE rt10 (
id bigint,
title string attribute,
alias text,
shortlink text,
video_url text,
audio text,
edited string attribute,
img string,
caption text,
slider text,
summary text,
points text,
content text,
keywords string attribute,
status integer,
audioflag integer,
created timestamp,
updated timestamp,
type string attribute,
language string attribute,
pcategory string attribute,
category string attribute,
options string attribute,
flash string attribute,
tags string attribute,
author string attribute,
source string attribute,
vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='cosine'
) html_strip='1' charset_table='non_cjk, U+3400..U+4DBF, U+4E00..U+9FFF, U+20000..U+2A6DF, U+2A700..U+2B73F, U+2B740..U+2B81F, U+2B820..U+2CEAF, U+F900..U+FAFF, U+2F800..U+2FA1F' blend_chars='-, +, &->+' morphology='stem_en, icu_chinese' wordform='/etc/manticoresearch/masterWordForm.txt';

Then add the column:

alter table rt10 add column vector_zh float_vector knn_type='hnsw' knn_dims='768' hnsw_similarity='cosine';

This is the result of show create table rt10:

CREATE TABLE rt10 (
id bigint,
alias text,
shortlink text,
video_url text,
audio text,
caption text,
slider text,
summary text,
points text,
content text,
title string attribute,
edited string attribute,
img string attribute,
keywords string attribute,
status integer,
audioflag integer,
created timestamp,
updated timestamp,
type string attribute,
language string attribute,
pcategory string attribute,
category string attribute,
options string attribute,
flash string attribute,
tags string attribute,
author string attribute,
source string attribute,
vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='COSINE',
vector_zh float_vector knn_type='hnsw' knn_dims='0' hnsw_similarity='L2'
) html_strip='1' charset_table='non_cjk, U+3400..U+4DBF, U+4E00..U+9FFF, U+20000..U+2A6DF, U+2A700..U+2B73F, U+2B740..U+2B81F, U+2B820..U+2CEAF, U+F900..U+FAFF, U+2F800..U+2FA1F' blend_chars='-, +, &->+' morphology='stem_en, icu_chinese'

If I create the index directly with two vector columns of different dimensions:

create TABLE rt20 (
id bigint,
title string attribute,
alias text,
shortlink text,
video_url text,
audio text,
edited string attribute,
img string,
caption text,
slider text,
summary text,
points text,
content text,
keywords string attribute,
status integer,
audioflag integer,
created timestamp,
updated timestamp,
type string attribute,
language string attribute,
pcategory string attribute,
category string attribute,
options string attribute,
flash string attribute,
tags string attribute,
author string attribute,
source string attribute,
vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='cosine',
vector_zh float_vector knn_type='hnsw' knn_dims='768' hnsw_similarity='cosine'
) html_strip='1' charset_table='non_cjk, U+3400..U+4DBF, U+4E00..U+9FFF, U+20000..U+2A6DF, U+2A700..U+2B73F, U+2B740..U+2B81F, U+2B820..U+2CEAF, U+F900..U+FAFF, U+2F800..U+2FA1F' blend_chars='-, +, &->+' morphology='stem_en, icu_chinese' wordform='/etc/manticoresearch/masterWordForm.txt';

The columns then show correctly in show create table rt20:

CREATE TABLE rt20 (
id bigint,
alias text,
shortlink text,
video_url text,
audio text,
caption text,
slider text,
summary text,
points text,
content text,
title string attribute,
edited string attribute,
img string attribute,
keywords string attribute,
status integer,
audioflag integer,
created timestamp,
updated timestamp,
type string attribute,
language string attribute,
pcategory string attribute,
category string attribute,
options string attribute,
flash string attribute,
tags string attribute,
author string attribute,
source string attribute,
vector float_vector knn_type='hnsw' knn_dims='1024' hnsw_similarity='COSINE',
vector_zh float_vector knn_type='hnsw' knn_dims='768' hnsw_similarity='COSINE'
) html_strip='1' charset_table='non_cjk, U+3400..U+4DBF, U+4E00..U+9FFF, U+20000..U+2A6DF, U+2A700..U+2B73F, U+2B740..U+2B81F, U+2B820..U+2CEAF, U+F900..U+FAFF, U+2F800..U+2FA1F' blend_chars='-, +, &->+' morphology='stem_en, icu_chinese'

Is this what you need?

@sanikolaev
Collaborator

MRE

mysql> drop table if exists t; create table t ( f1 float_vector knn_type='hnsw' knn_dims='2' hnsw_similarity='l2' ); alter table t add column f2 float_vector knn_type='hnsw' knn_dims='3' hnsw_similarity='l2'; show create table t\G
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t ( f1 float_vector knn_type='hnsw' knn_dims='2' hnsw_similarity='l2' )
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
alter table t add column f2 float_vector knn_type='hnsw' knn_dims='3' hnsw_similarity='l2'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
show create table t
--------------

*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE t (
id bigint,
f1 float_vector knn_type='hnsw' knn_dims='2' hnsw_similarity='L2',
f2 float_vector knn_type='hnsw' knn_dims='0' hnsw_similarity='L2'
)
1 row in set (0.00 sec)

Expected non-zero dimensions:

Create Table: CREATE TABLE t (
id bigint,
f1 float_vector knn_type='hnsw' knn_dims='2' hnsw_similarity='L2',
f2 float_vector knn_type='hnsw' knn_dims='3' hnsw_similarity='L2'
)
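The comparison between the actual and expected output can also be automated. The following is a minimal sketch, assuming the `SHOW CREATE TABLE` text has already been fetched as a string. The helper names `knn_columns` and `knn_mismatches` are hypothetical, and the regular expressions only cover the one-column-per-line layout shown in this thread.

```python
import re

# Matches column definitions such as:
#   f2 float_vector knn_type='hnsw' knn_dims='0' hnsw_similarity='L2'
KNN_COLUMN = re.compile(
    r"^[ \t]*(?P<name>\w+)[ \t]+float_vector\b(?P<options>[^\n]*)$",
    re.MULTILINE,
)
OPTION = re.compile(r"(\w+)='([^']*)'")


def knn_columns(ddl):
    """Map each float_vector column name to its options (values lower-cased)."""
    columns = {}
    for match in KNN_COLUMN.finditer(ddl):
        options = {k: v.lower() for k, v in OPTION.findall(match.group("options"))}
        columns[match.group("name")] = options
    return columns


def knn_mismatches(ddl, expected):
    """Return {column: {option: (expected, actual)}} for options that differ."""
    actual = knn_columns(ddl)
    problems = {}
    for name, wanted in expected.items():
        got = actual.get(name, {})
        diff = {
            key: (value.lower(), got.get(key))
            for key, value in wanted.items()
            if got.get(key) != value.lower()
        }
        if diff:
            problems[name] = diff
    return problems


# The DDL reported by the MRE, and the options that were actually requested.
ddl = """CREATE TABLE t (
id bigint,
f1 float_vector knn_type='hnsw' knn_dims='2' hnsw_similarity='L2',
f2 float_vector knn_type='hnsw' knn_dims='0' hnsw_similarity='L2'
)"""
expected = {
    "f1": {"knn_type": "hnsw", "knn_dims": "2", "hnsw_similarity": "l2"},
    "f2": {"knn_type": "hnsw", "knn_dims": "3", "hnsw_similarity": "l2"},
}
print(knn_mismatches(ddl, expected))  # {'f2': {'knn_dims': ('3', '0')}}
```

Option values are lower-cased before comparison because the server echoes `hnsw_similarity` in upper case (`L2`, `COSINE`) even when it was given in lower case.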

@tomatolog
Contributor

Fixed in 69ca250: the KNN options were being dropped when an attribute was added via alter table.

To get the fix, update the daemon package from the dev branch.

@chiacy
Author

chiacy commented Jun 18, 2024 via email

sanikolaev pushed a commit that referenced this issue Jun 25, 2024
@sanikolaev sanikolaev added rel::6.3.2 Released in 6.3.2 and removed rel::upcoming Upcoming release labels Jun 26, 2024
3 participants