Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature][Connector-V2][OceanBase] Support vector types on OceanBase #7375

Merged
merged 19 commits into from
Aug 21, 2024

Conversation

xxsc0529
Copy link
Contributor

@xxsc0529 xxsc0529 commented Aug 12, 2024

Purpose of this pull request

Added support for vectors in Oceanbase Close #7290

Does this PR introduce any user-facing change?

yes

How was this patch tested?

When creating test data in milvus, the database is default, the collection is created as simple_examole, and the data sample is as shown in the following figure
image
The test script is as follows
env {
job.mode = "BATCH"
}

source {
Milvus {
url = "http://localhost:19530"
token = "root:Milvus"
database = "default"
collection="simple_example"
}
}

transform {
}

sink {
jdbc {
url = "jdbc:oceanbase://localhost:2881/test"
driver = "com.oceanbase.jdbc.Driver"
user = "root"
password = ""
generate_sink_sql =true
compatible_mode="mysql"
database = "test"
table = "simple_example"
}
}
Since oceanbase-client does not support vector parsing, oceanbase catalog does not support automatic creation of vector tables
The DDL of Oceanbase is as follows:
CREATE TABLE IF NOT EXISTS simple_example
(
book_id int NOT NULL,
book_intro vector(4) DEFAULT NULL,
book_title varchar(64) DEFAULT NULL,
primary key (book_id)
);

image
image
This mission was successful

Check list

@Hisoka-X
Copy link
Member

Could you add a test case for insert/read vector into/from oceanbase?

@github-actions github-actions bot added the e2e label Aug 13, 2024
@xxsc0529
Copy link
Contributor Author

Could you add a test case for insert/read vector into/from oceanbase?

I added the test case and the test on giothub was successful
image
Because the milvus container and oceanbase container occupy too much memory after startup, @disable annotations are added

@Hisoka-X
Copy link
Member

Thanks @xxsc0529 ! How about waiting #7401 be merged to replace milvus in test case?

@xxsc0529
Copy link
Contributor Author

Thanks @xxsc0529 ! How about waiting #7401 be merged to replace milvus in test case?

ok

@xxsc0529
Copy link
Contributor Author

Thanks @xxsc0529 ! How about waiting #7401 be merged to replace milvus in test case?

@Hisoka-X this #7401 is merged

@Hisoka-X
Copy link
Member

Please merge from dev then use fake source to produce vector data to write into oceanbase

@xxsc0529
Copy link
Contributor Author

Please merge from dev then use fake source to produce vector data to write into oceanbase
The test case of fakesource has been added, and the result is as follows
image

@xxsc0529
Copy link
Contributor Author

xxsc0529 commented Aug 20, 2024

Please merge from dev then use fake source to produce vector data to write into oceanbase
What should I do if I find that the connection fails to execute the workflow?
The link is below
https://paimon.apache.org/docs/0.6/maintenance/configurations/#coreoptions
location
./docs/en/connector-v2/sink/Paimon.md
image
@Hisoka-X

@Hisoka-X
Copy link
Member

Please merge from dev then use fake source to produce vector data to write into oceanbase
What should I do if I find that the connection fails to execute the workflow?
The link is below
https://paimon.apache.org/docs/0.6/maintenance/configurations/#coreoptions
location
./docs/en/connector-v2/sink/Paimon.md
image
@Hisoka-X

try to fix it in #7435

@Hisoka-X
Copy link
Member

Please merge from dev. Thanks

@xxsc0529
Copy link
Contributor Author

xxsc0529 commented Aug 21, 2024

Please merge from dev. Thanks

done,
image
Did you change the logic of mysqlcdc? There is an error here @Hisoka-X

@Hisoka-X
Copy link
Member

Did you change the logic of mysqlcdc? There is an error here @Hisoka-X

No, maybe it's unstable. Please retry failed CI.

@xxsc0529
Copy link
Contributor Author

xxsc0529 commented Aug 21, 2024

Did you change the logic of mysqlcdc? There is an error here @Hisoka-X

No, maybe it's unstable. Please retry failed CI.

ok,done, @Hisoka-X

value = {},
type = {EngineType.SPARK, EngineType.FLINK},
disabledReason = "Currently SPARK and FLINK not support adapt")
@Disabled("oceanbase vector and milvus takes up too much memory")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove disabled on fake to oceanbase case.

Copy link
Contributor Author

@xxsc0529 xxsc0529 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present, Oceanbase's vector image will not be closed, resulting in a process still running, and this image problem will be dealt with in the future, but the function is okay。seatunne's executeJob logic has an assertion to determine whether there is an image running, and after testing, replacing it with a mysql image or other versions of the image will not have this error, and the current ob vector image has this problem, so add @disable without modifying the original test logic on the premise。
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oceanbase's vector image will not be closed, resulting in a process still running

Do you know the reason? And is there any problem if we don't solve this problem and enable the test case at the same time?

Copy link
Contributor Author

@xxsc0529 xxsc0529 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove disabled on fake to oceanbase case.

Subsequent PR will be submitted, the OB-related testcontainers will be modified, and the test container will be replaced to the latest version @Hisoka-X What do you think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oceanbase's vector image will not be closed, resulting in a process still running

Do you know the reason? And is there any problem if we don't solve this problem and enable the test case at the same time?

The problem of the ob vector container, I am sure that there is no problem with the test case, and the synchronous results can be printed out at the moment, but the test container still has threads running and cannot be closed

Copy link
Member

@Hisoka-X Hisoka-X Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present, Oceanbase's vector image will not be closed, resulting in a process still running, and this image problem will be dealt with in the future, but the function is okay。seatunne's executeJob logic has an assertion to determine whether there is an image running, and after testing, replacing it with a mysql image or other versions of the image will not have this error, and the current ob vector image has this problem, so add @disable without modifying the original test logic on the premise。 image

This is the assert logic provided by seatunnel to ensure that there are no abnormal threads in the engine. I believe this test case did not pass our assert logic. The JNA cleaner thread come from oceanbase client. This is a bug come from jna tool. Fixed by java-native-access/jna@e8182b2 in version 5.14.0
So we should do this:

  1. open test case and add JNA cleaner thread into
  2. waiting oceanbase client update jna version to 5.14.0 then remove it from isIssueWeAlreadyKnow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @whhe as well.

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @xxsc0529 !

@hailin0 hailin0 merged commit a6b188d into apache:dev Aug 21, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][Connector-V2][OceanBase] Support vector types on OceanBase
3 participants