LOAD DATA LOCAL INFILE fails when row size exceeds 64K (65535 bytes) limit #8469

Closed
TianyuZhang1214 opened this issue Oct 17, 2024 · 3 comments · Fixed by dolthub/go-mysql-server#2709
@TianyuZhang1214

Bug Description

When executing the following command:

mysql> LOAD DATA LOCAL INFILE 't_b.csv' INTO TABLE t_b FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (id, real_sql);
ERROR 1105 (HY000): bufio.Scanner: token too long

The statement fails whenever a single row in the file is longer than 64K (65535 bytes). The error message comes from Go's bufio.Scanner, which the Go MySQL server implementation appears to use when reading the file.


Steps to Reproduce

  1. Create the table using the following schema:

    CREATE DATABASE t_d;
    USE t_d;
    CREATE TABLE `t_b` (
      `id` bigint NOT NULL,
      `real_sql` text NOT NULL,
      PRIMARY KEY (`id`)
    );
  2. Prepare a CSV file (t_b.csv) with content similar to this:

    1,AAAAAA (where the length of 'AAAAAA' exceeds 65535 characters)
    
  3. Execute the LOAD DATA LOCAL INFILE statement:

    LOAD DATA LOCAL INFILE 't_b.csv' INTO TABLE t_b FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (id, real_sql);
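
To make step 2 easy to repeat, here is a minimal sketch of a Go program that writes such a file. The helper name `buildRow` and the payload length of 70000 are illustrative assumptions, not part of the original report; any length above 65535 should do.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// buildRow returns one CSV row of the form "id,payload\n", where the
// payload is payloadLen bytes of 'A'. (Hypothetical helper for this sketch.)
func buildRow(id int64, payloadLen int) string {
	return fmt.Sprintf("%d,%s\n", id, strings.Repeat("A", payloadLen))
}

func main() {
	// 70000 is an arbitrary length above the 65535-byte threshold in the report.
	row := buildRow(1, 70000)

	// Write the single oversized row to t_b.csv in the current directory.
	if err := os.WriteFile("t_b.csv", []byte(row), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote t_b.csv (%d bytes)\n", len(row))
}
```

Run it in the directory where the mysql client is started so that the relative path 't_b.csv' resolves.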

Possible Cause
The issue appears to be caused by the use of bufio.Scanner, which by default limits a single token (here, one line of the file) to 64K (bufio.MaxScanTokenSize). This can be seen in the Go MySQL server code, particularly in ddl_iters.go.
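
The scanner behavior can be reproduced outside the server. The following is a minimal standalone sketch, not the code from ddl_iters.go; `countLines` is a hypothetical helper, and the 70000-byte payload is an arbitrary size above the default limit.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLines scans input line by line with a default bufio.Scanner and
// returns the number of lines read plus any error the scanner recorded.
func countLines(input string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

func main() {
	short := "1,AAAAAA\n"
	// One row longer than the scanner's default 64K token limit.
	long := "1," + strings.Repeat("A", 70000) + "\n"

	n, err := countLines(short)
	fmt.Println(n, err) // 1 <nil>

	n, err = countLines(long)
	fmt.Println(n, err) // 0 bufio.Scanner: token too long
}
```

The second call stops before yielding any line and reports the same message that the mysql client shows in ERROR 1105.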

@timsehn
Contributor

timsehn commented Oct 18, 2024

@jycor will look at this today.

@jycor jycor transferred this issue from dolthub/go-mysql-server Oct 18, 2024
@jycor jycor self-assigned this Oct 18, 2024
@jycor
Contributor

jycor commented Oct 18, 2024

Hey @TianyuZhang1214, thanks for reporting this issue.
We've increased the scanner token limit to 4GB (the max size of a LongText); hopefully, you don't have any rows larger than that.

Also, the Text type holds at most 65535 bytes, so it cannot store values larger than that; you'll want to change the column type to LongText.

The fix has been merged to gms, and is making its way to dolt main. Expect a dolt release early next week.
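
For readers unfamiliar with how the limit is raised: bufio.Scanner exposes a Buffer method that sets the initial buffer and the maximum token size. The sketch below illustrates that general technique only; it is not the actual change merged into gms. The helper `longestLine`, the constant names, and the 1 MiB limit used in `main` are assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Maximum byte lengths of two MySQL text types, for reference.
const (
	maxText     int64 = 1<<16 - 1 // TEXT: 65535 bytes
	maxLongText int64 = 1<<32 - 1 // LONGTEXT: 4294967295 bytes (about 4GB)
)

// longestLine scans input with a caller-chosen token limit and returns
// the length in bytes of the longest line seen, plus any scanner error.
func longestLine(input string, maxToken int) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Start with a small buffer; the scanner grows it on demand up to maxToken.
	scanner.Buffer(make([]byte, 0, 4096), maxToken)
	longest := 0
	for scanner.Scan() {
		if l := len(scanner.Bytes()); l > longest {
			longest = l
		}
	}
	return longest, scanner.Err()
}

func main() {
	long := "1," + strings.Repeat("A", 70000) + "\n"

	// With the default-sized limit the row is still rejected.
	_, err := longestLine(long, bufio.MaxScanTokenSize)
	fmt.Println("64K limit:", err)

	// With a larger limit (1 MiB here, chosen arbitrarily) the row is read.
	n, err := longestLine(long, 1<<20)
	fmt.Println("1 MiB limit:", n, err)

	fmt.Println("TEXT max:", maxText, "LONGTEXT max:", maxLongText)
}
```

Because the buffer only grows as needed, a large maximum does not by itself allocate that much memory; the cost is paid only when a row that long actually appears.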

@TianyuZhang1214
Author

Thank you for addressing the issue so promptly! I apologize for overlooking the text length limit in the reported case. In our original table, we are using two columns with the TEXT data type.
