import with global sort, import rows is not right #47535

Closed
D3Hunter opened this issue Oct 11, 2023 · 1 comment · Fixed by #47648
Labels
affects-7.5 component/ddl This issue is related to DDL of TiDB. feature/developing the related feature is in development severity/moderate type/bug The issue is confirmed as a bug.

Comments

@D3Hunter
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run an import with global sort. After the import finishes, the imported row count (11100001) is 1 more than the expected row count (11100000).

| 210002 | s3://qe-testing/10T/data0/test.item_core.10*.csv?access-key=xxxxxx&endpoint=http%3A%2F%2Fks3-cn-beijing-internal.ksyuncs.com&force-path-style=false&provider=ks&region=Beijing&secret-access-key=xxxxxx | `test`.`item_core` | 136 | | finished | 158.8GB | 11100001 | | 2023-10-10 13:39:37.407559 | 2023-10-10 13:39:40.761066 | 2023-10-10 21:55:29.764332 | root@% |

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

master

@D3Hunter D3Hunter added the type/bug The issue is confirmed as a bug. label Oct 11, 2023
@seiya-annie seiya-annie added severity/moderate feature/developing the related feature is in development component/ddl This issue is related to DDL of TiDB. labels Oct 12, 2023
@lance6716
Contributor

There are two data engines. The first one is:

[2023/10/12 08:41:09.481 +00:00] [Info] [local.go:1547] ["start import engine"] [uuid=fd3831fa-7922-545e-9403-40fe17340465] ["region ranges"=1062] [count=0] [size=107374182400]
...
[2023/10/12 08:41:09.481 +00:00] [Info] [localhelper.go:160] ["split and scatter region"] [minKey=7480000000000000FFE05F720131303030FF30303030FF304B4CFF7777384A35FF4D6AFF395971344C32FF64FF4E7747586C434EFFFF7742396770424531FFFF6A000000000000FF00F8000000000000F9] [maxKey=7480000000000000FFE05F720131303636FF31323433FF37436FFF7139636B5AFF414DFF5A5154733141FF58FF46725368766F71FFFF314F586A6F313439FFFF6E000000000000FF00F8000000000000FA][retry=0]

Decoding the max key of this engine, we get:

    │   └─## table row key
    │     ├─table: 224
    │     └─"\00110661243\3777Coq9ckZ\377AMZQTs1A\377XFrShvoq\3771OXjo149\377n\000\000\000\000\000\000\000\370\000"
    │       └─## decode index values
    │         ├─kind: Bytes, value: 106612437Coq9ckZAMZQTs1AXFrShvoq1OXjo149n
    │         └─kind: Null, value:

The other data engine is:

[2023/10/12 08:56:45.499 +00:00] [Info] [local.go:1547] ["start import engine"] [uuid=55684aee-4d12-5c50-8af8-d6e741cc01aa] ["region ranges"=559] [count=0] [size=107374182400]
...
[2023/10/12 08:56:45.499 +00:00] [Info] [localhelper.go:160] ["split and scatter region"] [minKey=7480000000000000FFE05F720131303636FF31323433FF37436FFF7139636B5AFF414DFF5A5154733141FF58FF46725368766F71FFFF314F586A6F313439FFFF6E000000000000FF00F8000000000000F9][maxKey=7480000000000000FFE05F720139393939FF39396744FF6D666FFF73524A5A43FF796BFF416C51316957FF6EFF51687864323155FFFF5A69616C38360000FFFD00000000000000FA] [retry=0]

Decoding the min key of this engine, we get:

    │   └─## table row key
    │     ├─table: 224
    │     └─"\00110661243\3777Coq9ckZ\377AMZQTs1A\377XFrShvoq\3771OXjo149\377n\000\000\000\000\000\000\000\370"
    │       └─## decode index values
    │         └─kind: Bytes, value: 106612437Coq9ckZAMZQTs1AXFrShvoq1OXjo149n
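
The two hex keys from the logs can also be compared directly. The sketch below assumes they use the memcomparable bytes encoding (9-byte groups: 8 data bytes followed by a marker byte equal to 0xFF minus the number of padding bytes), which is consistent with the decoded output above. decodeMemComparable, decodeHexKey and the constant names are made up for this illustration; they are not TiDB APIs.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// The first eight 9-byte groups are identical in both keys from the logs.
const sharedPrefix = "7480000000000000FF" + "E05F720131303636FF" +
	"31323433FF37436FFF" + "7139636B5AFF414DFF" + "5A5154733141FF58FF" +
	"46725368766F71FFFF" + "314F586A6F313439FF" + "FF6E000000000000FF"

const (
	maxKeyEngine1 = sharedPrefix + "00F8000000000000FA" // maxKey of the first engine
	minKeyEngine2 = sharedPrefix + "00F8000000000000F9" // minKey of the second engine
)

// decodeMemComparable strips the group markers and padding.
// Marker 0xFF means a full group; marker 0xFF-n means the last n data
// bytes are padding and the group is the final one.
func decodeMemComparable(enc []byte) ([]byte, error) {
	var out []byte
	for len(enc) >= 9 {
		group, marker := enc[:8], enc[8]
		pad := int(0xFF - marker)
		if pad > 8 {
			return nil, fmt.Errorf("invalid marker 0x%02X", marker)
		}
		out = append(out, group[:8-pad]...)
		enc = enc[9:]
		if pad > 0 {
			return out, nil
		}
	}
	return nil, fmt.Errorf("encoded key is truncated")
}

// decodeHexKey decodes a hex string as printed in the log line.
func decodeHexKey(s string) ([]byte, error) {
	raw, err := hex.DecodeString(s)
	if err != nil {
		return nil, err
	}
	return decodeMemComparable(raw)
}

func main() {
	max1, err := decodeHexKey(maxKeyEngine1)
	if err != nil {
		panic(err)
	}
	min2, err := decodeHexKey(minKeyEngine2)
	if err != nil {
		panic(err)
	}
	// The first engine's max key is the second engine's min key plus one 0x00 byte.
	fmt.Println(string(max1) == string(min2)+"\x00") // prints true
}
```

The only difference between the two raw keys is the trailing 0x00 byte on the first engine's max key.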

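So the PK 106612437Coq9ckZAMZQTs1AXFrShvoq1OXjo149n is included in both ranges: the first range ends at that key plus a trailing 0x00 byte, and the second range starts at that key. In other words, we should not call nextKey on the end key of the first range.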
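
A minimal sketch of why this double-counts a row, assuming both ranges are treated as half-open intervals [start, end). The keyRange type, the helper functions and the sample keys are hypothetical and only illustrate the overlap; this is not the code from the fix.

```go
package main

import (
	"bytes"
	"fmt"
)

// keyRange is a hypothetical half-open key range [start, end).
type keyRange struct{ start, end []byte }

// contains reports whether start <= k < end.
func (r keyRange) contains(k []byte) bool {
	return bytes.Compare(r.start, k) <= 0 && bytes.Compare(k, r.end) < 0
}

// nextKey returns the smallest key strictly greater than k (k plus 0x00).
func nextKey(k []byte) []byte {
	return append(append([]byte{}, k...), 0x00)
}

// countOwners returns how many of the ranges contain k.
func countOwners(k []byte, ranges ...keyRange) int {
	n := 0
	for _, r := range ranges {
		if r.contains(k) {
			n++
		}
	}
	return n
}

func main() {
	boundary := []byte("pk-boundary") // stands in for the shared PK
	second := keyRange{start: boundary, end: []byte("pk-z")}

	// First range ends at nextKey(boundary): the boundary key is in both ranges.
	overlapping := keyRange{start: []byte("pk-a"), end: nextKey(boundary)}
	fmt.Println(countOwners(boundary, overlapping, second)) // prints 2

	// First range ends at boundary itself: only the second range owns the key.
	adjacent := keyRange{start: []byte("pk-a"), end: boundary}
	fmt.Println(countOwners(boundary, adjacent, second)) // prints 1
}
```

With the extra nextKey call, the boundary row is written by both engines, which matches the one extra imported row reported above.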
