lightning: fix length check may be skipped for first line#61874

Merged
ti-chi-bot[bot] merged 2 commits into pingcap:master from joechenrh:check-row-size on Jun 24, 2025

@joechenrh (Contributor) commented Jun 20, 2025

What problem does this PR solve?

Issue Number: close #61873

Problem Summary:

What changed and how does it work?

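#58592 introduced a max row length check for CSV, but parser.beginRowLenCheck() should only be called after the column headers have been read. Otherwise, when the file has a header, the first data row after the header can skip the row length check. The current code begins the check before the header is parsed: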

func (parser *CSVParser) ReadRow() error {
	parser.beginRowLenCheck()
	defer parser.endRowLenCheck()
	row := &parser.lastRow
	row.Length = 0
	row.RowID++
	// skip the header first
	if parser.shouldParseHeader {
		err := parser.ReadColumns()
		if err != nil {
			return errors.Trace(err)
		}
		parser.shouldParseHeader = false
	}
	fields, err := parser.readRecord(parser.lastRecord)
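
To illustrate why the ordering matters, here is a minimal, self-contained sketch. It is not the real CSVParser and not the actual diff of this PR: toyParser, readFirstRow, and errRowTooLong are hypothetical names, and the sketch assumes that the header reader toggles the length check itself and leaves it disabled when it returns. Under that assumption, starting the check before the header is read lets an oversized first row through, while starting it after the header catches it.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errRowTooLong is a hypothetical error used only in this sketch.
var errRowTooLong = errors.New("row exceeds max length")

// toyParser is a simplified stand-in for a CSV parser; it treats
// each input line as one row and is NOT the real lightning parser.
type toyParser struct {
	lines       []string
	next        int
	maxRowLen   int
	checkRowLen bool
	parseHeader bool
	fixed       bool // true: begin the check after the header is read
}

func (p *toyParser) beginRowLenCheck() { p.checkRowLen = true }
func (p *toyParser) endRowLenCheck()   { p.checkRowLen = false }

// readLine returns the next line and enforces the limit only while
// the length check is enabled.
func (p *toyParser) readLine() (string, error) {
	if p.next >= len(p.lines) {
		return "", io.EOF
	}
	line := p.lines[p.next]
	p.next++
	if p.checkRowLen && len(line) > p.maxRowLen {
		return "", errRowTooLong
	}
	return line, nil
}

// readColumns consumes the header line. Assumption of this sketch:
// it toggles the check itself, so the check is off when it returns.
func (p *toyParser) readColumns() error {
	p.beginRowLenCheck()
	defer p.endRowLenCheck()
	_, err := p.readLine()
	return err
}

// readRow mirrors the shape of ReadRow in the snippet above.
func (p *toyParser) readRow() (string, error) {
	if !p.fixed {
		p.beginRowLenCheck() // old order: enabled before the header
	}
	defer p.endRowLenCheck()
	if p.parseHeader {
		if err := p.readColumns(); err != nil {
			return "", err
		}
		p.parseHeader = false
	}
	if p.fixed {
		p.beginRowLenCheck() // new order: enabled after the header
	}
	return p.readLine()
}

// readFirstRow reads the first data row of a tiny input whose only
// data row (100 bytes) exceeds the limit (10 bytes).
func readFirstRow(fixed, header bool) error {
	lines := []string{strings.Repeat("x", 100)}
	if header {
		lines = append([]string{"a,b"}, lines...)
	}
	p := &toyParser{lines: lines, maxRowLen: 10, parseHeader: header, fixed: fixed}
	_, err := p.readRow()
	return err
}

func main() {
	fmt.Println("old order, with header:", readFirstRow(false, true))
	fmt.Println("new order, with header:", readFirstRow(true, true))
	fmt.Println("old order, no header:  ", readFirstRow(false, false))
}
```

In this toy model only the combination of the old order and a header lets the oversized row pass, which matches the symptom described in the linked issue.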

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Create a CSV file whose row has 20 fields of 12 MiB each, for a total row size of 240 MiB. Import it using lightning with header set to true:

[mydumper.csv]
header = true
After this PR:

tidb lightning encountered error: [Lightning:PreCheck:ErrCheckDataSource]check data source error: in file offset 0: size of row cannot exceed the max value of txn-entry-size-limit

Before this PR:

Verbose debug logs will be written to /var/folders/9z/p3_qpprn0y3dtqs0kcsqtctc0000gn/T/lightning.log.2025-06-23T13.57.32+0800

tidb lightning exit successfully

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: Ruihao Chen <ruihao.chen@pingcap.cn>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-tests-checked do-not-merge/needs-triage-completed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 20, 2025
tiprow bot commented Jun 20, 2025

Hi @joechenrh. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

Signed-off-by: Ruihao Chen <ruihao.chen@pingcap.cn>
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed do-not-merge/needs-tests-checked size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 20, 2025
codecov bot commented Jun 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.2646%. Comparing base (465b166) to head (c8a94ae).
Report is 22 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #61874        +/-   ##
================================================
+ Coverage   73.0213%   74.2646%   +1.2432%     
================================================
  Files          1735       1765        +30     
  Lines        481288     495427     +14139     
================================================
+ Hits         351443     367927     +16484     
+ Misses       108298     105309      -2989     
- Partials      21547      22191       +644     
Flag          Coverage Δ
integration   45.5590% <100.0000%> (?)
unit          72.9499% <100.0000%> (+0.6800%) ⬆️

Flags with carried forward coverage won't be shown.

Components    Coverage Δ
dumpling      52.7804% <ø> (ø)
parser        ∅ <ø> (∅)
br            47.0140% <ø> (+0.4411%) ⬆️

@joechenrh (Contributor, Author)

/cc @lance6716 @D3Hunter @GMHDBJD

@ti-chi-bot ti-chi-bot bot requested review from D3Hunter, GMHDBJD and lance6716 June 23, 2025 06:03
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 23, 2025
ti-chi-bot bot commented Jun 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, lance6716

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 24, 2025
ti-chi-bot bot commented Jun 24, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-06-23 09:41:24 UTC: ☑️ agreed by lance6716.
  • 2025-06-24 02:50:29 UTC: ☑️ agreed by D3Hunter.

@ti-chi-bot ti-chi-bot bot merged commit ad86c97 into pingcap:master Jun 24, 2025
25 checks passed
morgo added a commit to morgo/tidb that referenced this pull request Jun 24, 2025
@joechenrh joechenrh deleted the check-row-size branch July 8, 2025 06:16

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

lightning: max row size check failed on CSV with header

3 participants