Conversation

@tac0turtle
Contributor

Overview

This bumps the timeout in order to get through larger blobs; it seems celestia-node is running into issues.

@gemini-code-assist
Contributor

Summary of Changes

Hello @tac0turtle, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses issues encountered when processing larger data blobs by adjusting the timeout configuration for the Data Availability (DA) client. The change removes a hardcoded 10-second default timeout, allowing the DA client's internal mechanisms to handle timeouts, which is expected to provide more flexibility and resolve current limitations with larger data transfers.

Highlights

  • Timeout Configuration: The explicit DefaultTimeout of 10 * time.Second has been removed from the da.Config struct initialization within the NewDAClient function. This change allows the underlying Data Availability (DA) client to manage its own timeout behavior, which is intended to accommodate larger data blobs (see the sketch after this list).
  • Dependency Cleanup: The time package import is no longer required in block/public.go following the removal of the DefaultTimeout field, leading to cleaner code.
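
For orientation, here is a minimal sketch of the constructor after this change; the field names are taken from snippets later in this thread, and the code is illustrative rather than the exact diff.

return da.NewClient(da.Config{
	DA:            daLayer,
	Logger:        logger,
	Namespace:     config.DA.GetNamespace(),
	// DefaultTimeout: 10 * time.Second, // removed: the da package now applies its own default
	DataNamespace: config.DA.GetDataNamespace(),
})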
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Dec 2, 2025, 9:42 AM |

@claude
Contributor

claude bot commented Dec 2, 2025

Claude finished @tac0turtle's task.


Code Review Complete

I've completed a comprehensive review of this PR. The changes successfully make the DA request timeout configurable and add batch size configuration. Here's my detailed analysis:


Summary

This PR addresses timeout issues when processing larger blobs by:

  1. Making the DA request timeout configurable (via da.request_timeout flag)
  2. Making the DA retrieve batch size configurable (via da.retrieve_batch_size flag)
  3. Removing hardcoded timeout values
  4. Removing duplicate timeout in DA submitter

Overall Assessment: ✅ The changes are well-implemented and follow good practices. I found some areas for improvement around error handling and documentation.


Detailed Review

1. Code Quality & Architecture ✅

Strengths:

  • Clean separation of concerns: timeout configuration is now in config.Config where it belongs
  • Proper use of dependency injection: timeout is passed through NewDAClient rather than hardcoded
  • Backward compatible: defaults to 30s if not specified (lines 56-58 in client.go; see the sketch after this list)
  • Consistent naming: RequestTimeout and RetrieveBatchSize follow existing conventions
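
A minimal sketch of that fallback, assuming the constructor shape implied by this review (the real code lives in client.go around lines 56-58; the Config and Client fields are inferred, not copied):

const defaultRequestTimeout = 30 * time.Second

func NewClient(cfg Config) *Client {
	if cfg.DefaultTimeout <= 0 {
		cfg.DefaultTimeout = defaultRequestTimeout // backward compatible: unset means 30s
	}
	return &Client{
		da:             cfg.DA,
		logger:         cfg.Logger,
		defaultTimeout: cfg.DefaultTimeout,
		batchSize:      cfg.RetrieveBatchSize,
	}
}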

Minor Issues:

block/internal/da/client.go:40-42 - Consider making the constant exported or configurable:

const (
	defaultRetrieveBatchSize = 150
)

This is set to 150, but the default in defaults.go:78 is 100. This inconsistency could be confusing. Recommend aligning these values or documenting why they differ.

block/public.go:39-49 - Good refactor removing the hardcoded timeout:

return da.NewClient(da.Config{
    DA:                daLayer,
    Logger:            logger,
    Namespace:         config.DA.GetNamespace(),
    DefaultTimeout:    config.DA.RequestTimeout.Duration,  // ✅ Now configurable
    DataNamespace:     config.DA.GetDataNamespace(),
    RetrieveBatchSize: config.DA.RetrieveBatchSize,
})

2. Error Handling & Resilience ⚠️

Issue Found - block/internal/submitting/da_submitter.go:404-406:

The removal of the submissionTimeout wrapper is potentially risky:

-   submitCtx, cancel := context.WithTimeout(ctx, submissionTimeout)
-   defer cancel()
-
    // Perform submission
    start := time.Now()
-   res := s.client.Submit(submitCtx, marshaled, -1, namespace, mergedOptions)
+   res := s.client.Submit(ctx, marshaled, -1, namespace, mergedOptions)

Analysis:

  • Previously: Had a 60-second hard timeout via submissionTimeout constant
  • Now: Relies entirely on the parent ctx timeout (if any)
  • The submitToDA function is called with a context from block production/submission loops
  • If that parent context has no timeout, submissions could hang indefinitely

Recommendation:

  1. Verify that parent contexts always have appropriate timeouts, OR
  2. Consider adding a configurable submission timeout (separate from retrieve timeout) since submission and retrieval have different performance characteristics

Question for maintainers: Is the parent context guaranteed to have a timeout? If not, consider adding:

submitCtx, cancel := context.WithTimeout(ctx, s.config.DA.RequestTimeout.Duration * 2)
defer cancel()
res := s.client.Submit(submitCtx, marshaled, -1, namespace, mergedOptions)

3. Security ✅

No security concerns identified:

  • No credential handling changes
  • No new attack surfaces introduced
  • Configuration values are properly validated (timeout must be positive duration)

4. Performance & Resource Efficiency ✅

Positive Changes:

block/internal/da/client.go:164-224 - Batch processing with configurable size:

batchSize := c.batchSize
for i := 0; i < len(idsResult.IDs); i += batchSize {
    end := min(i+batchSize, len(idsResult.IDs))
    getBlobsCtx, cancel := context.WithTimeout(ctx, c.defaultTimeout)
    batchBlobs, err := c.da.Get(getBlobsCtx, idsResult.IDs[i:end], namespace)
    cancel()
    // ...
}

Observations:

  • Each batch gets its own timeout context (line 222) - good practice
  • Proper cleanup with cancel() after each batch
  • Default batch size of 100 is reasonable, but now configurable for tuning

Performance Implications:

  • Larger timeouts are now possible: the 30s default is explicit and configurable (it was previously implicit), which allows handling of larger blobs ✅
  • Configurable batch size enables performance tuning per deployment
  • Could slightly increase memory usage with larger batches, but acceptable tradeoff

5. Testing Coverage ⚠️

Added Tests - pkg/config/config_test.go:29-30, 75-76:

assert.Equal(t, 100, def.DA.RetrieveBatchSize)
assert.Equal(t, 30*time.Second, def.DA.RequestTimeout.Duration)

Good: Tests verify default values are set correctly.

Missing Test Coverage:

  1. No tests for NewDAClient with custom timeout values
  2. No tests for Retrieve function's timeout behavior with different batch sizes
  3. No integration tests verifying larger blob handling works as expected

Recommendation: Add unit tests in block/internal/da/client_test.go:

func TestClientRetrieveWithCustomTimeout(t *testing.T) {
    // Test that custom timeout is respected
}

func TestClientRetrieveWithCustomBatchSize(t *testing.T) {
    // Test batch processing with different sizes
}

6. Documentation ⚠️

pkg/config/config.go:169-171 - Good inline documentation:

RetrieveBatchSize int             `comment:"Number of IDs to request per DA Get call when retrieving blobs. Smaller batches lower per-request latency; larger batches reduce the number of RPC round trips. Default: 100."`
RequestTimeout    DurationWrapper `comment:"Per-request timeout applied to DA GetIDs/Get calls when retrieving blobs. Larger values tolerate slower DA nodes at the cost of waiting longer before failing. Default: 30s."`
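
For completeness, a hedged example of overriding the two new fields from code rather than a config file; config.DefaultConfig is an assumed constructor name, while the field names come from the snippet above.

cfg := config.DefaultConfig()                     // assumed constructor; defaults are 30s and 100
cfg.DA.RequestTimeout.Duration = 60 * time.Second // tolerate slower DA nodes or larger blobs
cfg.DA.RetrieveBatchSize = 50                     // fetch fewer IDs per Get call when blobs are large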

Documentation Files Modified:

  • docs/learn/config.md - Need to verify this was updated (not shown in diff)
  • docs/learn/specs/block-manager.md - Added to line 171

Issue: The documentation mentions the new fields but doesn't provide guidance on:

  1. How to determine appropriate timeout values for your deployment
  2. Performance implications of different batch sizes
  3. Relationship between RequestTimeout and retry logic

Recommendation: Add to docs/learn/config.md:

### DA Request Timeout

Controls how long to wait for individual DA requests. Increase this if:
- You're working with larger blobs (>1MB)
- Your DA node is geographically distant
- You see frequent timeout errors in logs

Default: 30s

### DA Retrieve Batch Size

Number of blob IDs fetched per request. Tune based on:
- Network latency (higher latency → larger batches)
- Blob sizes (larger blobs → smaller batches)
- DA node capabilities

Default: 100

7. Additional Observations

Inconsistency Between Constants:

  • client.go:41: defaultRetrieveBatchSize = 150
  • defaults.go:78: RetrieveBatchSize: 100

This means if you don't set the config value, you get 100, but the internal constant says 150. This is confusing and should be aligned.
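
A sketch of why the mismatch is mostly invisible in practice (control flow assumed from the snippets above, not verified against the source):

cfg := config.DefaultConfig() // assumed constructor; defaults.go:78 sets RetrieveBatchSize to 100
client := da.NewClient(da.Config{
	RetrieveBatchSize: cfg.DA.RetrieveBatchSize, // 100 reaches the client
	// the internal defaultRetrieveBatchSize (150) would only apply if this field were left at zero
})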

Migration Path:
The PR handles backward compatibility well:

  • Old deployments without the config will get sensible defaults
  • The 30s timeout is explicitly set (was implicitly 30s in NewClient)
  • No breaking changes

Recommendations Summary

High Priority

  1. ⚠️ Verify parent context timeout behavior in da_submitter.go to ensure submissions can't hang indefinitely
  2. ⚠️ Align batch size constants (150 vs 100) to prevent confusion

Medium Priority

  1. 📝 Add unit tests for new timeout/batch size configuration paths
  2. 📝 Expand documentation with tuning guidance

Low Priority

  1. 📋 Consider adding metrics for timeout occurrences to help with future tuning

Approval Status

The core changes are solid and address the stated problem. The potential issue with submission timeouts should be investigated, but may not be a problem if parent contexts are properly managed elsewhere in the codebase.

Recommended Action: Verify the parent context timeout handling before merging, otherwise the changes look good to merge.


Contributor

@gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request bumps the default timeout for the DA client by removing the explicitly set timeout, which causes it to fall back to the default value in the da package. While this achieves the goal of increasing the timeout, my review suggests making this new timeout value explicit in NewDAClient to improve code clarity and maintainability, rather than relying on an implicit default from another package.
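
As a rough illustration of that suggestion, with field names reused from the snippets above and the value assumed to match the da package default:

return da.NewClient(da.Config{
	DA:             daLayer,
	Logger:         logger,
	Namespace:      config.DA.GetNamespace(),
	DefaultTimeout: 30 * time.Second, // made explicit at the call site instead of relying on the implicit default
	DataNamespace:  config.DA.GetDataNamespace(),
})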

@codecov

codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.62%. Comparing base (a00eac7) to head (7d133ed).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2902      +/-   ##
==========================================
- Coverage   64.64%   64.62%   -0.03%     
==========================================
  Files          81       81              
  Lines        7374     7380       +6     
==========================================
+ Hits         4767     4769       +2     
- Misses       2065     2069       +4     
  Partials      542      542              

| Flag | Coverage Δ |
| --- | --- |
| combined | 64.62% <100.00%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@github-actions
Contributor

github-actions bot commented Dec 2, 2025

PR Preview Action v1.6.3
Preview removed because the pull request was closed.
2025-12-02 09:59 UTC

julienrbrt previously approved these changes Dec 2, 2025
Member

@julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to have those fields configurable. Can we remove this one too: https://github.com/evstack/ev-node/blob/main/block/internal/submitting/da_submitter.go#L404-L405? It is duplicated with the client.

@tac0turtle
Contributor Author

Great to have those fields configurable. Can we remove this one too: https://github.com/evstack/ev-node/blob/main/block/internal/submitting/da_submitter.go#L404-L405? It is duplicated with the client.

done

@tac0turtle tac0turtle enabled auto-merge December 2, 2025 09:43
@tac0turtle tac0turtle added this pull request to the merge queue Dec 2, 2025
Merged via the queue into main with commit 4481fea Dec 2, 2025
27 of 28 checks passed
@tac0turtle tac0turtle deleted the marko/bump_timeout branch December 2, 2025 09:56
julienrbrt added a commit that referenced this pull request Dec 2, 2025

## Overview

Small improvements on timeout in da client and config.

Ref: #2902, #2878

alpe added a commit that referenced this pull request Dec 3, 2025
* main:
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
  refactor: omit unnecessary reassignment (#2892)
  build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  fix: ensure consistent network ID usage in P2P subscriber (#2884)
  build(deps): Bump golangci/golangci-lint-action from 9.0.0 to 9.1.0 (#2885)
  build(deps): Bump actions/checkout from 5 to 6 (#2886)
alpe added a commit that referenced this pull request Dec 8, 2025
* main:
  refactor(sequencers): persist prepended batch (#2907)
  feat(evm): add force inclusion command (#2888)
  feat: DA client, remove interface part 1: copy subset of types needed for the client using blob rpc. (#2905)
  feat: forced inclusion (#2797)
  fix: fix and cleanup metrics (sequencers + block) (#2904)
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
julienrbrt added a commit that referenced this pull request Dec 15, 2025