Fix/(Recrawl) archon crawled pages url chunk number key (409) error#378
Conversation
Removing langgraph-api from requirements.txt so it is autoresolved, h…
…number_key-(409)-error
change insert to upsert for supabase, removes 409 error
|
@Chillbruhhh Thanks for this! Using your PR to test CodeRabbit for the first time here. Nice since it's a simple change too. @coderabbitai Review this PR |
|
@coleam00 I'll review this PR for you! It's great that you're testing CodeRabbit with this straightforward bug fix. Let me examine the changes to ensure the switch from insert to upsert properly addresses the 409 conflict error when recrawling documents. ✅ Actions performedReview triggered.
|
WalkthroughReplaced insert calls with upsert(on_conflict="url,chunk_number") in two locations inside document_storage_service: the batch write path and the per-record fallback. Batch processing, progress reporting, and error handling remain unchanged; no public signature changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant DocumentStorageService as Service
participant DB
Client->>Service: storeBatch(batch_data)
Service->>DB: upsert(batch_data) on (url, chunk_number)
alt Upsert succeeds
DB-->>Service: OK
Service-->>Client: Batch stored
else Error during batch upsert
DB-->>Service: Error
loop Per-record fallback
Service->>DB: upsert(record) on (url, chunk_number)
DB-->>Service: OK/Error per record
end
Service-->>Client: Fallback completed with results
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/src/server/services/storage/document_storage_service.py (1)
342-351: Bug: per-record fallback re-upserts the entire batch; should upsert the single recordInside the per-record fallback loop, you’re calling upsert(batch_data, ...) instead of upserting the current record. This will repeat the full-batch upsert N times and misreport successful_inserts. It defeats the purpose of an item-by-item fallback and can cause unnecessary load.
Apply this diff to upsert only the current record:
- try: - client.table("archon_crawled_pages").upsert(batch_data, on_conflict="url,chunk_number").execute() + try: + # Upsert the single record in the per-record fallback + client.table("archon_crawled_pages").upsert([record], on_conflict="url,chunk_number").execute()
🧹 Nitpick comments (1)
python/src/server/services/storage/document_storage_service.py (1)
330-358: Nit: logging still says “insert”; update to “upsert” for clarityNow that the operation is an upsert, the log messages should reflect that to avoid confusion during debugging and ops.
- search_logger.warning( - f"Error inserting batch (attempt {retry + 1}/{max_retries}): {e}" - ) + search_logger.warning( + f"Error upserting batch (attempt {retry + 1}/{max_retries}): {e}" + ) ... - search_logger.error( - f"Failed to insert batch after {max_retries} attempts: {e}" - ) + search_logger.error( + f"Failed to upsert batch after {max_retries} attempts: {e}" + ) ... - search_logger.error( - f"Failed individual insert for {record['url']}: {individual_error}" - ) + search_logger.error( + f"Failed individual upsert for {record['url']}: {individual_error}" + ) ... - search_logger.info( - f"Individual inserts: {successful_inserts}/{len(batch_data)} successful" - ) + search_logger.info( + f"Individual upserts: {successful_inserts}/{len(batch_data)} successful" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
python/src/server/services/storage/document_storage_service.py(2 hunks)
mistakenly added batch_data twice instead of record to line 349.
…number_key-(409)-error
|
Thanks for this @Chillbruhhh! I will be closing this PR though because I actually like having insert statements here instead of upsert because there should never be an upsert in this case, so it would mask an underlying issue (which in this case we have since addressed!). I am taking a look at your other PR and testing it right now! #437 |
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…378) (#409) * Investigate issues #252 and #378: test infrastructure failures Root cause for #378: github-context.test.ts globally mocks fs/promises with readFile returning empty string, which leaks to version.test.ts causing JSON.parse('') to fail. Issue #252 (test hang) no longer reproduces but preventive cleanup should be added to executor.test.ts. * Fix test infrastructure: mock.module leak and executor cleanup (#252, #378) Remove global mock.module('fs/promises') from github-context.test.ts that was leaking a readFile mock across all test files, causing 3 version.test.ts failures in CI. The mock was redundant since ensureRepoReady and autoDetectAndLoadCommands are already mocked at the adapter level. Add afterAll mock.restore() to executor.test.ts to prevent module mocks and pending timers from leaking to other test files. Changes: - Remove mock.module('fs/promises') from github-context.test.ts - Add afterAll cleanup to executor.test.ts Fixes #252, Fixes #378
…m00#252, coleam00#378) (coleam00#409) * Investigate issues coleam00#252 and coleam00#378: test infrastructure failures Root cause for coleam00#378: github-context.test.ts globally mocks fs/promises with readFile returning empty string, which leaks to version.test.ts causing JSON.parse('') to fail. Issue coleam00#252 (test hang) no longer reproduces but preventive cleanup should be added to executor.test.ts. * Fix test infrastructure: mock.module leak and executor cleanup (coleam00#252, coleam00#378) Remove global mock.module('fs/promises') from github-context.test.ts that was leaking a readFile mock across all test files, causing 3 version.test.ts failures in CI. The mock was redundant since ensureRepoReady and autoDetectAndLoadCommands are already mocked at the adapter level. Add afterAll mock.restore() to executor.test.ts to prevent module mocks and pending timers from leaking to other test files. Changes: - Remove mock.module('fs/promises') from github-context.test.ts - Add afterAll cleanup to executor.test.ts Fixes coleam00#252, Fixes coleam00#378
…m00#252, coleam00#378) (coleam00#409) * Investigate issues coleam00#252 and coleam00#378: test infrastructure failures Root cause for coleam00#378: github-context.test.ts globally mocks fs/promises with readFile returning empty string, which leaks to version.test.ts causing JSON.parse('') to fail. Issue coleam00#252 (test hang) no longer reproduces but preventive cleanup should be added to executor.test.ts. * Fix test infrastructure: mock.module leak and executor cleanup (coleam00#252, coleam00#378) Remove global mock.module('fs/promises') from github-context.test.ts that was leaking a readFile mock across all test files, causing 3 version.test.ts failures in CI. The mock was redundant since ensureRepoReady and autoDetectAndLoadCommands are already mocked at the adapter level. Add afterAll mock.restore() to executor.test.ts to prevent module mocks and pending timers from leaking to other test files. Changes: - Remove mock.module('fs/promises') from github-context.test.ts - Add afterAll cleanup to executor.test.ts Fixes coleam00#252, Fixes coleam00#378
Pull Request
Summary
Changes Made
Type of Change
Affected Services
Testing
Test Evidence
Checklist
Breaking Changes
Additional Notes
Summary by CodeRabbit