Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughPropagates OriginalUrl/PostUrl through extractors and models, injects IOptions into RawRedditPostTransformer to derive host-based comment URLs, adds TimeProvider-based instance URLs in the subreddit extractor, updates UntypedExtract/Extract signatures, and adjusts tests, tooling, and docs accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Caller / Tests
participant RawClient as RawRedditClient
participant Transformer as RawRedditPostTransformer
participant Models as RedditPost Models
participant Extractor as SubredditContentExtractor / UntypedExtract
Client->>RawClient: request subreddit/post JSON
RawClient->>Transformer: deliver RawRedditPost payload
Transformer->>Transformer: validate/derive host from RedditOptions.DefaultBaseAddress
Transformer->>Transformer: parse listing → postId, subreddit, comments
Transformer->>Transformer: set Post.PostUrl = webPageUrl
Transformer->>Transformer: select best image
Transformer->>Transformer: recursively TransformComments(replies, subreddit, postId, host)
note right of Transformer: build per-comment PostUrl using host/subreddit/postId/commentId
Transformer->>Models: construct RedditPost (post + comments with PostUrl)
Transformer-->>Extractor: return transformed RedditPost
Extractor->>Extractor: compose UntypedExtract(title, content, OriginalUrl, image)
Extractor->>Client: forward extract (OriginalUrl included)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull Request Overview
This PR updates the SubredditContent.md summarization instructions to provide more detailed and structured guidance for summarizing Reddit subreddit data. The update transforms the brief instructions into a comprehensive guide with clear sections for task description, input structure, requirements, and output format.
Key Changes:
- Restructured instructions from a simple bullet list to a well-organized document with markdown sections
- Added detailed specifications for handling Reddit-specific data structures (posts, comments, nested replies)
- Introduced HTML link formatting requirements for post titles and comment author attributions
- Clarified summary length constraints and high-scoring content prioritization
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[grammar] ~14-~14: Ensure spelling is correct
Context: ...e/themes of the subreddit 2. Posts: Sumarise every post with a thematic summary of i...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~30-~30: Use a hyphen to join words.
Context: ...reddit itself - Summaries of the highest scoring top-level posts
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (2)
14-14: Fix spelling: "Sumarise" → "Summarise".This spelling error was flagged in previous reviews and remains unaddressed. Correct to use consistent UK spelling throughout.
-2. **Posts**: Sumarise every post with a thematic summary of its comments +2. **Posts**: Summarise every post with a thematic summary of its comments
29-31: Complete the final bullet-point instruction.Lines 29–31 contain bullet points that provide guidance but lack imperative verbs to form complete instructions. Line 31 ("Summaries of the highest scoring top-level posts") is particularly unclear—it reads as a noun phrase rather than a directive.
Consider revising to clarify intent:
- Strictly well-formatted HTML output - Do not include any markdown notation nor put the summary in a codeblock - Brief overview of themes covered in this specific JSON document - Do not include a general description of the subreddit itself - - Summaries of the highest scoring top-level posts + - Prioritize summaries of the highest-scoring top-level posts in the output
🧹 Nitpick comments (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
239-247: Silent exception swallowing may hide deserialization issues.The empty
catchblock discards all exceptions during JSON deserialization, making it difficult to diagnose issues with malformed Reddit API responses.Consider logging the exception or at least catching a specific exception type:
try { var deserializedListing = JsonSerializer.Deserialize<RawRedditListing>(jsonElement.GetRawText()); return TransformComments(deserializedListing, subreddit, postId, host); } - catch + catch (JsonException) { + // Replies may have unexpected format in some edge cases return []; }tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
66-66: LGTM!Good validation of the new
PostUrlproperty.For stronger validation, consider asserting the actual URL value matches the input:
redditPost.Post.PostUrl.ShouldNotBeNullOrWhiteSpace(); + redditPost.Post.PostUrl.ShouldStartWith("https://"); + redditPost.Post.PostUrl.ShouldContain("1kqiwzc");tests/SampleRequests/Elzik.Breef.Api.http (1)
35-36: UnnecessaryContent-Typeheader on GET request.The
Content-Typeheader is only meaningful for requests with a body. GET requests don't have a request body, so this header can be removed.GET {{host}}/health -Content-Type: application/jsontests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs (1)
326-327: Consider a more specific assertion for consistency.The test data sets
PostUrlto a specific value at line 327, but line 360 only asserts it's not null or empty. For consistency with the other tests (lines 97, 105) that assert exact values, consider using a specific assertion here as well.- post.Post.PostUrl.ShouldNotBeNullOrEmpty(); + post.Post.PostUrl.ShouldBe("https://reddit.com/r/test/single_post");Also applies to: 360-360
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
.vscode/launch.json(1 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs(5 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs(2 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RawRedditPostClientTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs(2 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs(5 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RedditPostClientTests.cs(4 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs(4 hunks)tests/SampleRequests/Elzik.Breef.Api.http(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .vscode/launch.json
🧰 Additional context used
🧬 Code graph analysis (3)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RedditDateTimeConverter.cs (1)
DateTime(8-19)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
RedditPost(147-190)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (1)
RedditPost(3-7)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RedditPostClientTests.cs (4)
RedditPost(249-287)RedditPost(289-315)RedditPost(317-343)RedditPost(345-371)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostTransformer.cs (1)
RedditPost(5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
List(24-32)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (3)
RedditComment(22-31)RedditPost(3-7)RedditPostContent(9-20)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPost.cs (3)
RawRedditChild(30-37)RawRedditListing(9-16)RawRedditPost(5-7)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostTransformer.cs (1)
RedditPost(5-5)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[uncategorized] ~18-~18: Do not mix variants of the same word (‘summarise’ and ‘summarize’) within a single text.
Context: ...post's replies are highly scoring, also summarise them and include author attribution wit...
(EN_WORD_COHERENCY)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (13)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (2)
18-18: Ensure consistent spelling variant for "summarise".Line 18 uses "summarise" (UK spelling), which is correct. However, once line 14 is corrected from "Sumarise" to "Summarise", verify that no other variants like "summarize" (US spelling) appear elsewhere in this file to maintain coherence.
8-9: Verify PostUrl propagation aligns with schema documentation.The instructions now reference
PostUrlin the input structure (lines 8–9) and specify HTML link generation usingRedditPostContent.PostUrl(line 17). Ensure that the actual post and comment objects in the transformer and extractor consistently populate this property before these instructions are applied. Per the enriched summary,PostUrlwas introduced in the refactoring, so confirm it is reliably available in both top-level posts and nested replies.Also applies to: 17-18
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RawRedditPostClientTests.cs (1)
47-47: LGTM!Good addition to verify that the raw Reddit API response contains a non-empty URL, which is essential for the downstream PostUrl propagation.
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
63-63: LGTM!Correctly assigns the web page URL to
PostUrl, ensuring it always points to the Reddit post page regardless of whether the post is a link post or self post. This appropriately overrides the transformer's value.tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
140-140: LGTM!Consistent validation of
PostUrlin the comprehensive structure test.src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (2)
19-19: LGTM!The new
PostUrlproperty follows the same pattern as other string properties in the model with a sensible empty default.
30-30: LGTM!Consistent addition of
PostUrltoRedditComment, enabling direct links to individual comments.src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
72-145: Dead code: Duplicate method overloads that are never called.Lines 72-145 define three
TransformCommentsoverloads without thehostparameter, but theTransformmethod at line 186 calls the 4-parameter overload (withhost). These 3-parameter overloads appear to be leftover from a refactor and are unreachable.Remove the unused overloads:
- private List<RedditComment> TransformComments(List<RawRedditChild> children, - string subreddit = "", string postId = "") - { - var comments = new List<RedditComment>(); - - foreach (var child in children) - { - if (child.Kind == "t1") - { - var commentUrl = $"https://www.reddit.com/r/{subreddit}/comments/{postId}/comment/{child.Data.Id}/"; - var comment = new RedditComment - { - Id = child.Data.Id ?? string.Empty, - Author = child.Data.Author ?? string.Empty, - Score = child.Data.Score, - Content = child.Data.Content ?? string.Empty, - CreatedUtc = child.Data.CreatedUtc, - PostUrl = commentUrl, - Replies = TransformComments(child.Data.Replies, subreddit, postId) - }; - - comments.Add(comment); - } - } - - return comments; - } - - private List<RedditComment> TransformComments(object? replies, string subreddit = "", string postId = "") - { - if (replies == null) - return []; - - if (replies is string stringReply && stringReply == "") - return []; - - if (replies is JsonElement jsonElement) - { - if (jsonElement.ValueKind == JsonValueKind.Null) - return []; - - if (jsonElement.ValueKind == JsonValueKind.String && jsonElement.GetString() == "") - return []; - - try - { - var deserializedListing = JsonSerializer.Deserialize<RawRedditListing>(jsonElement.GetRawText()); - return TransformComments(deserializedListing, subreddit, postId); - } - catch - { - return []; - } - } - - if (replies is RawRedditListing listing) - return TransformComments(listing, subreddit, postId); - - return []; - } - - private List<RedditComment> TransformComments(RawRedditListing? replies, string subreddit = "", string postId = "") - { - if (replies == null) - return []; - - if (replies.Data == null) - return []; - - if (replies.Data.Children == null) - return []; - - return TransformComments(replies.Data.Children, subreddit, postId); - } -Likely an incorrect or invalid review comment.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)
218-229: LGTM! Good test coverage for PostUrl propagation.The test correctly verifies that the URL passed to
ExtractAsyncis propagated through to thePostUrlproperty in the serialized content, ensuring end-to-end validation of the new feature.
353-365: Good approach with the optional parameter and sensible default.The helper method maintains backward compatibility with existing tests while allowing explicit PostUrl values when needed. The default URL pattern correctly follows Reddit's URL structure.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs (1)
59-60: LGTM! Comprehensive test coverage for PostUrl.The test data and assertions properly verify that
PostUrlis populated and propagated correctly through the transformation pipeline.Also applies to: 74-75, 97-97, 105-105
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RedditPostClientTests.cs (2)
58-58: LGTM! Good assertions for comment and reply PostUrl format.The assertions correctly verify that the
PostUrlproperty follows the expected Reddit URL format:https://www.reddit.com/r/{subreddit}/comments/{postId}/comment/{commentId}/Also applies to: 67-67
289-315: Verify if missingPostUrlin edge case factory methods is intentional.The
CreateExpectedTransformedResultmethod now includesPostUrlfor comments/replies (lines 271, 280), but other factory methods likeCreateExpectedResultWithEmptyReplies,CreateExpectedResultWithSingleComment, andCreateExpectedResultWithNullFieldsdon't setPostUrlon their comments. If the production transformer always populatesPostUrl, these expected results may need updating for consistency.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (1)
29-31: Complete the Output Format bullet points as imperative instructions.Lines 29 and 31 are noun phrases that lack clear imperative verbs, making the requirements ambiguous. These should be explicit instructions stating whether to include, focus on, or filter content.
Apply this diff to clarify the instructions:
-- Brief overview of themes covered in this specific JSON document -- Do not include a general description of the subreddit itself -- Summaries of the highest scoring top-level posts +- Include a brief overview of themes covered in this specific JSON document +- Do not include a general description of the subreddit itself +- Include summaries of the highest scoring top-level posts
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[uncategorized] ~14-~14: Do not mix variants of the same word (‘summarise’ and ‘summarize’) within a single text.
Context: ...e/themes of the subreddit 2. Posts: Summarise every post with a thematic summary of i...
(EN_WORD_COHERENCY)
[uncategorized] ~18-~18: Do not mix variants of the same word (‘summarise’ and ‘summarize’) within a single text.
Context: ...post's replies are highly scoring, also summarise them and include author attribution wit...
(EN_WORD_COHERENCY)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (2)
3-3: Fix inconsistent English spelling: use either "Summarise" (British) or "Summarize" (American) consistently.The document mixes American English "Summarize" (line 3) with British English "Summarise" (lines 14, 18). Choose one variant and apply it throughout for consistency.
-Summarize the provided Reddit subreddit JSON data containing posts and nested comments. +Summarise the provided Reddit subreddit JSON data containing posts and nested comments.Alternatively, if your project defaults to American English, change lines 14 and 18 to "Summarize".
Also applies to: 14-14, 18-18
31-31: Complete the incomplete instruction on line 31.Line 31 ("Summaries of the highest scoring top-level posts") is a noun phrase without an action verb, making it an incomplete instruction. Add a verb to clarify the requirement.
-- Summaries of the highest scoring top-level posts +- Include summaries of the highest scoring top-level posts
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[uncategorized] ~14-~14: Do not mix variants of the same word (‘summarise’ and ‘summarize’) within a single text.
Context: ...e/themes of the subreddit 2. Posts: Summarise every post with a thematic summary of i...
(EN_WORD_COHERENCY)
[uncategorized] ~18-~18: Do not mix variants of the same word (‘summarise’ and ‘summarize’) within a single text.
Context: ...post's replies are highly scoring, also summarise them and include author attribution wit...
(EN_WORD_COHERENCY)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (3)
120-123: Silent exception swallowing may hide deserialization issues.The bare
catchblock swallows all exceptions without logging or distinguishing exception types. This could mask legitimate bugs or unexpected data shapes from the Reddit API.Consider catching
JsonExceptionspecifically and optionally logging:- catch + catch (JsonException) { + // Reddit API sometimes returns unexpected reply formats; gracefully degrade return []; }The same applies to the duplicate catch block at lines 194-197.
117-118: Consider deserializing directly fromJsonElement.Using
GetRawText()creates an intermediate string.JsonSerializercan deserialize directly fromJsonElement:- var deserializedListing = JsonSerializer.Deserialize<RawRedditListing>(jsonElement.GetRawText()); + var deserializedListing = JsonSerializer.Deserialize<RawRedditListing>(jsonElement);
72-144: Refactor the 3-parameterTransformCommentsoverloads to simple delegates to eliminate dead code duplication.The 3-parameter overloads (lines 72–144) are confirmed to be unused. All three are
privatemethods that form a closed recursive chain, never called from theTransformmethod (the public entry point) or any other code in the repository. The 4-parameter versions (lines 146–218) handle all actual usage via theTransformmethod.Consolidating these into delegates is appropriate:
- private List<RedditComment> TransformComments(List<RawRedditChild> children, string subreddit = "", string postId = "") - { - var comments = new List<RedditComment>(); - - foreach (var child in children) - { - if (child.Kind == "t1") - { - var commentUrl = $"https://www.reddit.com/r/{subreddit}/comments/{postId}/comment/{child.Data.Id}/"; - var comment = new RedditComment - { - Id = child.Data.Id ?? string.Empty, - Author = child.Data.Author ?? string.Empty, - Score = child.Data.Score, - Content = child.Data.Content ?? string.Empty, - CreatedUtc = child.Data.CreatedUtc, - PostUrl = commentUrl, - Replies = TransformComments(child.Data.Replies, subreddit, postId) - }; - - comments.Add(comment); - } - } - - return comments; - } - - private List<RedditComment> TransformComments(object? replies, string subreddit = "", string postId = "") - { - if (replies == null) - return []; - - if (replies is string stringReply && stringReply == "") - return []; - - if (replies is JsonElement jsonElement) - { - if (jsonElement.ValueKind == JsonValueKind.Null) - return []; - - if (jsonElement.ValueKind == JsonValueKind.String && jsonElement.GetString() == "") - return []; - - try - { - var deserializedListing = JsonSerializer.Deserialize<RawRedditListing>(jsonElement.GetRawText()); - return TransformComments(deserializedListing, subreddit, postId); - } - catch - { - return []; - } - } - - if (replies is RawRedditListing listing) - return TransformComments(listing, subreddit, postId); - - return []; - } - - private List<RedditComment> TransformComments(RawRedditListing? replies, string subreddit = "", string postId = "") - { - if (replies == null) - return []; - - if (replies.Data == null) - return []; - - if (replies.Data.Children == null) - return []; - - return TransformComments(replies.Data.Children, subreddit, postId); - } + private List<RedditComment> TransformComments(List<RawRedditChild> children, string subreddit = "", string postId = "") + => TransformComments(children, subreddit, postId, "www.reddit.com"); + + private List<RedditComment> TransformComments(object? replies, string subreddit = "", string postId = "") + => TransformComments(replies, subreddit, postId, "www.reddit.com"); + + private List<RedditComment> TransformComments(RawRedditListing? replies, string subreddit = "", string postId = "") + => TransformComments(replies, subreddit, postId, "www.reddit.com");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPost.cs (3)
RawRedditChild(30-37)RawRedditListing(9-16)RawRedditPost(5-7)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostTransformer.cs (1)
RedditPost(5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (2)
238-243: VerifyPostUrlsemantics for link posts vs self-posts.
mainPostData.Urlreturns the external link URL for link posts (e.g.,https://example.com/article) but the Reddit permalink for self-posts. IfPostUrlis intended to always be the Reddit permalink, you may want to use thePermalinkfield or construct it similarly to comment URLs:var postUrl = $"https://{host}/r/{subreddit}/comments/{postId}/";If the current behavior (external URL for link posts, permalink for self-posts) is intentional, this is fine as-is.
Also applies to: 257-257
220-263: Transform method implementation looks solid.Good input validation with
ArgumentNullException.ThrowIfNulland meaningful exception messages. The host derivation with fallback towww.reddit.comhandles edge cases well. Required vs optional field handling is consistent.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
List(24-32)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (3)
RedditComment(22-31)RedditPost(3-7)RedditPostContent(9-20)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPost.cs (3)
RawRedditChild(30-37)RawRedditListing(9-16)RawRedditPost(5-7)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostTransformer.cs (1)
RedditPost(5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (4)
71-96: LGTM!The context propagation pattern is well-implemented. Comment URLs are correctly constructed following Reddit's standard URL structure, and the recursive call properly passes context to nested replies.
98-129: LGTM!The overload correctly handles Reddit's polymorphic reply format (null, empty string, JsonElement, or RawRedditListing). The silent exception handling at line 119 provides resilient fallback behavior, which is appropriate for parsing external API responses.
131-143: LGTM!Proper null-guarding before delegating to the list-based overload.
145-160: Clean validation and transformation logic.Input validation is thorough with clear exception messages. The object initialization is well-structured, and the null-coalescing for empty children list at line 184 handles edge cases gracefully.
Also applies to: 170-188
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (2)
11-18: Fix indentation on line 17.Line 17 has inconsistent leading whitespace (two spaces instead of proper indentation). While this doesn't affect functionality, it impacts code readability.
Apply this diff to fix the indentation:
if (options.Value == null) throw new InvalidOperationException("RedditOptions configuration is missing or not bound."); - _options = options.Value; + _options = options.Value; }
173-180: Consider using Permalink instead of Url for semantic correctness.Line 175 assigns
mainPostData.UrltopostUrl, which is later used for thePostUrlproperty (line 194). For link posts,Urlcontains the external link (e.g., imgur.com, youtube.com) rather than the Reddit post permalink. WhileRedditPostContentExtractoroverwrites this value with the actual Reddit URL (line 63), usingUrlhere is semantically incorrect and could cause issues ifTransformis called directly.Consider using the
Permalinkfield instead, which always contains the Reddit post URL:var subreddit = mainPostData.Subreddit ?? string.Empty; var postId = mainPostData.Id ?? string.Empty; -var postUrl = mainPostData.Url ?? string.Empty; +var postUrl = mainPostData.Permalink ?? string.Empty;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs(6 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RedditPostClientTests.cs(2 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs(3 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (2)
RawRedditPostTransformer(7-201)RawRedditPostTransformer(11-18)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
RedditOptions(5-33)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
RedditOptions(5-33)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (2)
RawRedditPostTransformer(7-201)RawRedditPostTransformer(11-18)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[grammar] ~31-~31: Use a hyphen to join words.
Context: ...reddit itself - Summaries of the highest scoring top-level posts
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (7)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
83-155: LGTM!Context propagation for subreddit, postId, and host is implemented correctly across all TransformComments overloads. The use of the configured host (from DefaultBaseAddress) for constructing comment URLs properly resolves the previous issue where the host was incorrectly extracted from mainPostData.Url.
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (1)
1-31: LGTM!The refactored instructions provide a clear, structured specification for Reddit subreddit summarization. The HTML-centric format with explicit link examples (lines 17-18) aligns well with the PostUrl changes introduced across the codebase.
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)
22-26: LGTM!The test correctly instantiates the transformer with injected configuration via
Options.Create. Using a distinct test base address (https://www.test-reddit.com) ensures test isolation.
70-70: LGTM!PostUrl assertions validate that the new property is properly populated through the extraction pipeline.
Also applies to: 144-144
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RedditPostClientTests.cs (1)
22-26: LGTM!Consistent with other test files, the transformer is now instantiated with injected configuration. The test base address ensures that comment URLs are constructed using the configured host.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (2)
10-19: LGTM!The test constructor properly initializes the transformer with test configuration, enabling controlled validation of the host-based URL generation.
126-126: LGTM!The PostUrl assertions validate that comment URLs are correctly constructed using the configured host from
RedditOptions.DefaultBaseAddress, including for nested reply chains. This confirms that context propagation works as expected throughout the transformation hierarchy.Also applies to: 135-135
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (1)
119-137: Consider also asserting the main postPostUrlin the happy‑path testThe new
PostUrlassertions for the top‑level comment and its reply look correct and validate the expected Reddit URL pattern; as a small enhancement, you might also assertresult.Post.PostUrlhere to keep all primary URL expectations in a single canonical test.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (6)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (6)
1-19: Fixture construction withIOptions<RedditOptions>is clear and realisticUsing
Options.Createwith a test base address in the test fixture nicely mirrors production wiring and keeps individual tests focused on behavior rather than setup noise. Please just confirm the Options pattern here matches how the transformer is built in your real DI configuration.
450-466: Constructor guard‑rail tests for options are well targetedThe tests for null
optionsand nulloptions.Valueexercise the key failure modes for misconfiguredRedditOptionsand pin down both exception type and diagnostic text, which is valuable for catching configuration regressions early.
469-565: Good coverage of malformed post/listing and invalid base‑address scenariosThe new tests around empty post listings, null titles, and an invalid
DefaultBaseAddressprovide solid negative‑path coverage and clearly document the invariants the transformer expects from both Reddit payloads and configuration.
567-783: Image selection and HTML‑decoding tests look accurate and comprehensiveThe gallery and preview tests, including HTML‑encoded URLs and multi‑image selection by size, tightly match the expected behavior and should guard well against regressions in image URL resolution logic.
785-966: Robust handling of gallery metadata and “thumbnail only” postsTests that skip invalid gallery metadata, ignore
default/nsfwthumbnails, and treat non‑image URLs asnullimage URLs are all consistent and give a clear contract for what “valid image” means in this transformer.
968-1218: Comment/reply edge‑case tests are thorough and align with Reddit’s odd shapesThe scenarios for empty‑string replies, null listing data, null children, and non‑
t1kinds demonstrate careful handling of real‑world Reddit comment trees and ensure the transformer produces a clean, predictable comment list.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (1)
22-137: Consider adding Post.PostUrl assertions.The test verifies comment PostUrl values (lines 126, 135) but doesn't assert on
result.Post.PostUrl. While ImageUrl is checked, confirming PostUrl behavior for the post itself would complete the validation coverage.Add an assertion like:
result.Post.CreatedUtc.ShouldBe(new DateTime(2025, 1, 1, 12, 0, 0, DateTimeKind.Utc)); +result.Post.PostUrl.ShouldNotBeNullOrEmpty(); result.Post.ImageUrl.ShouldBeNull();src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
21-71: Consider consistent HTML decoding across all image sources.HTML decoding is applied to gallery URLs (line 35) and preview URLs (line 49), but not to direct URLs (line 57) or thumbnails (line 67). If Reddit's API can return HTML-encoded characters in any URL field, applying
HttpUtility.HtmlDecodeconsistently would prevent potential issues.Apply decoding to all image URLs:
var directUrl = postData.UrlOverriddenByDest ?? postData.Url; if (IsImageUrl(directUrl)) { - return directUrl; + return HttpUtility.HtmlDecode(directUrl); }And for thumbnails:
if (!string.IsNullOrEmpty(postData.Thumbnail) && postData.Thumbnail != "self" && postData.Thumbnail != "default" && postData.Thumbnail != "nsfw" && IsImageUrl(postData.Thumbnail)) { - return postData.Thumbnail; + return HttpUtility.HtmlDecode(postData.Thumbnail); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs(6 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (2)
RawRedditPostTransformer(7-202)RawRedditPostTransformer(11-19)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
RedditOptions(5-33)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPost.cs (11)
RawRedditPost(5-7)RawRedditListing(9-16)RawRedditListingData(18-28)RawRedditChild(30-37)RawRedditCommentData(39-93)RawRedditGalleryData(140-144)RawRedditGalleryItem(146-154)RawRedditMediaMetadata(125-138)RawRedditImageSource(113-123)RawRedditPreview(95-102)RawRedditPreviewImage(104-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (8)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (4)
1-19: LGTM! Constructor properly initializes options for dependency injection.The test setup correctly creates
IOptions<RedditOptions>with a configuredDefaultBaseAddress. This aligns with the transformer's new constructor signature and enables proper PostUrl generation throughout the tests.
450-467: Excellent validation coverage for null options scenarios.These tests properly verify that the transformer rejects both null options and unbound options.Value, ensuring configuration is always present at instantiation time.
527-565: Good validation test for invalid base address.This test confirms that an invalid
DefaultBaseAddressis caught during transformation when attempting to extract the host, not at construction time. This is appropriate since the validation attribute on RedditOptions would catch this during app startup in production.
568-1218: Comprehensive edge case coverage for image extraction and comment handling.The test suite thoroughly covers:
- HTML decoding for gallery and preview URLs
- URL override behavior
- Largest image selection from multiple sources
- Invalid gallery metadata filtering
- Thumbnail validation (default/nsfw/self ignored)
- Non-image URL handling
- Various reply edge cases (empty string, null data, null children, non-t1 kinds)
This level of coverage significantly improves confidence in the transformer's robustness.
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (4)
9-19: LGTM! Robust constructor validation ensures configuration is always present.The constructor properly validates both the options parameter and its Value property, throwing appropriate exceptions with clear messages. This prevents runtime errors from missing configuration.
84-109: Correct context propagation for comment URL generation.The method now receives subreddit, postId, and host parameters and constructs proper Reddit comment URLs using the configured host. This addresses the previous review concern about extracting host from post URLs (which could be external domains for link posts).
111-156: Excellent handling of polymorphic replies with context threading.These overloads gracefully handle various reply formats (null, empty string, JsonElement, RawRedditListing) while consistently propagating the context parameters throughout the transformation chain. The defensive null checks prevent errors from malformed data.
158-201: Well-structured Transform method with proper validation.The method validates input, extracts necessary data, confirms DefaultBaseAddress is a valid URI, and correctly uses the configured host for comment URLs. The PostUrl assignment on line 195 captures the post's linked content URL (which may be external for link posts), while comments receive Reddit permalink URLs.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
233-245: Test currently can’t detect regressions in PostUrl assignmentBy passing
urlintoCreateTestRedditPostaspostUrl, the stub’sPost.PostUrlalready equals the expected value. IfRedditPostContentExtractorever stops overridingPost.PostUrlwithwebPageUrl, this test would still pass, so it doesn’t truly guard the behavior.To ensure the test fails when the extractor stops setting
PostUrl, let the helper use its fallback URL (different fromurl) and keep asserting againsturl:@@ public async Task ExtractAsync_ValidUrl_ReturnsSerializedPostAsContent() - var url = "https://www.reddit.com/r/programming/comments/abc123/title"; - var testPost = CreateTestRedditPost("abc123", "Test Title", "https://example.com/image.jpg", url); + var url = "https://www.reddit.com/r/programming/comments/abc123/title"; + // Do not pass `url` as postUrl so we can detect that the extractor overwrites it + var testPost = CreateTestRedditPost("abc123", "Test Title", "https://example.com/image.jpg"); @@ - deserializedPost.Post.PostUrl.ShouldBe(url); + deserializedPost.Post.PostUrl.ShouldBe(url);The updated helper with optional
postUrland a sensible fallback is otherwise fine and keeps existing call sites compatible.Also applies to: 368-381
🧹 Nitpick comments (4)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (2)
40-54: Consider adding assertion forOriginalUrl.The test validates
Title,Content, andPreviewImageUrlbut doesn't assertOriginalUrl. For completeness, consider adding:result.Title.ShouldBe("Test Title"); result.Content.ShouldBe("Test Content"); result.PreviewImageUrl.ShouldBe("https://example.com/image.jpg"); + result.OriginalUrl.ShouldBe("https://original.url.com");
107-110: Consider usingwebPageUrlparameter for more realistic test behavior.The test extractors use a hardcoded URL instead of the
webPageUrlparameter. Using the parameter would better mirror production extractors and allow tests to verify URL propagation:protected override Task<UntypedExtract> CreateUntypedExtractAsync(string webPageUrl) { - return Task.FromResult(new UntypedExtract("Test Title", "Test Content", "https://original.url.com", "https://example.com/image.jpg")); + return Task.FromResult(new UntypedExtract("Test Title", "Test Content", webPageUrl, "https://example.com/image.jpg")); }This would apply similarly to the other test extractors.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (1)
11-13: AlignExtractconstructor argument order with domain model
Extractis constructed elsewhere as(Title, Content, OriginalUrl, PreviewImageUrl, ExtractType), but_extractedByExtractor1and_extractedByExtractor2currently pass image URL as the third argument and the original URL as the fourth, while_extractedByDefaultExtractoruses the consistent order.Consider swapping the 3rd/4th arguments for the first two fields to avoid confusion and future copy‑paste errors:
- private readonly Extract _extractedByExtractor1 = new("Title1", "Content1", "Image1", "https://original.url.com", "Extractor1Type"); - private readonly Extract _extractedByExtractor2 = new("Title2", "Content2", "Image2", "https://original.url.com", "Extractor2Type"); - private readonly Extract _extractedByDefaultExtractor = new("DefaultTitle", "DefaultContent", "https://original.url.com", "DefaultImage", "DefaultExtractorType"); + private readonly Extract _extractedByExtractor1 = new("Title1", "Content1", "https://original.url.com", "Image1", "Extractor1Type"); + private readonly Extract _extractedByExtractor2 = new("Title2", "Content2", "https://original.url.com", "Image2", "Extractor2Type"); + private readonly Extract _extractedByDefaultExtractor = new("DefaultTitle", "DefaultContent", "https://original.url.com", "DefaultImage", "DefaultExtractorType");src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
11-12: Timestamped title and instance-specificOriginalUrllook consistentUsing
TimeProviderto derivedateAndTimeand then reusing it in both the title andinstanceSpecificoriginalUrlcleanly guarantees a unique, time‑scoped extract per run, and theUntypedExtractcall matches the new(Title, Content, OriginalUrl, PreviewImageUrl)shape.If you touch this again, you might consider renaming
instanceSpecificoriginalUrltoinstanceSpecificOriginalUrlfor readability, but behaviorally this looks solid.Also applies to: 47-52
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
README.md(1 hunks)src/Elzik.Breef.Application/BreefGenerator.cs(1 hunks)src/Elzik.Breef.Domain/Extract.cs(1 hunks)src/Elzik.Breef.Infrastructure/CallerFixableHttpRequestException.cs(1 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs(1 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs(2 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs(2 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsBase.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Integration/Elzik.Breef.Api.Tests.Integration.csproj(1 hunks)tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs(5 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/Elzik.Breef.Infrastructure.Tests.Integration.csproj(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs(4 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs(4 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs(10 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/Elzik.Breef.Infrastructure.Tests.Unit.csproj(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs
🧰 Additional context used
🧬 Code graph analysis (6)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
Extract(7-8)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
Task(34-74)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
RedditPost(158-201)
src/Elzik.Breef.Domain/Extract.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
Extract(7-8)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
RedditOptions(5-33)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (3)
RawRedditPostTransformer(7-202)RawRedditPostTransformer(11-19)RedditPost(158-201)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
SubredditContentExtractor(8-92)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostTransformer.cs (1)
RedditPost(5-5)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
Task(15-35)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
Task(34-74)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (3)
Task(37-52)Task(54-58)Task(60-91)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (2)
Task(22-35)Task(37-37)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (3)
Task(37-52)Task(54-58)Task(60-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (18)
tests/Elzik.Breef.Api.Tests.Integration/Elzik.Breef.Api.Tests.Integration.csproj (1)
20-20: ✓ Good alignment with existing dependencies.The addition of Microsoft.Extensions.TimeProvider.Testing 10.0.0 aligns well with the PR's time-based test updates and matches the version strategy of other Microsoft.Extensions packages in the file.
README.md (1)
148-159: Documentation additions are well-integrated and follow existing patterns.The new Time Zone section is clearly written, logically positioned, and maintains consistency with the existing documentation structure. The example is practical and the Wikipedia reference provides a helpful resource for users seeking additional time zone identifiers.
src/Elzik.Breef.Infrastructure/CallerFixableHttpRequestException.cs (1)
5-6: LGTM!Clean refactoring to C# 12 primary constructor syntax. The behavior remains identical—passing
messageandinnerExceptionto the baseHttpRequestExceptionconstructor.tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
99-105: Collection expression forTheoryData<string[]>looks correct; just confirm toolchain/xUnit versionsThe new
return [ [] ];is functionally equivalent to returning aTheoryData<string[]>with a single row containing an emptystring[], and matches modern xUnit guidance for using collection/array-style initialization withTheoryData<T>. (xunit.net)Please just confirm:
- The test project is compiling with C# 12 / .NET 8 (so collection expressions are supported), and
- You’re on an xUnit version that supports these initializers and is not stuck on
2.9.1, which briefly had a bug withTheoryData<string[]>and arrays, fixed in2.9.2. (xunit.net)tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsBase.cs (1)
47-47: No changes to this file in the current PR.The file
tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsBase.cswas not modified in this PR. Line 47 already containshttp://example.comin the baseline code, so there is no URL scheme change to verify. The review comment is based on incorrect information.Likely an incorrect or invalid review comment.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
213-226: Good coverage for OriginalUrl propagationThis test cleanly verifies that
ExtractAsyncpreserves the requested URL intoresult.OriginalUrl, independent of the Reddit client stub, and is a solid addition around URL handling.tests/Elzik.Breef.Infrastructure.Tests.Unit/Elzik.Breef.Infrastructure.Tests.Unit.csproj (1)
23-23: LGTM!The package addition is appropriate for enabling deterministic time-based testing with
FakeTimeProvider. Version 10.0.0 is consistent with otherMicrosoft.Extensions.*packages in this project.src/Elzik.Breef.Application/BreefGenerator.cs (1)
19-19: LGTM!Using
extract.OriginalUrlinstead of the inputurlallows extractors to provide a canonical or instance-specific URL (e.g., subreddit extracts with timestamps). This aligns well with the OriginalUrl propagation introduced across the extraction pipeline.tests/Elzik.Breef.Infrastructure.Tests.Integration/Elzik.Breef.Infrastructure.Tests.Integration.csproj (1)
28-28: LGTM!Consistent with the unit test project, this enables
FakeTimeProviderusage for deterministic time-based testing in integration tests.tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (1)
44-44: LGTM!Good addition to validate that
OriginalUrlis correctly propagated from the input URL through the extraction process.src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
34-34: LGTM!The
webPageUrlis correctly passed toUntypedExtractto populate the newOriginalUrlfield. This ensures the original request URL is preserved through the extraction pipeline.src/Elzik.Breef.Domain/Extract.cs (1)
3-8: LGTM!The
OriginalUrlfield is a sensible addition to track the source URL through the domain model. Making it non-nullable is appropriate since all extractors should provide an original URL.Verification confirmed: All usages of the
Extractrecord have been properly updated. The record is instantiated exclusively throughUntypedExtract.WithType()(src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs:8), which correctly passes parameters in the new order:Title, Content, OriginalUrl, PreviewImageUrl, extractType. All extractor implementations and test cases properly constructUntypedExtractwith the required four parameters in the correct sequence.src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
5-8:UntypedExtract→Extractmapping is consistentThe new
OriginalUrlparameter onUntypedExtractand its propagation viaWithTypetoExtract(Title, Content, OriginalUrl, PreviewImageUrl, extractType)are consistent with the rest of the changes.tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
5-6: Good coverage of new URL-related behavior in integration testsThe updated setup—constructing
RawRedditPostTransformerfromIOptions<RedditOptions>and injecting aFakeTimeProviderintoSubredditContentExtractor—looks correct and keeps host derivation and time behavior under test control. The added assertions onresult.OriginalUrlandredditPost.Post.PostUrlnicely exercise the new URL fields without making the tests brittle against remote data shape.Also applies to: 19-20, 24-29, 44-47, 69-70, 76-77, 150-151
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (4)
1-6: Deterministic time and safe defaults for subreddit extraction testsInjecting a
FakeTimeProviderwith a fixed instant and wiring_extractorthrough it makes the new time‑dependent behavior fully deterministic, and defaultingGetNewInSubreddittonew NewInSubreddit { Posts = [] }avoids null/empty ambiguity in content assertions. This setup matches the production extractor’s expectations and keeps the tests robust.Also applies to: 17-28, 34-45
77-88: Custom/unknown Reddit instance tests correctly exerciseAllDomainslogicRe‑instantiating
SubredditContentExtractorwith updatedRedditOptions.AdditionalBaseAddressesand the shared_fakeTimeProvidercleanly validates both acceptance and rejection of non‑default Reddit hosts viaAllDomains. The wiring looks correct and mirrors the production configuration path.Also applies to: 90-92, 100-121
189-205: Title andOriginalUrlassertions accurately cover the new timestamped behaviorThe updated expectations:
result.Title.ShouldBe("New in r/subreddit as of 2015-10-21 07:28");result.Title.ShouldBe("New in r/dotnet as of 2015-10-21 07:28");result.OriginalUrl.ShouldBe($"{url}#{_fakeTimeProvider.GetLocalNow():yyyy-MM-dd HH:mm}");match the extractor’s construction logic and directly pin the contract for both the human‑readable title and the instance‑specific
OriginalUrl. UsingPosts = []in the NewInSubreddit setups keeps the focus of these tests squarely on URL/title behavior.Also applies to: 483-497, 500-515, 227-230, 540-541
568-590: HTTP mock handlers nicely encapsulate success/failure behaviorBoth
MockHttpMessageHandlerandThrowingMockHttpMessageHandlerare straightforward and make the tests’ HTTP expectations explicit (success with optional failUrl override vs. hard failure with a specific exception), without leaking that logic into the test bodies. This keeps the subreddit image URL tests clear and focused.Also applies to: 594-600
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs(2 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsDocker.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/DockerTestsCollection.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/Health/HealthTestsDocker.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs(5 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs
🧰 Additional context used
🧬 Code graph analysis (2)
tests/Elzik.Breef.Api.Tests.Functional/Health/HealthTestsDocker.cs (1)
tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsDocker.cs (1)
Collection(8-205)
tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsDocker.cs (1)
tests/Elzik.Breef.Api.Tests.Functional/Health/HealthTestsDocker.cs (1)
Collection(8-172)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
[grammar] ~7-~7: Use a hyphen to join words.
Context: ...ts. ## Input Structure JSON with a top level "Post" item containing: - Metadata...
(QB_NEW_EN_HYPHEN)
[grammar] ~32-~32: Use a hyphen to join words.
Context: ...maries of the top-level post and highest scoring replies
(QB_NEW_EN_HYPHEN)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[grammar] ~31-~31: Use a hyphen to join words.
Context: ...reddit itself - Summaries of the highest scoring top-level posts
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
tests/Elzik.Breef.Api.Tests.Functional/Breefs/BreefTestsDocker.cs (1)
8-8: Aligns Breef Docker tests with the shared serialized collectionAdding
[Collection("Docker Tests")]on Line 8 correctly groups these Docker-based functional tests into the shared, non-parallelized collection, consistent withDockerTestsCollection. No further changes needed here.tests/Elzik.Breef.Api.Tests.Functional/Health/HealthTestsDocker.cs (1)
8-8: Health Docker tests correctly added to the Docker test collectionApplying
[Collection("Docker Tests")]on Line 8 brings these tests into the same serialized Docker collection asBreefTestsDocker, which matches theDockerTestsCollectiondefinition and avoids concurrent container runs. Looks good.tests/Elzik.Breef.Api.Tests.Functional/DockerTestsCollection.cs (1)
1-13: DockerTestsCollection correctly defines a non-parallel Docker test collectionThe
DockerTestsCollectionclass with[CollectionDefinition("Docker Tests", DisableParallelization = true)]is the standard xUnit pattern for grouping and serializing Docker-based tests. The name matches the[Collection("Docker Tests")]usages, and the documentation comment is clear. No changes needed.src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (1)
1-31: LGTM!The instruction file is well-structured with clear requirements for summarizing Reddit subreddit content. The HTML link examples correctly reference
RedditPostContent.PostUrlfor posts andRedditComment.PostUrlfor comments, aligning with the code changes in this PR.src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (2)
47-49: Consider time zone implications for "as of" timestamp.The code uses
timeProvider.GetLocalNow()to generate a timestamp for the title and URL. This means the timestamp will reflect the server's local time zone, which may vary depending on deployment environment.Consider whether:
GetUtcNow()would be more consistent across deployments- The time zone should be explicitly included in the format string (e.g., "yyyy-MM-dd HH:mm zzz")
- The current behavior is intentional for user-facing display
If local time is intended for user convenience, consider documenting this design decision.
51-51: LGTM!The updated
UntypedExtractconstructor call correctly passes the instance-specific original URL and aligns with the signature changes across the codebase.tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (3)
54-54: LGTM!The new assertion correctly validates that
OriginalUrlis preserved through the extraction pipeline, aligning with the expandedUntypedExtractconstructor signature.
110-111: LGTM!The
UntypedExtractconstructor call correctly includeswebPageUrlas the third parameter (originalUrl), ensuring consistency with the test assertion on line 54.
121-121: LGTM!All test helper classes have been consistently updated to use the four-parameter
UntypedExtractconstructor, properly passing theoriginalUrlparameter.Also applies to: 131-131, 141-141
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
Outdated
Show resolved
Hide resolved
…tPostContent.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (3)
7-7: Add hyphen to compound adjective "top-level".When used attributively before a noun, "top level" should be hyphenated.
-JSON with a top level "Post" item containing: +JSON with a top-level "Post" item containing:
15-15: Add hyphen to compound adjective "highest-scoring".When used attributively before a noun, "highest scoring" should be hyphenated.
-3. **Replies**: Summarise only the highest scoring replies in lower detail +3. **Replies**: Summarise only the highest-scoring replies in lower detail
32-32: Add hyphen to compound adjective "highest-scoring".When used attributively before a noun, "highest scoring" should be hyphenated.
-- Summaries of the top-level post and highest scoring replies +- Summaries of the top-level post and highest-scoring replies
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
[grammar] ~7-~7: Use a hyphen to join words.
Context: ...ts. ## Input Structure JSON with a top level "Post" item containing: - Metadata...
(QB_NEW_EN_HYPHEN)
[grammar] ~15-~15: Use a hyphen to join words.
Context: ... Replies: Summarise only the highest scoring replies in lower detail 4. **Sum...
(QB_NEW_EN_HYPHEN)
[grammar] ~32-~32: Use a hyphen to join words.
Context: ...maries of the top-level post and highest scoring replies
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
- src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md
12-12: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
19-19: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md
Outdated
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|



Summary by CodeRabbit
New Features
Updates
Documentation
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.