-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crawler skipping downloading components #87
base: master
Are you sure you want to change the base?
Conversation
Review or Edit in CodeSandboxOpen the branch in Web Editor • VS Code • Insiders |
WalkthroughThe changes in this pull request primarily focus on the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Script
participant URL
User->>Script: Execute script
Script->>URL: Retrieve URL
URL-->>Script: Provide URL
Script->>Script: Process URL
Script->>Script: Derive local path
Script->>Script: Extract component name
Script-->>User: Output results
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
index.mjs (1)
243-246
: Consider using URL API for more robust path handlingWhile the current fix addresses the immediate path issue, consider using the URL API for more robust path handling:
- // Remove the rootUrl from the url to get the local path - const localOutput = url.replace('https://tailwindui.com', '') - const path = `${localOutput}/${filename}` + const urlObj = new URL(url) + const localOutput = urlObj.pathname + const path = _path.join(localOutput, filename)This approach would:
- Handle URL encoding/decoding automatically
- Use the existing
rootUrl
constant instead of hardcoding- Use proper path joining for cross-platform compatibility
// Example URL: https://tailwindui.com/components/application-ui/navigation/navbars | ||
// Parse out "application-ui" | "marketing" | "ecommerce" from the URL to ensure we only download the components we want | ||
const component = url.split('/')[4] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
The current implementation uses index [4] to extract the component type from URLs like "https://tailwindui.com/components/application-ui/navigation/navbars", but this is incorrect:
0: "https:"
1: ""
2: "tailwindui.com"
3: "components"
4: "application-ui" // Current implementation uses this
The code should be using index [3] since "components" is at index [3], not the component type. This contradicts the comment's intention to extract "application-ui", "marketing", or "ecommerce".
🔗 Analysis chain
LGTM! Verify component filtering across all URL patterns
The fix correctly extracts the component type from the URL structure. The added comment clearly documents the expected URL format.
Let's verify that all component URLs in the codebase follow this structure:
Let me verify the URL patterns more thoroughly by checking the actual code context where these URLs are used.
Let me check the actual implementation to verify if the index [4] is correct for the URL structure.
Let me fix the regex and try again.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify URL patterns for component extraction
# Expected format: /components/<component-type>/...
# Search for all component URLs in the codebase
rg -o '/components/[^"'\'')\s]+' | while read -r url; do
# Extract and verify the component type (should be application-ui, marketing, or ecommerce)
component_type=$(echo "$url" | cut -d'/' -f3)
echo "URL: $url"
echo "Component type: $component_type"
case "$component_type" in
"application-ui"|"marketing"|"ecommerce")
echo "✓ Valid component type"
;;
*)
echo "⚠ Unexpected component type"
;;
esac
echo "---"
done
Length of output: 473
Script:
#!/bin/bash
# Get more context around the URL patterns
rg -B 3 -A 3 "components/[^\"'\s)]+"
Length of output: 538
Script:
#!/bin/bash
# Get the full URL validation and parsing logic
rg -B 5 -A 10 "url\.split\('\/'\)\[4\]"
Length of output: 150
Script:
#!/bin/bash
# Get the full URL validation and parsing logic
rg -B 5 -A 10 'split.*\[4\]'
Length of output: 855
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AI suggestion is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyw8, thank you for pointing that out. I apologize for the mistake in my previous suggestion. I'll make sure to verify the URL parsing logic more carefully in future reviews.
✏️ Learnings added
Learnt from: andyw8
PR: kiliman/tailwindui-crawler#87
File: index.mjs:445-447
Timestamp: 2024-12-01T16:54:57.688Z
Learning: In `index.mjs`, when parsing component URLs like `https://tailwindui.com/components/application-ui/navigation/navbars`, the correct index to extract the component type (e.g., "application-ui", "marketing", "ecommerce") from `url.split('/')` is `[4]`.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Fixes #86
There are a couple of issues here
./output/components/htmlhttps://tailwindui.com/components/application-ui/navigation/navbars
when passing thispath
const.[4]
instead of[2]
here is what we need to parse the correct segment.Summary by CodeRabbit
New Features
Bug Fixes