feat(seo): add data export and ranking opportunity analysis #245
marcusquinn merged 5 commits into main
Conversation
Add capability to export SEO data from multiple platforms (GSC, Bing,
Ahrefs, DataForSEO) to TOON format and analyze for ranking opportunities.
New scripts:
- seo-export-helper.sh: Unified router for all platforms
- seo-export-gsc.sh: Google Search Console export
- seo-export-bing.sh: Bing Webmaster Tools export
- seo-export-ahrefs.sh: Ahrefs organic keywords export
- seo-export-dataforseo.sh: DataForSEO ranked keywords export
- seo-analysis-helper.sh: Analysis engine for opportunities
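For a rough sense of how these are invoked (argument shapes taken from the usage strings quoted in the review below; example.com, the day count, and the country code are placeholders, and the helper's exact flags may differ):

```bash
# Export from a single platform for the last 28 days
.agent/scripts/seo-export-gsc.sh example.com --days 28

# Ahrefs and DataForSEO exports accept extra options
.agent/scripts/seo-export-ahrefs.sh example.com --days 28 --country us
.agent/scripts/seo-export-dataforseo.sh example.com --days 28

# Unified router (assumed to take the same domain/--days arguments)
.agent/scripts/seo-export-helper.sh example.com --days 28
```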
Analysis types:
- Quick wins: Position 4-20 with high impressions
- Striking distance: Position 11-30 with high volume
- Low CTR: High impressions but low click-through rate
- Content cannibalization: Same query ranking with multiple URLs
New slash commands:
- /seo-export: Export data from platforms
- /seo-analyze: Run analysis on exported data
- /seo-opportunities: Export + analyze in one step
Storage: ~/.aidevops/.agent-workspace/work/seo-data/{domain}/
Format: TOON (tab-separated, token-efficient)
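To make the format concrete, a hypothetical sketch of an exported file, inferred from the column positions the analysis awk filters use ($4 impressions, $5 CTR, $6 position, $7 volume) and the get_toon_meta/parse_toon_data helpers; the header keys, column names, and values are illustrative only, and the columns are tab-separated:

```text
# source: gsc
# domain: example.com
# exported: 2026-01-28
query	page	clicks	impressions	ctr	position	volume
best seo tools	https://example.com/blog/seo-tools	42	3100	0.0135	8.2	1900
seo audit checklist	https://example.com/audit	12	2400	0.0050	14.6	880
```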
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the SEO capabilities by introducing a robust system for data acquisition and analysis. It lets users pull key SEO performance metrics from several external services, standardize them into the custom TOON format, and then run an integrated analysis engine to pinpoint actionable strategies for improving search rankings and resolving content-related issues. The new slash commands make these SEO workflows easier to run.
Walkthrough

Adds a multi-platform SEO export and analysis system: new per-platform export scripts (GSC, Bing, Ahrefs, DataForSEO), an export orchestrator and analysis helper, command docs and SEO docs, plus a markdownlint rule change.
Sequence Diagram(s)

sequenceDiagram
participant User
participant ExportHelper as seo-export-helper.sh
participant GSC as seo-export-gsc.sh
participant Bing as seo-export-bing.sh
participant Ahrefs as seo-export-ahrefs.sh
participant DataForSEO as seo-export-dataforseo.sh
participant FileSystem as Local TOON Files
User->>ExportHelper: export_all(domain, days)
ExportHelper->>ExportHelper: ensure_directories(domain)
ExportHelper->>GSC: export_gsc(domain, days)
GSC->>GSC: get_access_token()
GSC->>GSC: gsc_search_analytics(domain, date_range)
GSC->>FileSystem: Write gsc-*.toon
GSC-->>ExportHelper: status
ExportHelper->>Bing: export_bing(domain, days)
Bing->>Bing: bing_query_stats(domain, date_range)
Bing->>FileSystem: Write bing-*.toon
Bing-->>ExportHelper: status
ExportHelper->>Ahrefs: export_ahrefs(domain, days)
Ahrefs->>Ahrefs: ahrefs_organic_keywords(domain, date_range)
Ahrefs->>FileSystem: Write ahrefs-*.toon
Ahrefs-->>ExportHelper: status
ExportHelper->>DataForSEO: export_dataforseo(domain, days)
DataForSEO->>DataForSEO: dfs_ranked_keywords(domain, date_range)
DataForSEO->>FileSystem: Write dataforseo-*.toon
DataForSEO-->>ExportHelper: status
ExportHelper-->>User: Summary (exports completed, row counts, locations)
sequenceDiagram
participant User
participant AnalysisHelper as seo-analysis-helper.sh
participant TOONParser as TOON Parser
participant Analyzers as Analysis Engines
participant FileSystem as Output Files
User->>AnalysisHelper: run_full_analysis(domain)
AnalysisHelper->>FileSystem: find_latest_toon(domain, source)
AnalysisHelper->>TOONParser: parse_toon_data(file)
TOONParser-->>AnalysisHelper: parsed rows
AnalysisHelper->>Analyzers: analyze_quick_wins(domain)
Analyzers->>Analyzers: filter positions 4-20, score by impressions
Analyzers->>FileSystem: append quick-wins
AnalysisHelper->>Analyzers: analyze_striking_distance(domain)
Analyzers->>Analyzers: filter positions 11-30, score by volume
Analyzers->>FileSystem: append striking-distance
AnalysisHelper->>Analyzers: analyze_low_ctr(domain)
Analyzers->>Analyzers: filter low CTR with high impressions
Analyzers->>FileSystem: append low-ctr
AnalysisHelper->>Analyzers: analyze_cannibalization(domain)
Analyzers->>Analyzers: aggregate pages per query, detect duplicates
Analyzers->>FileSystem: append cannibalization
AnalysisHelper->>FileSystem: Write analysis-YYYY-MM-DD.toon
AnalysisHelper-->>User: Summary with top opportunities by category
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Wed Jan 28 20:27:10 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
The pull request introduces a comprehensive set of features for SEO data export and ranking opportunity analysis. This includes new scripts for exporting data from various platforms (GSC, Bing, Ahrefs, DataForSEO) into a standardized TOON format, an analysis engine to identify quick wins, striking distance opportunities, low CTR issues, and content cannibalization, and new slash commands for easy access. The documentation for these new features is also well-integrated into the existing seo.md and new dedicated markdown files. Overall, the functionality is robust and well-documented. However, there are several areas for improvement related to code maintainability, consistency, and robustness in the bash scripts, particularly concerning shellcheck warnings, magic numbers, and API key handling.
@@ -0,0 +1,595 @@
#!/bin/bash
# shellcheck disable=SC2034,SC2155,SC2317,SC2329,SC2016,SC2181,SC1091,SC2154,SC2015,SC2086,SC2129,SC2030,SC2031,SC2119,SC2120,SC2001,SC2162,SC2088,SC2089,SC2090,SC2029,SC2006,SC2153
The extensive list of shellcheck disables (SC2034,SC2155,SC2317,SC2329,SC2016,SC2181,SC1091,SC2154,SC2015,SC2086,SC2129,SC2030,SC2031,SC2119,SC2120,SC2001,SC2162,SC2088,SC2089,SC2090,SC2029,SC2006,SC2153) suggests that many potential issues are being ignored across the script. While some might be false positives or intentional, such a long list can hide real problems, reduce code quality, and make the script harder to maintain and audit for best practices. It is recommended to address the underlying issues or selectively disable specific rules with inline comments explaining the rationale for each disablement.
This comment also applies to:
- .agent/scripts/seo-export-ahrefs.sh (line 2)
- .agent/scripts/seo-export-bing.sh (line 2)
- .agent/scripts/seo-export-dataforseo.sh (line 2)
- .agent/scripts/seo-export-gsc.sh (line 2)
- .agent/scripts/seo-export-helper.sh (line 2)
# Source shared constants
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" || exit
if [[ -f "$SCRIPT_DIR/shared-constants.sh" ]]; then
    source "$SCRIPT_DIR/shared-constants.sh"
The source command for shared-constants.sh does not include error handling for cases where the file exists but cannot be sourced (e.g., due to permissions or syntax errors within shared-constants.sh). This could lead to unexpected behavior if the shared constants are not loaded correctly. Adding || { print_error "Failed to source shared-constants.sh"; return 1; } would make the script more robust.
This comment also applies to:
- .agent/scripts/seo-export-ahrefs.sh (line 17)
- .agent/scripts/seo-export-bing.sh (line 17)
- .agent/scripts/seo-export-dataforseo.sh (line 17)
- .agent/scripts/seo-export-gsc.sh (line 17)
- .agent/scripts/seo-export-helper.sh (line 17)
Suggested change:
- source "$SCRIPT_DIR/shared-constants.sh"
+ source "$SCRIPT_DIR/shared-constants.sh" || { print_error "Failed to source shared-constants.sh"; return 1; }
-v min_imp="$QUICK_WIN_MIN_IMPRESSIONS" \
'NF>=6 && $6>=min_pos && $6<=max_pos && $4>=min_imp {
    # Calculate opportunity score: higher impressions + closer to page 1 = better
    score = ($4 / 100) + ((21 - $6) * 5)
The numbers 100, 21, and 5 used in the score calculation for quick wins are magic numbers. Defining these as named readonly constants at the top of the script would significantly improve readability and maintainability, making it easier to understand their purpose and adjust them if needed.
Suggested change:
- score = ($4 / 100) + ((21 - $6) * 5)
+ # Calculate opportunity score: higher impressions + closer to page 1 = better
+ score = ($4 / $QUICK_WIN_IMPRESSION_DIVISOR) + (($QUICK_WIN_POSITION_OFFSET - $6) * $QUICK_WIN_POSITION_MULTIPLIER)
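For illustration, a minimal sketch of the constant-based approach being suggested, assuming the script's existing awk pipeline. Note that shell constants have to be passed into awk with -v; a literal $QUICK_WIN_IMPRESSION_DIVISOR inside a single-quoted awk program would not expand. The min/max position constant names and $input_file are assumptions:

```bash
# Assumed names; set the values to match the current behaviour.
readonly QUICK_WIN_IMPRESSION_DIVISOR=100
readonly QUICK_WIN_POSITION_OFFSET=21
readonly QUICK_WIN_POSITION_MULTIPLIER=5

awk -F'\t' \
    -v min_pos="$QUICK_WIN_MIN_POSITION" \
    -v max_pos="$QUICK_WIN_MAX_POSITION" \
    -v min_imp="$QUICK_WIN_MIN_IMPRESSIONS" \
    -v imp_div="$QUICK_WIN_IMPRESSION_DIVISOR" \
    -v pos_off="$QUICK_WIN_POSITION_OFFSET" \
    -v pos_mult="$QUICK_WIN_POSITION_MULTIPLIER" \
    'NF>=6 && $6>=min_pos && $6<=max_pos && $4>=min_imp {
        # Higher impressions and a position closer to page 1 score better.
        score = ($4 / imp_div) + ((pos_off - $6) * pos_mult)
        print score "\t" $0
    }' "$input_file"
```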
volume = (NF>=7 && $7>0) ? $7 : ($4 / 10)
if (volume >= min_vol) {
    # Score: volume * position proximity to page 1
    score = volume * (31 - $6)
The numbers 10 (for volume estimation) and 31 (for position proximity in score calculation) are magic numbers. Defining these as named readonly constants at the top of the script would enhance clarity and ease of modification, improving the script's maintainability.
Suggested change:
- volume = (NF>=7 && $7>0) ? $7 : ($4 / 10)
- if (volume >= min_vol) {
-     # Score: volume * position proximity to page 1
-     score = volume * (31 - $6)
+ volume = (NF>=7 && $7>0) ? $7 : ($4 / $IMPRESSION_TO_VOLUME_RATIO)
+ if (volume >= min_vol) {
+     # Score: volume * position proximity to page 1
+     score = volume * ($STRIKING_DISTANCE_POSITION_OFFSET - $6)
-v min_imp="$LOW_CTR_MIN_IMPRESSIONS" \
'NF>=6 && $5<max_ctr && $4>=min_imp && $6<=10 {
    # Potential clicks if CTR improved to 5%
    potential = $4 * 0.05
The number 0.05 used in the potential clicks calculation for low CTR opportunities is a magic number. Defining this as a named readonly constant (e.g., TARGET_CTR_IMPROVEMENT) would make the code more readable and maintainable, clearly indicating its significance.
Suggested change:
- potential = $4 * 0.05
+ # Potential clicks if CTR improved to 5%
+ potential = $4 * $TARGET_CTR_IMPROVEMENT
.agent/scripts/seo-export-ahrefs.sh (outdated)

.keyword,
.url,
.traffic,
(.volume * 10),
The number 10 used to estimate impressions from volume (.volume * 10) is a magic number. Defining this as a named readonly constant (e.g., VOLUME_TO_IMPRESSION_RATIO) would improve readability and maintainability, making the estimation factor explicit.
Suggested change:
- (.volume * 10),
+ (.volume * $VOLUME_TO_IMPRESSION_RATIO),
EOF

# Bing returns query stats and page stats separately
# We need to combine them - for now, output query stats with empty page
The page field in the TOON output for Bing is consistently an empty string (""). While the comment explains this is due to the API returning query stats and page stats separately, it leads to an inconsistent data structure compared to other sources that provide a URL for each query. This might require additional handling in downstream analysis scripts. Consider if there's a way to map queries to pages from the bing_page_stats response or clearly document this limitation for consumers of the TOON data.
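If merging the two Bing responses is not practical, the comment's fallback (documenting the limitation) could be as small as a metadata note in the export, sketched here against the TOON header style assumed above; the variable names are placeholders:

```bash
# Record the limitation so downstream analysis can detect it.
{
    echo "# source: bing"
    echo "# note: page column empty (Bing returns query stats and page stats separately)"
} >> "$output_file"
print_warning "Bing export: page column left empty; page-level analyses will skip Bing rows"
```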
.keyword_data.keyword,
.ranked_serp_element.url,
(.ranked_serp_element.etv // 0),
((.keyword_data.keyword_info.search_volume // 0) * 10),
The number 10 used to estimate impressions from search volume ((.keyword_data.keyword_info.search_volume // 0) * 10) is a magic number. Defining this as a named readonly constant (e.g., VOLUME_TO_IMPRESSION_RATIO) would improve readability and maintainability, making the estimation factor explicit.
Suggested change:
- ((.keyword_data.keyword_info.search_volume // 0) * 10),
+ ((.keyword_data.keyword_info.search_volume // 0) * $VOLUME_TO_IMPRESSION_RATIO),
fi

if [[ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]] && [[ -f "$GOOGLE_APPLICATION_CREDENTIALS" ]]; then
    local token
The get_access_token function relies on the gcloud command-line tool being installed and configured to retrieve an access token. However, there's no explicit check for gcloud's presence, similar to the jq check at the end of the script. This could lead to runtime errors if gcloud is not available in the environment. Adding a dependency check for gcloud would make the script more robust.
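A small sketch of the dependency check being requested, mirroring the jq check the comment mentions; the function and print_error helper names are assumptions:

```bash
# Fail fast if gcloud is missing, mirroring the existing jq dependency check.
check_gcloud_dependency() {
    if ! command -v gcloud >/dev/null 2>&1; then
        print_error "gcloud CLI is required to obtain a Google access token (https://cloud.google.com/sdk)"
        return 1
    fi
    return 0
}
```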
{
    "default": true,
    "MD001": false,
    "MD010": false,
Disabling MD010, which checks for hard tabs, might be a deliberate choice for bash scripts or TOON files where tabs are often used for indentation or data separation. However, if the intention is to enforce consistent indentation across all markdown files, this rule should be enabled or specific exceptions should be documented to ensure clarity and maintain consistent style.
🤖 Augment PR Summary

Summary: Adds an end-to-end SEO data export and ranking-opportunity analysis workflow for the SEO agent.

Technical Notes: Exporters use curl+jq against provider APIs; analysis combines all available source files for a domain and emits an analysis-YYYY-MM-DD.toon report.
.agent/scripts/seo-export-helper.sh (outdated)

local modified
modified=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M" "$file" 2>/dev/null || stat -c "%y" "$file" 2>/dev/null | cut -d' ' -f1-2)
echo "$filename ($size) $modified"
((count++))
Because this script uses set -e, ((count++)) (and similar post-increment arithmetic) returns exit status 1 the first time (when the value is 0), which can terminate the script unexpectedly.
Other Locations
- .agent/scripts/seo-export-helper.sh:204
- .agent/scripts/seo-export-helper.sh:206
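To make the failure mode concrete: under set -e, ((count++)) returns the pre-increment value as its status, so the very first increment from 0 exits non-zero and kills the script. A throwaway sketch of the safer forms:

```bash
#!/bin/bash
set -e
count=0

# ((count++)) here would evaluate to 0, return status 1, and abort the script.
# An arithmetic assignment always succeeds:
count=$((count + 1))

# Pre-increment also works when the new value is non-zero, since the expression
# evaluates to the incremented value:
((++count))

echo "count=$count"
```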
# Check for TOON files
local toon_count
toon_count=$(ls "$domain_dir"/*.toon 2>/dev/null | grep -v "analysis-" | wc -l | tr -d ' ')
.agent/scripts/seo-export-gsc.sh (outdated)

# Make API request
local response
response=$(gsc_search_analytics "$site_url" "$start_date" "$end_date") || {
}

# Check if this page is already seen for this query
if (index(pages, page) == 0) {
return 1
fi

echo -n "$DATAFORSEO_USERNAME:$DATAFORSEO_PASSWORD" | base64
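The review later flags that this base64 call can emit a trailing newline (and GNU base64 wraps long output), which corrupts the Authorization header. A hedged sketch of a safer variant, keeping the same credential variables; tr -d '\n' is used instead of -w0 because BSD/macOS base64 does not support -w:

```bash
# Build the Basic auth value without trailing newlines or line wrapping.
get_auth_header() {
    local encoded
    encoded=$(printf '%s' "$DATAFORSEO_USERNAME:$DATAFORSEO_PASSWORD" | base64 | tr -d '\n')
    printf '%s' "$encoded"
}
```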
- Check credentials before making API requests
- Exit early with clear error messages when credentials missing
- Handle both string and object error formats in API responses
- Prevent empty exports when API calls fail
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In @.agent/scripts/seo-analysis-helper.sh:
- Around line 270-366: The cannibalization detection in analyze_cannibalization
incorrectly uses index(pages, page) (substring matching) so URLs like /blog/seo
are treated as duplicates of /blog/seo-tools; update the AWK grouping logic
inside analyze_cannibalization to use an associative set keyed by the exact page
string (e.g., seen[page]) rather than index(), add pages to a pages_list only
when seen[page] is not set, and reset the seen map, pages_list, positions, and
page_count when query changes (ensure the same unique symbols: query, page, pos,
pages, positions, page_count, prev_query are used so the rest of the pipeline
still works).
In @.agent/scripts/seo-export-bing.sh:
- Around line 248-251: The `--days` case currently does
`days="${2:-$DEFAULT_DAYS}"; shift 2` which will fail if `--days` is the last
arg; update the `--days` case logic to explicitly check whether the next
positional parameter exists and is not another option (inspect `$2` for
emptiness or leading `-`), set `days` to `$2` when present or to `DEFAULT_DAYS`
otherwise, and only `shift 2` when a real value was consumed; if no value is
present, set `days` to the default and `shift` once to drop the `--days` token.
Reference: the `--days` case and the `days` variable in the argument-parsing
switch.
- Line 2: Replace the blanket "shellcheck disable=SC2034,SC2155,..." directive
with a focused approach: keep SC1091 globally (for expected sourced files) by
retaining "shellcheck disable=SC1091" at the top, remove the long list, and then
add targeted inline disables where needed — e.g., use "# shellcheck
disable=SC2034 - reason: variable used by external consumer" next to the
unused-variable, and "# shellcheck disable=SC2086 - reason: intentional
word-splitting/expansion" where expansions require no quoting; ensure each
inline disable includes a brief explanatory comment.
- Around line 73-75: The bing_request() curl call currently leaves stderr
unredirected and always returns 0, risking API key exposure and masking
failures; modify the curl invocation in bing_request() to redirect stderr to
/dev/null (or a safe log), use --write-out to capture HTTP status or a short
error token (not the full URL), check the curl exit code and HTTP status, and
propagate failures by returning a non-zero exit code when curl fails or the HTTP
status is not successful; also ensure no raw URL containing apikey is printed to
stdout/stderr or logs.
In @.agent/scripts/seo-export-dataforseo.sh:
- Around line 50-62: The get_auth_header function currently returns base64
output that includes a trailing newline which breaks the Authorization header;
change it to trim trailing newlines/whitespace before output. Specifically,
capture the base64 result from echo -n
"$DATAFORSEO_USERNAME:$DATAFORSEO_PASSWORD" | base64 into a variable and emit it
without any trailing newline or whitespace (e.g., use printf '%s' "$encoded" or
pipe through tr -d '\n') so the Authorization header interpolation receives a
clean value; keep the same error checks in get_auth_header and preserve its
return behavior.
In @.agent/scripts/seo-export-gsc.sh:
- Around line 149-214: In export_gsc, gsc_search_analytics("$site_url"... ) may
return a successful exit code but an error JSON, so after the first call inspect
the response for .error and, if present, set site_url="https://$domain/" and
call gsc_search_analytics again to retry; then re-check the second response for
.error and fail only if both attempts produced error JSON. Update the logic
around the first response variable (response) and the subsequent .error checks
so the sc-domain -> https fallback happens on API-error payloads as well as
non-zero exits. (A hedged sketch of this fallback follows after this list.)
In @.agent/scripts/seo-export-helper.sh:
- Around line 149-150: Replace the unsafe arithmetic post-increment expressions
that can return non-zero under set -e (specifically the occurrences of
((count++)), ((success_count++)), and ((fail_count++))) with POSIX-safe
arithmetic assignments like count=$((count + 1)), success_count=$((success_count
+ 1)), and fail_count=$((fail_count + 1)) so the increment never produces a
failing exit status; update each place where those increment expressions appear
(e.g., the echo block and the other increment sites) to use the safe form.
- Around line 289-292: The --days option block currently sets
days="${2:-$DEFAULT_DAYS}" and unconditionally does shift 2 which breaks if
--days is the last arg; update the --days handling to detect whether a next
positional argument exists and is not another flag before consuming it: if a
next arg is present and not a flag, assign it to days and shift 2, otherwise set
days="$DEFAULT_DAYS" and shift 1. Modify the block that references days,
DEFAULT_DAYS and the shift to implement this guard so shift 2 is only used when
a value is actually consumed.
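For the seo-export-gsc.sh prompt above, a minimal hedged sketch of that fallback; gsc_search_analytics, the print_* helpers, and the variable names come from the diff context, while the jq-based error check and retry shape are assumptions:

```bash
# Retry with the URL-prefix property when the response carries an .error payload.
response=$(gsc_search_analytics "$site_url" "$start_date" "$end_date") || response=""

if [[ -z "$response" ]] || echo "$response" | jq -e '.error' >/dev/null 2>&1; then
    print_info "sc-domain query failed; retrying with https://$domain/"
    site_url="https://$domain/"
    response=$(gsc_search_analytics "$site_url" "$start_date" "$end_date") || response=""
    if [[ -z "$response" ]] || echo "$response" | jq -e '.error' >/dev/null 2>&1; then
        print_error "GSC query failed for both sc-domain and https:// properties"
        return 1
    fi
fi
```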
🧹 Nitpick comments (12)
.markdownlint.json (1)
Line 4: Prefer scoping MD010 instead of disabling globally to maintain A-grade lint standards. MD010 (no-hard-tabs) can be scoped to allow tabs in code blocks (useful for Makefiles and tab-sensitive languages) while preventing them in prose.

♻️ Recommended configuration:
- "MD010": false,
+ "MD010": {
+   "code_blocks": false
+ },

This keeps hard tabs out of markdown prose while permitting them in code fences, balancing automation quality with pragmatism.
.agent/scripts/seo-export-gsc.sh (2)
Lines 72-86: Add curl timeouts/retries to prevent hanging exports. Line 80 runs curl without timeouts or retries, so a stalled connection can hang the automation. Consider adding retry/timeout flags while preserving the JSON response. As per coding guidelines: .agent/scripts/*.sh: Automation scripts - focus on Reliability and robustness.

♻️ Suggested hardening:
- curl -s -X POST \
+ curl -sS --retry 3 --retry-delay 2 --connect-timeout 10 --max-time 60 -X POST \
      "https://searchconsole.googleapis.com/webmasters/v3/$endpoint" \
      -H "Authorization: Bearer $token" \
      -H "Content-Type: application/json" \
      -d "$data"
Lines 256-297: Validate --days to avoid date failures. Line 263 accepts any string; with set -e an invalid value will abort mid-run. Enforce a positive integer before calling export_gsc.

🧱 Input validation:
  if [[ -z "$domain" ]]; then
      print_error "Domain is required"
      echo "Usage: seo-export-gsc.sh <domain> [--days N]"
      return 1
  fi
+
+ if ! [[ "$days" =~ ^[0-9]+$ ]] || (( days <= 0 )); then
+     print_error "Days must be a positive integer"
+     return 1
+ fi

.agent/scripts/seo-export-ahrefs.sh (2)
Lines 65-77: Harden curl with timeouts/retries. Line 72 uses curl without timeouts or retries. Add retry/timeout flags to keep the export from hanging on transient network issues. As per coding guidelines: .agent/scripts/*.sh: Automation scripts - focus on Reliability and robustness.

♻️ Suggested hardening:
- curl -s -X GET \
+ curl -sS --retry 3 --retry-delay 2 --connect-timeout 10 --max-time 60 -X GET \
      "$AHREFS_API_BASE/$endpoint?$params" \
      -H "Authorization: Bearer $api_key" \
      -H "Accept: application/json"
Lines 253-299: Validate --days to avoid date failures. Line 261 accepts any string; invalid values will cause date to error under set -e. Enforce a positive integer before calling export_ahrefs.

🧱 Input validation:
  if [[ -z "$domain" ]]; then
      print_error "Domain is required"
      echo "Usage: seo-export-ahrefs.sh <domain> [--days N] [--country CODE]"
      return 1
  fi
+
+ if ! [[ "$days" =~ ^[0-9]+$ ]] || (( days <= 0 )); then
+     print_error "Days must be a positive integer"
+     return 1
+ fi

.agent/scripts/seo-export-dataforseo.sh (2)
Lines 65-78: Add timeouts/retries to protect against stalled requests. Line 72 uses curl without timeouts or retries. Adding retry/timeout flags improves resilience for automation usage. As per coding guidelines: .agent/scripts/*.sh: Automation scripts - focus on Reliability and robustness.

♻️ Suggested hardening:
- curl -s -X POST \
+ curl -sS --retry 3 --retry-delay 2 --connect-timeout 10 --max-time 60 -X POST \
      "$DFS_API_BASE/$endpoint" \
      -H "Authorization: Basic $auth" \
      -H "Content-Type: application/json" \
      -d "$data"
Lines 283-334: Validate --days to avoid date failures. Line 292 accepts any string; invalid values will cause date errors under set -e. Enforce a positive integer before calling export_dataforseo.

🧱 Input validation:
  if [[ -z "$domain" ]]; then
      print_error "Domain is required"
      echo "Usage: seo-export-dataforseo.sh <domain> [--days N] [--location CODE]"
      return 1
  fi
+
+ if ! [[ "$days" =~ ^[0-9]+$ ]] || (( days <= 0 )); then
+     print_error "Days must be a positive integer"
+     return 1
+ fi

.agent/scripts/seo-analysis-helper.sh (1)
Lines 369-416: Avoid ls | grep for TOON counting. Line 381 uses ls | grep, which breaks on spaces and can miscount. Under set -euo pipefail it can also abort unexpectedly. Prefer a glob + loop to count files safely. As per coding guidelines: .agent/scripts/*.sh: Automation scripts - focus on Reliability and robustness.

♻️ Safer counting via glob/loop:
- toon_count=$(ls "$domain_dir"/*.toon 2>/dev/null | grep -v "analysis-" | wc -l | tr -d ' ')
+ local toon_count=0
+ for toon_file in "$domain_dir"/*.toon; do
+     [[ -f "$toon_file" ]] || continue
+     [[ "$(basename "$toon_file")" == analysis-* ]] && continue
+     ((toon_count++))
+ done

.agent/scripts/commands/seo-analyze.md (1)
Lines 11-71: Trim inline walkthroughs; point to authoritative helpers. This doc includes extended inline usage/process snippets. For progressive disclosure, keep a brief purpose/prereqs section and point to file:.agent/scripts/seo-analysis-helper.sh:520-590 (CLI) and seo/ranking-opportunities.md for details instead of embedding long code blocks.

As per coding guidelines: Apply progressive disclosure pattern by using pointers to subagents rather than including inline content in agent documentation; include code examples only when authoritative; use file:line references to point to actual implementation instead of inline code snippets.

.agent/seo/ranking-opportunities.md (1)
Lines 17-233: Reduce inline examples; reference helper implementations. This doc contains large inline command/output examples. For progressive disclosure, keep the high-level criteria and refer readers to file:.agent/scripts/seo-analysis-helper.sh (analysis logic) and .agent/scripts/commands/seo-analyze.md for usage, rather than embedding long outputs and workflows here.

As per coding guidelines: Apply progressive disclosure pattern by using pointers to subagents rather than including inline content in agent documentation; include code examples only when authoritative; use file:line references to point to actual implementation instead of inline code snippets.

.agent/scripts/seo-export-helper.sh (2)
Line 2: Same shellcheck disable concern as other scripts. Consider a shared approach: either a .shellcheckrc file for project-wide configuration, or documenting why these specific rules need disabling across the SEO export scripts.
Lines 74-89: Auto-chmod is helpful but worth logging. The script silently makes platform scripts executable. Consider adding a print_info message when this happens for transparency.

💡 Optional: Add logging for chmod:
  if [[ ! -x "$script_path" ]]; then
+     print_info "Making $platform script executable"
      chmod +x "$script_path"
  fi
analyze_cannibalization() {
    local domain="$1"
    local domain_dir="$SEO_DATA_DIR/$domain"
    local output_file="$2"

    print_header "Content Cannibalization Analysis"
    print_info "Finding queries with multiple ranking URLs"
    echo ""

    local temp_file
    temp_file=$(mktemp)
    local query_pages
    query_pages=$(mktemp)

    # Collect all query-page pairs
    for toon_file in "$domain_dir"/*.toon; do
        [[ -f "$toon_file" ]] || continue
        [[ "$(basename "$toon_file")" == analysis-* ]] && continue

        local source
        source=$(get_toon_meta "$toon_file" "source")

        parse_toon_data "$toon_file" | awk -F'\t' -v src="$source" \
            'NF>=6 && $2!="" {
                print tolower($1) "\t" $2 "\t" $6 "\t" $4 "\t" src
            }' >> "$query_pages"
    done

    # Find queries with multiple pages
    if [[ -s "$query_pages" ]]; then
        # Group by query, find those with multiple unique pages
        sort -t$'\t' -k1,1 "$query_pages" | awk -F'\t' '
        {
            query = $1
            page = $2
            pos = $3
            imp = $4
            src = $5

            if (query != prev_query && prev_query != "") {
                if (page_count > 1) {
                    # Output cannibalization
                    print prev_query "\t" pages "\t" positions "\t" page_count
                }
                pages = ""
                positions = ""
                page_count = 0
            }

            # Check if this page is already seen for this query
            if (index(pages, page) == 0) {
                if (pages != "") {
                    pages = pages "," page
                    positions = positions "," pos
                } else {
                    pages = page
                    positions = pos
                }
                page_count++
            }

            prev_query = query
        }
        END {
            if (page_count > 1) {
                print prev_query "\t" pages "\t" positions "\t" page_count
            }
        }' > "$temp_file"

        if [[ -s "$temp_file" ]]; then
            echo "" >> "$output_file"
            echo "# Content Cannibalization" >> "$output_file"
            echo "query pages positions page_count" >> "$output_file"
            sort -t$'\t' -k4 -rn "$temp_file" | head -50 >> "$output_file"

            local count
            count=$(wc -l < "$temp_file" | tr -d ' ')
            print_success "Found $count cannibalized queries"

            echo ""
            echo "Top 10 Cannibalized Queries:"
            echo "Query | # Pages | Positions"
            echo "------|---------|----------"
            sort -t$'\t' -k4 -rn "$temp_file" | head -10 | while IFS=$'\t' read -r query pages positions page_count; do
                printf "%.40s | %d | %s\n" "$query" "$page_count" "$positions"
            done
        else
            print_warning "No content cannibalization detected"
        fi
    else
        print_warning "No data available for cannibalization analysis"
    fi

    rm -f "$temp_file" "$query_pages"
    echo ""
    return 0
}
URL deduping uses substring matching, causing false negatives.
Line 320 uses index(pages, page) which treats /blog/seo as already seen when /blog/seo-tools exists. This can hide real cannibalization. Track exact URLs with an associative set per query.
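To see the false negative concretely: awk's index() does plain substring matching, so a URL that is a prefix of one already collected is treated as a duplicate. A throwaway demo, not project code:

```bash
# index() returns a 1-based match position, so any non-zero result counts as seen.
awk 'BEGIN {
    pages = "/blog/seo-tools"   # already collected for this query
    page  = "/blog/seo"         # distinct URL, but a prefix of the one above
    print index(pages, page)    # prints 1, so /blog/seo would be skipped
}'
```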
🐛 Safer dedupe with per-query sets

- sort -t$'\t' -k1,1 "$query_pages" | awk -F'\t' '
+ sort -t$'\t' -k1,1 "$query_pages" | awk -F'\t' '
      {
          query = $1
          page = $2
          pos = $3
          imp = $4
          src = $5
          if (query != prev_query && prev_query != "") {
              if (page_count > 1) {
                  # Output cannibalization
                  print prev_query "\t" pages "\t" positions "\t" page_count
              }
              pages = ""
              positions = ""
              page_count = 0
+             delete seen
          }
          # Check if this page is already seen for this query
-         if (index(pages, page) == 0) {
+         if (!(page in seen)) {
+             seen[page] = 1
              if (pages != "") {
                  pages = pages "," page
                  positions = positions "," pos
              } else {
                  pages = page
                  positions = pos
@@ -0,0 +1,285 @@
#!/bin/bash
# shellcheck disable=SC2034,SC2155,SC2317,SC2329,SC2016,SC2181,SC1091,SC2154,SC2015,SC2086,SC2129,SC2030,SC2031,SC2119,SC2120,SC2001,SC2162,SC2088,SC2089,SC2090,SC2029,SC2006,SC2153
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's examine the actual file
echo "=== File content (first 30 lines) ==="
head -30 .agent/scripts/seo-export-bing.sh
echo ""
echo "=== Total line count ==="
wc -l .agent/scripts/seo-export-bing.sh
echo ""
echo "=== Running shellcheck without the disable line ==="
# Create a temporary file without the disable directive
sed '2d' .agent/scripts/seo-export-bing.sh > /tmp/seo-export-bing-check.sh
chmod +x /tmp/seo-export-bing-check.sh
# Run shellcheck and show all violations
if command -v shellcheck &> /dev/null; then
shellcheck -f gcc /tmp/seo-export-bing-check.sh 2>&1 | head -100
else
echo "shellcheck not found, checking with alternative approach"
fi
rm -f /tmp/seo-export-bing-check.sh
echo ""
echo "=== Analyzing the disabled rules ==="
# Extract the disabled rules from line 2
echo "Disabled rules in the file:"
grep "shellcheck disable" .agent/scripts/seo-export-bing.sh | sed 's/.*disable=//' | tr ',' '\n' | sort | uniq

Repository: marcusquinn/aidevops
Length of output: 1947
Remove overly broad shellcheck disables; 20 of 23 disabled rules don't trigger.
Verification shows only 3 rules actually trigger: SC1091 (necessary for sourced files), SC2034 (unused variable), and SC2086 (quoting). The remaining 20 disabled rules add no value and obscure potential issues.
Refactor to use only necessary disables:
- Keep SC1091 (external sourced files are expected)
- Replace the blanket disable with targeted, inline disables for SC2034 and SC2086 only, with comments explaining why each is necessary
This aligns with the A-grade quality standard of addressing issues directly rather than suppressing them wholesale.
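A short sketch of the targeted style this suggests; the variable, command, and reasons are placeholders rather than lines from the actual script:

```bash
#!/bin/bash
# Sourced paths are resolved at runtime, so SC1091 stays disabled file-wide.
# shellcheck disable=SC1091

# Consumed by scripts that source this file.
# shellcheck disable=SC2034
EXPORT_FORMAT="toon"

# Intentional word-splitting: CURL_OPTS holds multiple flags.
# shellcheck disable=SC2086
curl $CURL_OPTS "https://example.com/api"
```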
curl -s -X GET \
    "https://ssl.bing.com/webmaster/api.svc/json/$endpoint?siteUrl=$encoded_url&apikey=$api_key"
return 0
🧩 Analysis chain
🌐 Web query:
Bing Webmaster Tools API authentication method apikey
💡 Result:
Bing Webmaster Tools APIs support API key (“apikey”) authentication (in addition to OAuth 2.0, which Microsoft labels as recommended). (learn.microsoft.com)
How API key auth works
- You generate one API key per user in Bing Webmaster Tools: Settings → API Access → API Key → Generate API Key. (learn.microsoft.com)
- You pass it as a query-string parameter named apikey on the API endpoint, e.g.:
  - .../SubmitUrl?apikey=YOUR_KEY (blogs.bing.com)
  - .../GetUrlSubmissionQuota?siteUrl=...&apikey=YOUR_KEY (blogs.bing.com)
So the authentication method is simply: append ?apikey=... (or &apikey=...) to the request URL. (blogs.bing.com)
Citations:
- 1: https://learn.microsoft.com/en-us/bingwebmaster/getting-access?utm_source=openai
- 2: https://learn.microsoft.com/en-us/bingwebmaster/getting-access?utm_source=openai
- 3: https://blogs.bing.com/webmaster/november-2019/Accessing-Bing-webmaster-tools-api-using-cURL?utm_source=openai
- 4: https://blogs.bing.com/webmaster/november-2019/Accessing-Bing-webmaster-tools-api-using-cURL?utm_source=openai
- 5: https://blogs.bing.com/webmaster/november-2019/Accessing-Bing-webmaster-tools-api-using-cURL?utm_source=openai
🏁 Script executed:
# First, let's examine the full script context around the flagged lines
cat -n .agent/scripts/seo-export-bing.sh | head -100

Repository: marcusquinn/aidevops
Length of output: 3910
🏁 Script executed:
# Check the script beyond line 75 to see error handling
sed -n '73,120p' .agent/scripts/seo-export-bing.sh

Repository: marcusquinn/aidevops
Length of output: 1290
Add error handling and stderr redirection to prevent API key exposure via curl errors.
The Bing Webmaster Tools API accepts apikey in the query string (confirmed above as a supported authentication method alongside OAuth 2.0). However, the current implementation lacks protection against accidental key exposure:
- curl -s only suppresses progress output; stderr and error messages are NOT suppressed
- If curl fails, error output (potentially containing the URL with apikey) flows directly to stdout/stderr unfiltered
- bing_request() always returns 0 regardless of curl success, masking failures
Redirect stderr, validate curl exit code, and consider using --write-out for error diagnosis without exposing the full URL:
curl -s -X GET \
"https://ssl.bing.com/webmaster/api.svc/json/$endpoint?siteUrl=$encoded_url&apikey=$api_key" \
2>/dev/null || return 1
Per automation script guidelines, failures should propagate with proper exit codes for reliable error recovery.
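Building on that, a hedged sketch of capturing the HTTP status with --write-out while keeping the key out of any output; the variable names follow the diff context and the status handling is an assumption (intended to live inside bing_request()):

```bash
# Write the body to a temp file and surface only the numeric status on failure.
local body_file http_status
body_file=$(mktemp)
http_status=$(curl -s -o "$body_file" -w '%{http_code}' -X GET \
    "https://ssl.bing.com/webmaster/api.svc/json/$endpoint?siteUrl=$encoded_url&apikey=$api_key" \
    2>/dev/null) || { rm -f "$body_file"; return 1; }

if [[ "$http_status" -lt 200 || "$http_status" -ge 300 ]]; then
    print_error "Bing API request failed (HTTP $http_status)"   # no URL or key in the message
    rm -f "$body_file"
    return 1
fi

cat "$body_file"
rm -f "$body_file"
return 0
```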
--days)
    days="${2:-$DEFAULT_DAYS}"
    shift 2
    ;;
Edge case: --days without value causes shift error.
If a user passes --days as the last argument without a value, shift 2 will fail under set -e. The ${2:-$DEFAULT_DAYS} fallback helps, but shift 2 still attempts to shift two positions.
🔧 Proposed fix for safer argument handling
  --days)
-     days="${2:-$DEFAULT_DAYS}"
-     shift 2
+     if [[ -n "${2:-}" && ! "$2" =~ ^- ]]; then
+         days="$2"
+         shift 2
+     else
+         days="$DEFAULT_DAYS"
+         shift
+     fi
      ;;
- Replace ((count++)) with count=$((count + 1)) to avoid set -e issues
- Use find instead of ls glob to handle empty directories safely
- Add -w0 flag to base64 for GNU compatibility (prevents line wrapping)
- Fix URL substring matching in cannibalization detection using delimiters
- Add validation for --days, --country, --location, --language flags
- Prevent shift errors when flags are passed without values
- Clean up redundant default value assignments after validation
- Add scoring constants for quick wins and striking distance calculations
- Add IMPRESSION_VOLUME_MULTIPLIER for Ahrefs and DataForSEO
- Add TARGET_CTR_IMPROVEMENT for low CTR potential calculation
- Pass constants to awk/jq via variables for clarity
Summary
- /seo-export, /seo-analyze, /seo-opportunities

New Scripts

- seo-export-helper.sh
- seo-export-gsc.sh
- seo-export-bing.sh
- seo-export-ahrefs.sh
- seo-export-dataforseo.sh
- seo-analysis-helper.sh

Analysis Types
Usage
Storage
Data stored in TOON format at ~/.aidevops/.agent-workspace/work/seo-data/{domain}/
Documentation
- seo/data-export.md - Export documentation
- seo/ranking-opportunities.md - Analysis documentation

Summary by CodeRabbit
New Features
Documentation
Chores