
feat: integrate korean book metadata and UI citations#10

Merged
SanghunYun95 merged 6 commits into main from feat/book-metadata
Mar 3, 2026

Conversation

SanghunYun95 (Owner) commented Mar 2, 2026

Summary by CodeRabbit

  • New features
    • Multi Gemini API key support: loads a set of keys, with automatic key rotation using a primary key and fallback keys
    • Parses multiple keys from .env
    • Adds a large book mapping dataset plus scripts for mapping generation, SQL generation, and DB updates
    • Improved chat citation UI: shows Korean titles, cover thumbnails, and book info links
    • Extended metadata fields: kr_title, thumbnail, link added

coderabbitai bot commented Mar 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds multi Gemini API key support (parsing, rotation, fallback), a book Korean-translation and metadata mapping pipeline (crawling plus Aladin enrichment), mapping-driven SQL/DB update scripts, a config .env path adjustment, and frontend citation rendering with type extensions.

Changes

Cohort / File(s) Summary
Config & Env utils
backend/app/core/config.py, backend/app/core/env_utils.py
Adjusts the Settings.model_config.env_file path to parents[3]. Adds a new utility parse_gemini_api_keys(env_path: Path) -> list[str] (extracts the list of GEMINI_API_KEY values from the local .env with de-duplication, falling back to environment variables).
LLM service
backend/app/services/llm.py
Adds get_all_gemini_keys() and changes get_llm(): loads multiple Gemini keys from .env/the environment, configures the primary Gemini with the first key, builds fallback LLMs from the remaining keys and wires them via with_fallbacks, and raises a runtime error when no key is found.
Book mapping data & generation script
backend/data/books_mapping.json, backend/scripts/generate_book_mapping.py
Adds a large books_mapping.json (original-title-to-Korean-translation mapping). Adds generate_book_mapping.py: file discovery, Gemini key rotation/retry, Kyobo fallback scraping, Aladin enrichment, async processing, and safe I/O logic.
DB update pipeline & SQL
backend/scripts/generate_sql_updates.py, backend/scripts/update_db_metadata.py, backend/update_metadata.sql
Adds a script that generates SQL update statements from the mapping, a script that fetches, matches, and batch-updates document metadata in Supabase (threaded, with retries), and a SQL migration file (index creation and metadata jsonb updates).
Utility scripts
backend/scripts/check_models.py
Adds a check script that reads the key from the local .env and lists the available Gemini (Generative AI) models.
Frontend types & UI
frontend/types/chat.ts, frontend/components/chat/MessageList.tsx
Adds kr_title?, thumbnail?, link? fields to DocumentMetadata. Restructures the MessageList card layout to show a thumbnail, Korean title, and link, keeping the click-to-open-citation trigger.

Sequence Diagram

sequenceDiagram
    participant FS as Filesystem
    participant Script as generate_book_mapping.py
    participant Gemini as Gemini LLM
    participant Kyobo as Kyobo website
    participant Aladin as Aladin API
    participant Mapping as books_mapping.json

    FS->>Script: discover data/*.txt files
    loop per file
        Script->>Script: derive original title from filename
        Script->>Gemini: translation request (first key)
        alt translation succeeds
            Gemini-->>Script: Korean title/author
        else key/quota error
            Script->>Gemini: retry with next key (rotation)
            alt all keys fail
                Script->>Kyobo: fallback search on Kyobo
                Kyobo-->>Script: Korean title/author (fallback)
            end
        end
        Script->>Aladin: fetch extra metadata (if ALADIN_API_KEY is set)
        Aladin-->>Script: title, link, thumbnail, ISBN, etc.
        Script-->>Mapping: add/update entry in books_mapping.json
    end
    Script-->>FS: save results (books_mapping.json)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #7: modifies the same files (backend/app/core/config.py, backend/app/services/llm.py); strongly related via the config path and LLM key handling changes.

Poem

🐰📚 Sniffing out English titles hidden in files,
Gathering keys to rotate, carving fallback trails,
Threading even Aladin's shadows to pin the metadata on,
Thumbnails sparkle in the DB and Hangul shines on the cards,
The rabbit quietly completed the mapping. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the PR's main changes. Korean book metadata integration and UI citations are the core of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/book-metadata

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment
Actionable comments posted: 5

🧹 Nitpick comments (8)
backend/scripts/update_db_metadata.py (3)

20-20: The async keyword is unnecessary

update_database() is declared async but never uses await, and the Supabase client is synchronous as well. Removing async and making it a regular function states the code's intent more clearly.

♻️ Remove async
-async def update_database():
+def update_database():
     print(f"Loading mapping from {MAPPING_FILE}")
 if __name__ == "__main__":
-    asyncio.run(update_database())
+    update_database()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/update_db_metadata.py` at line 20, The function
update_database() is declared async but never uses await and the Supabase client
is synchronous; remove the unnecessary async keyword and make update_database a
regular function, update any places that call it to stop awaiting it (remove
await or adjust to call synchronously), and run tests to ensure no callers
expect a coroutine; reference the update_database() definition when making the
change.

68-68: Safer file-extension handling recommended

As in generate_sql_updates.py, original_file[:-4] assumes a .txt extension. Path.stem is safer.

♻️ Use Path.stem
-        name_without_ext = original_file[:-4]
+        name_without_ext = Path(original_file).stem
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/update_db_metadata.py` at line 68, The slice
original_file[:-4] assumes a 4-character extension; replace it with
Path(original_file).stem to safely get the filename without extension. Update
the code in update_db_metadata.py where name_without_ext is assigned (currently
name_without_ext = original_file[:-4]) to use Path(...).stem, and add "from
pathlib import Path" at the top if not already present so other code using
name_without_ext continues to work with the correct base name.

121-133: Review the concurrent-update worker count and retry logic

Concurrent updates with 50 workers may hit Supabase rate limits. There is no retry logic on failure, so transient errors can silently drop updates.

♻️ Reduce worker count and add retry logic
+from time import sleep
+
 def update_doc(doc):
+    max_retries = 3
+    for attempt in range(max_retries):
+        try:
+            supabase.table("documents").update({"metadata": doc["metadata"]}).eq("id", doc["id"]).execute()
+            return True
+        except Exception as e:
+            if attempt < max_retries - 1:
+                sleep(0.5 * (attempt + 1))  # Exponential backoff
+                continue
+            print(f"Error updating {doc['id']}: {e}")
+            return False
-    try:
-        supabase.table("documents").update({"metadata": doc["metadata"]}).eq("id", doc["id"]).execute()
-        return True
-    except Exception as e:
-        print(f"Error updating {doc['id']}: {e}")
-        return False

-print(f"Starting concurrent updates with 50 workers for {len(updates_to_make)} rows...")
+print(f"Starting concurrent updates with 10 workers for {len(updates_to_make)} rows...")
 processed = 0
-with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
+with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/update_db_metadata.py` around lines 121 - 133, The current
concurrent update loop uses ThreadPoolExecutor(max_workers=50) and lacks retry
logic, which risks hitting Supabase rate limits and losing updates; reduce
concurrency (e.g., to 5-10) by changing ThreadPoolExecutor(max_workers=50) and
add retry-with-backoff inside update_doc (introduce parameters like max_retries
and base_delay), so update_doc catches transient errors (including HTTP 429
rate-limit responses), sleeps with exponential backoff plus jitter between
attempts, and returns success/failure only after retries are exhausted; ensure
existing counters (success_count, error_count) and the processed reporting
remain unchanged and that update_doc logs retry attempts for observability.
backend/update_metadata.sql (1)

1-9: Consider transaction wrapping

The UPDATE statements are not wrapped in a transaction, so only some of them may succeed. If atomicity across all updates is required, wrapping them in a transaction is recommended.

♻️ Transaction wrapping example
 CREATE INDEX IF NOT EXISTS idx_documents_book_title ON documents ((metadata->'book_info'->>'title'));
 
 SET statement_timeout = '120s'; -- Increase timeout to be safe
+
+BEGIN;
 
 UPDATE documents
 SET metadata = metadata || '{"kr_title": "역설의 예산 1권", "thumbnail": "", "link": ""}'::jsonb
 WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Budget of Paradoxes Volume I';

Add COMMIT; at the end of the file

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/update_metadata.sql` around lines 1 - 9, Wrap the UPDATE statements
in an explicit transaction to ensure atomicity: start a transaction (e.g. BEGIN)
before the CREATE INDEX / UPDATE documents statement block and finish with a
COMMIT (or ROLLBACK on error) so either all metadata updates (the UPDATE
documents that sets kr_title/thumbnail/link) succeed together or none do; locate
the block around CREATE INDEX IF NOT EXISTS idx_documents_book_title and the
UPDATE documents ... WHERE metadata->'book_info'->>'title' = 'Korean Translation
of A Budget of Paradoxes Volume I' and add the transaction boundaries
accordingly.
backend/app/services/llm.py (1)

24-28: Edge cases in API key parsing need handling

The current regex does not handle quotes inside the value or trailing comments. For example:

  • GEMINI_API_KEY="abc" (value wrapped in quotes)
  • GEMINI_API_KEY=abc # comment (trailing inline comment)
♻️ Suggested cleanup of the key value
             for m in matches:
-                key = m.strip()
+                key = m.strip().strip('"').strip("'")
+                # Remove inline comments
+                if ' #' in key:
+                    key = key.split(' #')[0].strip()
                 if key and key not in keys:
                     keys.append(key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm.py` around lines 24 - 28, The current parsing loop
that builds matches and assigns key is not stripping surrounding quotes or
removing inline comments; update the regex or post-process the capture in the
loop (the matches variable and the loop that sets key) so the extracted value
excludes trailing inline comments and surrounding single/double quotes. For
example, either change the pattern to capture quoted or unquoted values up to a
comment (e.g., use groups for "([^"]*)" or '([^']*)' or ([^#\n]+)) and pick the
non-empty group, or after getting m run m = re.split(r'\s+#', m,
1)[0].strip().strip('\'"') before appending to keys so values like
GEMINI_API_KEY="abc" and GEMINI_API_KEY=abc # comment yield clean abc.
frontend/components/chat/MessageList.tsx (1)

77-77: Extract the dummy URLs into constants

The dummy URL strings are hardcoded in multiple places. Extracting them into constants improves maintainability.

♻️ Constant extraction example
const DUMMY_COVER_URL = "https://image.aladin.co.kr/product/dummy";
const DUMMY_BOOK_LINK = "https://www.aladin.co.kr/dummy-link";

// 사용 예시
const coverUrl = meta.thumbnail || (meta.book_info?.cover_url !== DUMMY_COVER_URL ? meta.book_info?.cover_url : "");

Also applies to: 102-102, 120-120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/components/chat/MessageList.tsx` at line 77, Multiple instances use
the hardcoded dummy URL string when computing coverUrl (e.g., the expression
meta.book_info?.cover_url !== "https://image.aladin.co.kr/product/dummy"), so
extract that literal into a shared constant (e.g., DUMMY_COVER_URL) and replace
all occurrences (including where coverUrl is computed and the other similar
spots) to improve maintainability; update relevant imports/exports if needed and
ensure comparisons use the new DUMMY_COVER_URL constant instead of the inline
string.
backend/scripts/generate_sql_updates.py (1)

18-24: Safer file-extension handling recommended

original_file[:-4] assumes a 4-character extension (.txt). Files with other extensions would produce unintended results.

♻️ Safe extension handling with Path
+from pathlib import Path
+
 def generate_sql():
     # ...
     for book in mapping_data:
         original_file = book.get("original_file", "")
-        name_without_ext = original_file[:-4]
+        name_without_ext = Path(original_file).stem
         parts = name_without_ext.rsplit(" by ", 1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/generate_sql_updates.py` around lines 18 - 24, The code
currently trims the extension with original_file[:-4], which wrongly assumes a
4-character extension like ".txt"; replace this with
os.path.splitext(original_file)[0] to safely get the filename without extension
(ensure you import os), handle the case original_file == "" gracefully (e.g.,
default to ""), and then continue to split name_without_ext with rsplit(" by ",
1) to derive title as done now; update occurrences of original_file,
name_without_ext, parts, and title accordingly.
backend/scripts/generate_book_mapping.py (1)

66-68: Broad Exception catches weaken failure classification and recovery policy.

Lines 66, 103, 143, and 197 use broad catches, so network errors, parsing errors, and logic errors are all handled identically. At minimum, separating the error types so that retry/abort policies can differ is safer.

Also applies to: 103-110, 143-145, 197-199

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/generate_book_mapping.py` around lines 66 - 68, Replace broad
except Exception blocks that print "Kyobo fetch error: {e}" (and the similar
handlers at the other noted sites) with targeted exception handling: catch
network-related errors (e.g., requests.exceptions.RequestException) separately
from parsing/format errors (e.g., json.JSONDecodeError, ValueError) and
unexpected attribute/type errors (AttributeError, TypeError), log the exception
with context, and apply different recovery actions (retry/backoff for network
errors, abort/raise for parsing or logic errors). Update the handlers around the
Kyobo fetch code paths (the blocks that currently print the Kyobo fetch error)
to re-raise non-recoverable exceptions after logging and implement a simple
retry loop only for RequestException with a limited retry count. Ensure you
reference and modify the existing except blocks that contain the "Kyobo fetch
error" print so the change is localized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/scripts/generate_book_mapping.py`:
- Line 33: The single-line if statements (e.g., "if k: api_keys.append(k)" and
the similar one later) trigger Ruff E701; replace each one-line form with a
standard multi-line if block: make the condition and the body on separate lines
and use indentation so the append call is on the next line under the if; update
both occurrences (the one using variable k and api_keys and the other similar
instance around line 71) to the expanded form to satisfy the linter.
- Around line 64-65: The urlopen calls have no timeout, so an external request can
block indefinitely; change both call sites in generate_book_mapping.py (the
urllib.request.urlopen(req) calls that use the req variable) to
urllib.request.urlopen(req, timeout=...) with a reasonable timeout (e.g., 5-15
seconds), and add exception handling at the callers to catch
socket.timeout/urllib.error.URLError, log the failure, and skip the failed item
or route it through retry logic; apply the same change to both urlopen calls to
guarantee consistent behavior.
- Around line 192-193: The code currently writes mapping directly to
MAPPING_FILE (json.dump(mapping, ...)) which can corrupt the file on
interruption; modify the write to use an atomic replace: write the JSON to a
temporary file in the same directory (e.g., via
tempfile.NamedTemporaryFile(delete=False) or similar), flush and fsync the temp
file, close it, then atomically replace MAPPING_FILE with os.replace(temp_path,
MAPPING_FILE); ensure you remove the temp file on error; apply this change
around the block that opens MAPPING_FILE and serializes "mapping".
- Around line 24-29: The current env-file parsing block (open(env_path, ...), for
line in f) matches even commented-out lines (# GEMINI_API_KEY=...) via substring
search and never strips quotes or inline comments, so invalid keys get through;
fix it by first stripping leading/trailing whitespace from each line, ignoring
lines that start with '#', handling "GEMINI_API_KEY=" only on non-comment lines,
then removing any inline comment (everything after '#'), stripping surrounding
quotes (' or ") from the value, and appending it to the api_keys list only when
non-empty (see: the env_path file read loop, variable key, list api_keys).

In `@backend/scripts/generate_sql_updates.py`:
- Around line 34-37: Remove the unnecessary escaped variables kr_title_esc,
thumbnail_esc, and link_esc from generate_sql_updates.py since the code uses
json.dumps and jsonb_payload_esc to handle escaping; keep title_to_match_esc if
it is actually used elsewhere. Search for the declarations of kr_title_esc,
thumbnail_esc, and link_esc and delete those lines plus any immediate unused
references, leaving the jsonb_payload_esc path intact.
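The atomic-replace pattern requested above for the books_mapping.json write can be sketched like this; write_json_atomic is an illustrative name, not the PR's code:

```python
import json
import os
import tempfile

def write_json_atomic(path: str, data) -> None:
    """Write JSON to a temp file in the same directory, then atomically
    replace the target so an interrupted run cannot leave a corrupt file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise
```

The temp file must live in the same directory as the target, since os.replace is only atomic within one filesystem.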

---

Nitpick comments:
In `@backend/app/services/llm.py`:
- Around line 24-28: The current parsing loop that builds matches and assigns
key is not stripping surrounding quotes or removing inline comments; update the
regex or post-process the capture in the loop (the matches variable and the loop
that sets key) so the extracted value excludes trailing inline comments and
surrounding single/double quotes. For example, either change the pattern to
capture quoted or unquoted values up to a comment (e.g., use groups for
"([^"]*)" or '([^']*)' or ([^#\n]+)) and pick the non-empty group, or after
getting m run m = re.split(r'\s+#', m, 1)[0].strip().strip('\'"') before
appending to keys so values like GEMINI_API_KEY="abc" and GEMINI_API_KEY=abc #
comment yield clean abc.

In `@backend/scripts/generate_book_mapping.py`:
- Around line 66-68: Replace broad except Exception blocks that print "Kyobo
fetch error: {e}" (and the similar handlers at the other noted sites) with
targeted exception handling: catch network-related errors (e.g.,
requests.exceptions.RequestException) separately from parsing/format errors
(e.g., json.JSONDecodeError, ValueError) and unexpected attribute/type errors
(AttributeError, TypeError), log the exception with context, and apply different
recovery actions (retry/backoff for network errors, abort/raise for parsing or
logic errors). Update the handlers around the Kyobo fetch code paths (the blocks
that currently print the Kyobo fetch error) to re-raise non-recoverable
exceptions after logging and implement a simple retry loop only for
RequestException with a limited retry count. Ensure you reference and modify the
existing except blocks that contain the "Kyobo fetch error" print so the change
is localized.

In `@backend/scripts/generate_sql_updates.py`:
- Around line 18-24: The code currently trims the extension with
original_file[:-4], which wrongly assumes a 4-character extension like ".txt";
replace this with os.path.splitext(original_file)[0] to safely get the filename
without extension (ensure you import os), handle the case original_file == ""
gracefully (e.g., default to ""), and then continue to split name_without_ext
with rsplit(" by ", 1) to derive title as done now; update occurrences of
original_file, name_without_ext, parts, and title accordingly.

In `@backend/scripts/update_db_metadata.py`:
- Line 20: The function update_database() is declared async but never uses await
and the Supabase client is synchronous; remove the unnecessary async keyword and
make update_database a regular function, update any places that call it to stop
awaiting it (remove await or adjust to call synchronously), and run tests to
ensure no callers expect a coroutine; reference the update_database() definition
when making the change.
- Line 68: The slice original_file[:-4] assumes a 4-character extension; replace
it with Path(original_file).stem to safely get the filename without extension.
Update the code in update_db_metadata.py where name_without_ext is assigned
(currently name_without_ext = original_file[:-4]) to use Path(...).stem, and add
"from pathlib import Path" at the top if not already present so other code using
name_without_ext continues to work with the correct base name.
- Around line 121-133: The current concurrent update loop uses
ThreadPoolExecutor(max_workers=50) and lacks retry logic, which risks hitting
Supabase rate limits and losing updates; reduce concurrency (e.g., to 5-10) by
changing ThreadPoolExecutor(max_workers=50) and add retry-with-backoff inside
update_doc (introduce parameters like max_retries and base_delay), so update_doc
catches transient errors (including HTTP 429 rate-limit responses), sleeps with
exponential backoff plus jitter between attempts, and returns success/failure
only after retries are exhausted; ensure existing counters (success_count,
error_count) and the processed reporting remain unchanged and that update_doc
logs retry attempts for observability.

In `@backend/update_metadata.sql`:
- Around line 1-9: Wrap the UPDATE statements in an explicit transaction to
ensure atomicity: start a transaction (e.g. BEGIN) before the CREATE INDEX /
UPDATE documents statement block and finish with a COMMIT (or ROLLBACK on error)
so either all metadata updates (the UPDATE documents that sets
kr_title/thumbnail/link) succeed together or none do; locate the block around
CREATE INDEX IF NOT EXISTS idx_documents_book_title and the UPDATE documents ...
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Budget of
Paradoxes Volume I' and add the transaction boundaries accordingly.

In `@frontend/components/chat/MessageList.tsx`:
- Line 77: Multiple instances use the hardcoded dummy URL string when computing
coverUrl (e.g., the expression meta.book_info?.cover_url !==
"https://image.aladin.co.kr/product/dummy"), so extract that literal into a
shared constant (e.g., DUMMY_COVER_URL) and replace all occurrences (including
where coverUrl is computed and the other similar spots) to improve
maintainability; update relevant imports/exports if needed and ensure
comparisons use the new DUMMY_COVER_URL constant instead of the inline string.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4f87e1 and 5a5407a.

📒 Files selected for processing (9)
  • backend/app/core/config.py
  • backend/app/services/llm.py
  • backend/data/books_mapping.json
  • backend/scripts/generate_book_mapping.py
  • backend/scripts/generate_sql_updates.py
  • backend/scripts/update_db_metadata.py
  • backend/update_metadata.sql
  • frontend/components/chat/MessageList.tsx
  • frontend/types/chat.ts

coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (1)
backend/scripts/generate_book_mapping.py (1)

24-37: ⚠️ Potential issue | 🟠 Major

The environment-variable fallback never runs when .env exists but key parsing fails.

With the structure at lines 24-37, the mere existence of the .env file skips the os.getenv("GEMINI_API_KEY") fallback. As a result, the keys can end up empty and the LLM path is never reached.

🔧 Suggested fix
 if env_path.exists():
     with open(env_path, 'r', encoding='utf-8') as f:
         for line in f:
             m = re.match(r'^\s*GEMINI_API_KEY\s*=\s*(.+?)\s*(?:#.*)?$', line)
             if not m:
                 continue
             key = m.group(1).strip().strip('"').strip("'")
             if key and key not in api_keys:
                 api_keys.append(key)
-else:
-    # Fallback to os.environ if .env missing
-    k = os.getenv("GEMINI_API_KEY")
-    if k:
-        api_keys.append(k)
+
+# Fallback to os.environ when parsing produced no key
+if not api_keys:
+    k = os.getenv("GEMINI_API_KEY")
+    if k and k not in api_keys:
+        api_keys.append(k)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/generate_book_mapping.py` around lines 24 - 37, The current
logic in the env parsing block (around env_path.exists(), the loop that
populates api_keys, and the GEMINI_API_KEY fallback using os.getenv) skips the
environment-variable fallback whenever the .env file exists even if parsing
produced no keys; change the flow so that after processing the .env file you
check if api_keys is empty and only then call os.getenv("GEMINI_API_KEY") to
append a fallback key, i.e., keep the existing parsing loop for env_path and if
api_keys == [] (or len(api_keys) == 0) then read os.getenv("GEMINI_API_KEY") and
append if present (referencing env_path, api_keys, and GEMINI_API_KEY).
🧹 Nitpick comments (4)
backend/scripts/generate_book_mapping.py (1)

39-41: Hardcoded test-only key rotation should be separated out.

The rotation logic at lines 39-41 changes the default key priority in production runs. Making it opt-in via an environment-variable flag or splitting it into a CLI option is better for maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/generate_book_mapping.py` around lines 39 - 41, The hardcoded
test-key rotation currently mutates api_keys (the api_keys variable) for all
runs; change this into an opt-in behavior by moving the rotation logic behind a
feature flag or CLI option (e.g., check an environment variable like
ENABLE_TEST_KEY_ROTATION or a CLI flag parsed in the script) so the default path
leaves api_keys unchanged; ensure the check encloses the existing rotation block
(the api_keys = api_keys[3:] + api_keys[:3] line) and only performs rotation
when the flag/arg is set, preserving current default ordering otherwise.
backend/update_metadata.sql (1)

1-1: For large tables, separate out the index-creation strategy.

The CREATE INDEX on line 1 can block writes while it runs. If online application on a production DB is needed, split it into a separate step using CREATE INDEX CONCURRENTLY.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/update_metadata.sql` at line 1, the statement as written immediately
creates an index (idx_documents_book_title on documents ...) that can block
writes on a large table; on a production DB, split this into a separate step
using CREATE INDEX CONCURRENTLY: change it to the form CREATE INDEX
CONCURRENTLY idx_documents_book_title ON documents
((metadata->'book_info'->>'title')); and apply it as its own migration/deploy
step outside a transaction in the deployment script (CONCURRENTLY cannot run
inside a transaction).
backend/scripts/check_models.py (1)

10-12: Return validation failures via the exit code.

Lines 10-12 exit with success (0) even when no API key is found, making failures hard to detect in automation.

🔧 Suggested fix
 api_key = os.getenv("GEMINI_API_KEY")
 if not api_key:
     print("No API key found!")
+    raise SystemExit(1)
 else:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/check_models.py` around lines 10 - 12, The current check uses
"if not api_key: print('No API key found!') else:" which prints an error but
still exits with success; change this to terminate with a non-zero exit code so
CI detects failure: import sys at the top of backend/scripts/check_models.py (if
not already), replace the print-only branch to call sys.exit(1) (or raise
SystemExit(1)) when api_key is falsy, and keep the successful path
returning/allowing exit code 0; reference the api_key check in that file to
locate where to add sys.exit(1).
pr_commet.txt (1)

1-360: Review-log artifacts are better kept out of the repository.

pr_commet.txt is an intermediate review artifact rather than executable code or a documentation spec, so it adds maintenance noise and exposes unnecessary information. Recommend excluding it from the PR (or adding it to .gitignore).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pr_commet.txt` around lines 1 - 360, The review-output file pr_commet.txt
should be removed from the repository and prevented from being committed again:
delete pr_commet.txt from the branch (remove it from the index/history if
already committed), add an entry for pr_commet.txt (or a pattern covering review
artifacts) to .gitignore, and commit the removal plus .gitignore change so
future review logs are not tracked; ensure any CI or deploy steps that might
rely on this file are updated or validated before pushing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm.py`:
- Around line 24-31: The regex currently captures commented-out lines like "#
GEMINI_API_KEY=...", so update the parsing in the function that builds `matches`
to only match active assignments: use a multiline-anchored pattern that requires
the line not be commented, e.g. change the re.findall to something like
re.findall(r'^\s*GEMINI_API_KEY\s*=\s*(.+)', content, re.MULTILINE) (or use a
negative lookahead to exclude lines that start with '#' ) and keep the existing
inline-comment stripping and quoting logic for the captured `m` before adding to
`keys`.

In `@backend/scripts/generate_sql_updates.py`:
- Around line 12-15: The SQL generation lacks transactional boundaries; update
the sql_statements list in generate_sql_updates.py (the variable named
sql_statements) to wrap the DDL/DML lines with an explicit transaction by
inserting "BEGIN;" (or "BEGIN TRANSACTION;") as the first entry and "COMMIT;" as
the last entry so the CREATE INDEX and SET statement_timeout are applied
atomically; also make the same change where a similar sql_statements block
appears around lines 49-55 to ensure those generated statements are similarly
wrapped.
- Around line 6-8: The code uses relative paths (e.g., mapping_path variable
reading "data/books_mapping.json" and the logic that writes
"update_metadata.sql") which break when the script is run from a different CWD;
change path construction to be relative to the script file by using the script
directory (os.path.dirname(__file__)) as the base and join it with
"data/books_mapping.json" for reading and the desired output filename for
writing so both the open(...) that loads mapping_data and the file open/write
that creates update_metadata.sql use absolute paths derived from the script
location.

In `@backend/scripts/update_db_metadata.py`:
- Around line 86-90: The batch can crash when a record's metadata is None
because the code calls metadata.get(...) directly; update the loop that reads
doc['metadata'] to defensively handle None by checking if metadata is truthy
(e.g., if not metadata: continue or skip this doc) before computing db_title, so
references to metadata.get('book_info', {}).get('title', '') are only executed
when metadata is a dict (look for variables doc, metadata, db_title in
update_db_metadata.py and add a guard to skip or default for records with
metadata is None).

In `@backend/update_metadata.sql`:
- Around line 17-22: The stored JSON contains HTML-escaped ampersands ("&amp;")
in the "link" field which breaks real query parameters; for the two UPDATE
statements that set metadata -> 'link' (the rows filtered by
metadata->'book_info'->>'title' = 'Korean Translation of A Treatise of Human
Nature' and 'Korean Translation of A Vindication of the Rights of Woman'),
unescape HTML entities before writing: replace "&amp;" with "&" (or run an
HTML-unescape routine) when constructing the JSONB value, and prefer using
jsonb_build_object/jsonb_set or programmatic JSON generation instead of raw
string literals to ensure the link is inserted unescaped and safely escaped for
SQL.
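A minimal sketch of that unescape step, done in the Python generator before the JSONB literal is rendered (the entry dict shape is assumed from the SQL above):

```python
import html
import json

def clean_entry(entry: dict) -> dict:
    """Unescape HTML entities (e.g. &amp; -> &) in the link before emitting SQL."""
    if entry.get("link"):
        entry["link"] = html.unescape(entry["link"])
    return entry

entry = clean_entry({
    "kr_title": "인간 본성론",
    "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&amp;partner=openAPI&amp;start=api",
})
# json.dumps output is what belongs inside the '...'::jsonb literal
payload = json.dumps(entry, ensure_ascii=False)
```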

---

Duplicate comments:
In `@backend/scripts/generate_book_mapping.py`:
- Around line 24-37: The current logic in the env parsing block (around
env_path.exists(), the loop that populates api_keys, and the GEMINI_API_KEY
fallback using os.getenv) skips the environment-variable fallback whenever the
.env file exists even if parsing produced no keys; change the flow so that after
processing the .env file you check if api_keys is empty and only then call
os.getenv("GEMINI_API_KEY") to append a fallback key, i.e., keep the existing
parsing loop for env_path and if api_keys == [] (or len(api_keys) == 0) then
read os.getenv("GEMINI_API_KEY") and append if present (referencing env_path,
api_keys, and GEMINI_API_KEY).
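The corrected flow can be sketched as: parse first, then fall back only when parsing yielded nothing (the regex and quote handling here are simplified assumptions, not the script's exact code):

```python
import os
import re
from pathlib import Path

def parse_keys(env_text: str) -> list[str]:
    """Collect every GEMINI_API_KEY assignment, deduplicated, in file order."""
    keys: list[str] = []
    for line in env_text.splitlines():
        m = re.match(r'\s*GEMINI_API_KEY\s*=\s*(.+)', line)
        if m:
            key = m.group(1).strip().strip('"').strip("'")
            if key and key not in keys:
                keys.append(key)
    return keys

def load_api_keys(env_path: Path) -> list[str]:
    api_keys = parse_keys(env_path.read_text()) if env_path.exists() else []
    if not api_keys:  # fall back even when .env existed but held no keys
        env_key = os.getenv("GEMINI_API_KEY")
        if env_key:
            api_keys.append(env_key)
    return api_keys
```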

---

Nitpick comments:
In `@backend/scripts/check_models.py`:
- Around line 10-12: The current check uses "if not api_key: print('No API key
found!') else:" which prints an error but still exits with success; change this
to terminate with a non-zero exit code so CI detects failure: import sys at the
top of backend/scripts/check_models.py (if not already), replace the print-only
branch to call sys.exit(1) (or raise SystemExit(1)) when api_key is falsy, and
keep the successful path returning/allowing exit code 0; reference the api_key
check in that file to locate where to add sys.exit(1).
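A sketch of the fail-fast behavior (the env var name is an assumption; check_models.py may read a different one):

```python
import os
import sys

def require_api_key() -> str:
    """Exit non-zero when no key is configured so CI marks the run as failed."""
    api_key = os.getenv("GOOGLE_API_KEY")  # variable name assumed
    if not api_key:
        print("No API key found!", file=sys.stderr)
        sys.exit(1)
    return api_key
```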

In `@backend/scripts/generate_book_mapping.py`:
- Around line 39-41: The hardcoded test-key rotation currently mutates api_keys
(the api_keys variable) for all runs; change this into an opt-in behavior by
moving the rotation logic behind a feature flag or CLI option (e.g., check an
environment variable like ENABLE_TEST_KEY_ROTATION or a CLI flag parsed in the
script) so the default path leaves api_keys unchanged; ensure the check encloses
the existing rotation block (the api_keys = api_keys[3:] + api_keys[:3] line)
and only performs rotation when the flag/arg is set, preserving current default
ordering otherwise.
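The opt-in gate might look like this; the flag name ENABLE_TEST_KEY_ROTATION is the reviewer's suggestion, not an existing setting:

```python
import os

def maybe_rotate(api_keys: list[str]) -> list[str]:
    """Rotate only when the test flag is set; the default key order is preserved."""
    if os.getenv("ENABLE_TEST_KEY_ROTATION") == "1":
        return api_keys[3:] + api_keys[:3]
    return api_keys
```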

In `@backend/update_metadata.sql`:
- Line 1: As written, the file creates the index immediately (CREATE INDEX
idx_documents_book_title on documents ...), which can block writes when run
against a large table. On a production DB, split this out into a separate step
and use CREATE INDEX CONCURRENTLY: rewrite the statement as CREATE INDEX
CONCURRENTLY idx_documents_book_title ON documents ((metadata->'book_info'->>'title'));
and have the deploy script apply it as a separate migration/deployment step
outside a transaction (CONCURRENTLY cannot run inside a transaction).

In `@pr_commet.txt`:
- Around line 1-360: The review-output file pr_commet.txt should be removed from
the repository and prevented from being committed again: delete pr_commet.txt
from the branch (remove it from the index/history if already committed), add an
entry for pr_commet.txt (or a pattern covering review artifacts) to .gitignore,
and commit the removal plus .gitignore change so future review logs are not
tracked; ensure any CI or deploy steps that might rely on this file are updated
or validated before pushing.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a5407a and 8a01e1d.

📒 Files selected for processing (8)
  • backend/app/services/llm.py
  • backend/scripts/check_models.py
  • backend/scripts/generate_book_mapping.py
  • backend/scripts/generate_sql_updates.py
  • backend/scripts/update_db_metadata.py
  • backend/update_metadata.sql
  • frontend/components/chat/MessageList.tsx
  • pr_commet.txt

Comment on lines +86 to +90
metadata = doc['metadata']

# The DB stores the title we want to match inside metadata->book_info->title
db_title = metadata.get('book_info', {}).get('title', '')


⚠️ Potential issue | 🟠 Major

Records with null metadata can abort the batch.

Line 89 calls metadata.get(...) directly, so when metadata is None the exception aborts the entire update. It is safer to guard each record and skip it.

🔧 Suggested fix
     for doc in all_docs:
         doc_id = doc['id']
-        metadata = doc['metadata']
+        metadata = doc.get('metadata') or {}
+        if not isinstance(metadata, dict):
+            print(f"Skipping {doc_id}: metadata is not an object")
+            continue
         
         # The DB stores the title we want to match inside metadata->book_info->title
         db_title = metadata.get('book_info', {}).get('title', '')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/update_db_metadata.py` around lines 86 - 90, The batch can
crash when a record's metadata is None because the code calls metadata.get(...)
directly; update the loop that reads doc['metadata'] to defensively handle None
by checking if metadata is truthy (e.g., if not metadata: continue or skip this
doc) before computing db_title, so references to metadata.get('book_info',
{}).get('title', '') are only executed when metadata is a dict (look for
variables doc, metadata, db_title in update_db_metadata.py and add a guard to
skip or default for records with metadata is None).

Comment on lines +17 to +22
SET metadata = metadata || '{"kr_title": "인간 본성론", "thumbnail": "https://image.aladin.co.kr/product/435/90/coversum/8949705206_1.jpg", "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&amp;partner=openAPI&amp;start=api"}'::jsonb
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Treatise of Human Nature';

UPDATE documents
SET metadata = metadata || '{"kr_title": "여권 옹호", "thumbnail": "https://image.aladin.co.kr/product/4569/0/coversum/8994054596_1.jpg", "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=45690064&amp;partner=openAPI&amp;start=api"}'::jsonb
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Vindication of the Rights of Woman';

⚠️ Potential issue | 🟠 Major

HTML entities (&amp;) stored in the URLs break the links.

When &amp; appears in a link, as on Lines 17 and 21, the browser misinterprets the actual query parameters. It is safer to HTML-unescape at the generation step and regenerate the SQL.

🔧 Example (one entry)
- "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&amp;partner=openAPI&amp;start=api"
+ "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&partner=openAPI&start=api"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/update_metadata.sql` around lines 17 - 22, The stored JSON contains
HTML-escaped ampersands ("&amp;") in the "link" field which breaks real query
parameters; for the two UPDATE statements that set metadata -> 'link' (the rows
filtered by metadata->'book_info'->>'title' = 'Korean Translation of A Treatise
of Human Nature' and 'Korean Translation of A Vindication of the Rights of
Woman'), unescape HTML entities before writing: replace "&amp;" with "&" (or run
an HTML-unescape routine) when constructing the JSONB value, and prefer using
jsonb_build_object/jsonb_set or programmatic JSON generation instead of raw
string literals to ensure the link is inserted unescaped and safely escaped for
SQL.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm.py`:
- Around line 15-16: The docstring for get_all_gemini_keys() incorrectly states
it reads "commented" GEMINI_API_KEYs while the current regex in that function
(the pattern used to parse .env lines) does not match commented lines; either
update the docstring to say it only reads active (uncommented) keys, or modify
the parsing regex logic to also capture commented lines (and strip comment
markers) if you intend to support commented keys—locate get_all_gemini_keys and
the parsing block (around the existing regex and loop that processes lines
24-28) and apply the chosen change so docstring and implementation match.

In `@backend/scripts/check_models.py`:
- Around line 11-23: The top-level script currently runs API-key checks and
exits during import (api_key, genai.configure, genai.list_models), which can
break imports/tests; refactor by moving the API-key lookup, genai.configure
call, and the model-listing logic into a main() function (e.g., def main(): ...)
and protect execution with if __name__ == "__main__": main() so importing the
module no longer causes sys.exit or prints to stdout, while keeping
genai.configure and the try/except around genai.list_models inside main().
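The import-safe shape the prompt asks for, sketched with the genai calls stubbed out as comments:

```python
import os
import sys

def main() -> int:
    """All side effects live here, so importing the module stays harmless."""
    api_key = os.getenv("GOOGLE_API_KEY")  # variable name assumed
    if not api_key:
        print("No API key found!", file=sys.stderr)
        return 1
    # genai.configure(api_key=api_key)
    # for model in genai.list_models(): ...
    return 0

if __name__ == "__main__":
    main()  # a real script would call sys.exit(main())
```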

In `@backend/scripts/generate_book_mapping.py`:
- Around line 191-192: The current comprehension building existing_files uses
item["original_file"] which can raise KeyError for malformed entries in mapping;
update the logic in the existing_files creation to safely access the key (e.g.,
filter mapping items that contain a non-null "original_file" via
item.get("original_file") or an "original_file" in item check) so malformed
entries are skipped and files_to_process = [f for f in txt_files if f.name not
in existing_files] continues without crashing; reference the existing_files
variable, the mapping iterable, item["original_file"] access, and
files_to_process/txt_files when making the change.
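A defensive version of that comprehension, assuming the mapping entries are dicts loaded from books_mapping.json:

```python
from pathlib import Path

def select_files_to_process(mapping: list, txt_files: list) -> list:
    """Skip malformed mapping entries instead of raising KeyError."""
    existing_files = {
        item["original_file"]
        for item in mapping
        if isinstance(item, dict) and item.get("original_file")
    }
    return [f for f in txt_files if f.name not in existing_files]
```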

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a01e1d and 133442a.

📒 Files selected for processing (5)
  • .gitignore
  • backend/app/services/llm.py
  • backend/scripts/check_models.py
  • backend/scripts/generate_book_mapping.py
  • backend/scripts/generate_sql_updates.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/services/llm.py (1)

15-40: Consider consolidating the .env key-parsing logic into a shared utility.

The parsing logic on lines 15-40 also exists almost verbatim in backend/scripts/generate_book_mapping.py. If the parsing rules (comment/inline-comment/quote handling) change, there is a risk that only one copy gets updated; extracting a shared function would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm.py` around lines 15 - 40, The GEMINI .env parsing
logic duplicated in get_all_gemini_keys and in
backend/scripts/generate_book_mapping.py should be refactored into a single
shared utility (e.g., parse_env_api_keys or parse_env_key_assignments) that
encapsulates the regex, inline comment/quote stripping, and the behavior of
including settings.GEMINI_API_KEY; replace the body of get_all_gemini_keys to
call that shared utility and update the generate_book_mapping code to call it as
well so both use the same implementation and preserve existing behavior (reading
root .env, deduping keys, and prepending settings.GEMINI_API_KEY when present).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/scripts/generate_book_mapping.py`:
- Around line 131-133: The Aladin API request URL in generate_book_mapping.py is
using plain HTTP and exposes ALADIN_API_KEY; update the URL construction for the
variable url to use HTTPS (change "http://" to "https://") so the key and query
are sent securely, and optionally refactor the request-building to use
urllib.parse.urlencode with a params dict to safely encode query parameters
(e.g., reference the url variable and the query construction around f"{title}
{author}" to locate the change).
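A sketch of the safer request construction; the endpoint path and parameter names are assumed from the Aladin open API and should be checked against the script:

```python
from urllib.parse import urlencode

ALADIN_ENDPOINT = "https://www.aladin.co.kr/ttb/api/ItemSearch.aspx"  # path assumed

def build_search_url(api_key: str, title: str, author: str) -> str:
    """HTTPS plus urlencode: the key and the Korean query are encoded safely."""
    params = {
        "ttbkey": api_key,  # parameter names assumed
        "Query": f"{title} {author}",
        "QueryType": "Keyword",
        "output": "js",
    }
    return f"{ALADIN_ENDPOINT}?{urlencode(params)}"
```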

---

Nitpick comments:
In `@backend/app/services/llm.py`:
- Around line 15-40: The GEMINI .env parsing logic duplicated in
get_all_gemini_keys and in backend/scripts/generate_book_mapping.py should be
refactored into a single shared utility (e.g., parse_env_api_keys or
parse_env_key_assignments) that encapsulates the regex, inline comment/quote
stripping, and the behavior of including settings.GEMINI_API_KEY; replace the
body of get_all_gemini_keys to call that shared utility and update the
generate_book_mapping code to call it as well so both use the same
implementation and preserve existing behavior (reading root .env, deduping keys,
and prepending settings.GEMINI_API_KEY when present).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 133442a and 43d1722.

📒 Files selected for processing (3)
  • backend/app/services/llm.py
  • backend/scripts/check_models.py
  • backend/scripts/generate_book_mapping.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/scripts/generate_book_mapping.py`:
- Around line 100-107: The except block handling LLM translation errors stops
the key-rotation loop prematurely because it uses break; when non-quota/auth
errors occur the code falls back to Kyobo instead of trying remaining keys.
Change the control flow so that on non-quota exceptions you skip the current
attempt but continue trying other keys (use continue/advance logic instead of
break), and only fall back to Kyobo when current_key_idx has advanced past all
entries in api_keys (check current_key_idx >= len(api_keys) before triggering
the fallback for file_name). Ensure references to current_key_idx and api_keys
are updated consistently in generate_book_mapping.py so key rotation tries all
keys before fallback.
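The intended control flow, sketched abstractly — here `translate` and `fallback` stand in for the LLM call and the Kyobo path:

```python
def translate_with_rotation(api_keys, translate, fallback):
    """Try every key before falling back; errors advance to the next key."""
    current_key_idx = 0
    while current_key_idx < len(api_keys):
        try:
            return translate(api_keys[current_key_idx])
        except Exception:
            current_key_idx += 1  # continue with the next key instead of breaking
    return fallback()  # only once current_key_idx has passed the end of api_keys
```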

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43d1722 and 0dd84a4.

📒 Files selected for processing (3)
  • backend/app/core/env_utils.py
  • backend/app/services/llm.py
  • backend/scripts/generate_book_mapping.py

@SanghunYun95 SanghunYun95 merged commit f19e92e into main Mar 3, 2026
1 check passed
