feat: integrate Korean book metadata and UI citations #10
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings.
📝 Walkthrough
This PR adds multi-key Gemini API support (parsing, rotation, fallback), a Korean book-title translation and metadata-mapping pipeline (crawling plus Aladin enrichment), mapping-based SQL/DB update scripts, a .env path adjustment in config, and frontend citation rendering with extended types.
Changes
Sequence Diagram

sequenceDiagram
    participant FS as File system
    participant Script as generate_book_mapping.py
    participant Gemini as Gemini LLM
    participant Kyobo as Kyobo website
    participant Aladin as Aladin API
    participant Mapping as books_mapping.json
    FS->>Script: discover data/*.txt files
    loop per file
        Script->>Script: derive original title from filename
        Script->>Gemini: translation request (first key)
        alt translation succeeds
            Gemini-->>Script: Korean title/author
        else key/quota error
            Script->>Gemini: retry with next key (rotation)
            alt all keys fail
                Script->>Kyobo: fallback search on Kyobo
                Kyobo-->>Script: Korean title/author (fallback)
            end
        end
        Script->>Aladin: fetch extra metadata (when ALADIN_API_KEY is set)
        Aladin-->>Script: title, link, thumbnail, ISBN, etc.
        Script-->>Mapping: add/update entry in books_mapping.json
    end
    Script-->>FS: save results (books_mapping.json)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🧹 Nitpick comments (8)
backend/scripts/update_db_metadata.py (3)
20-20: The async keyword is unnecessary
update_database() is declared async but never uses await, and the Supabase client is synchronous as well. Removing async and making it a regular function states the code's intent more clearly.
♻️ Remove async

```diff
-async def update_database():
+def update_database():
     print(f"Loading mapping from {MAPPING_FILE}")

 if __name__ == "__main__":
-    asyncio.run(update_database())
+    update_database()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/update_db_metadata.py` at line 20, The function update_database() is declared async but never uses await and the Supabase client is synchronous; remove the unnecessary async keyword and make update_database a regular function, update any places that call it to stop awaiting it (remove await or adjust to call synchronously), and run tests to ensure no callers expect a coroutine; reference the update_database() definition when making the change.
68-68: Recommend improving file-extension handling
As in generate_sql_updates.py, original_file[:-4] assumes a .txt extension. Using Path.stem is safer.
♻️ Use Path.stem

```diff
-    name_without_ext = original_file[:-4]
+    name_without_ext = Path(original_file).stem
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/update_db_metadata.py` at line 68, The slice original_file[:-4] assumes a 4-character extension; replace it with Path(original_file).stem to safely get the filename without extension. Update the code in update_db_metadata.py where name_without_ext is assigned (currently name_without_ext = original_file[:-4]) to use Path(...).stem, and add "from pathlib import Path" at the top if not already present so other code using name_without_ext continues to work with the correct base name.
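To make the comparison concrete, here is a small standalone sketch (not from the PR; the sample filenames are made up) showing why `Path.stem` beats a fixed four-character slice:

```python
from pathlib import Path

def name_without_ext(original_file: str) -> str:
    # Path.stem drops only the final suffix, whatever its length;
    # empty input stays empty.
    return Path(original_file).stem if original_file else ""

# A fixed slice assumes a 4-character suffix and corrupts other names:
print("History of England by Hume.epub"[:-4])               # leaves a trailing dot
print(name_without_ext("History of England by Hume.epub"))  # clean base name
print(name_without_ext("A Treatise by Hume.txt"))           # clean base name
```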
121-133: Review the concurrent-update worker count and add retry logic
Running updates with 50 concurrent workers may hit the Supabase rate limit, and with no retry logic on failure, transient errors can silently drop updates.
♻️ Reduce the worker count and add retries

```diff
+from time import sleep
+
 def update_doc(doc):
-    try:
-        supabase.table("documents").update({"metadata": doc["metadata"]}).eq("id", doc["id"]).execute()
-        return True
-    except Exception as e:
-        print(f"Error updating {doc['id']}: {e}")
-        return False
+    max_retries = 3
+    for attempt in range(max_retries):
+        try:
+            supabase.table("documents").update({"metadata": doc["metadata"]}).eq("id", doc["id"]).execute()
+            return True
+        except Exception as e:
+            if attempt < max_retries - 1:
+                sleep(0.5 * (attempt + 1))  # linear backoff
+                continue
+            print(f"Error updating {doc['id']}: {e}")
+            return False

-print(f"Starting concurrent updates with 50 workers for {len(updates_to_make)} rows...")
+print(f"Starting concurrent updates with 10 workers for {len(updates_to_make)} rows...")
 processed = 0
-with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
+with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/update_db_metadata.py` around lines 121 - 133, The current concurrent update loop uses ThreadPoolExecutor(max_workers=50) and lacks retry logic, which risks hitting Supabase rate limits and losing updates; reduce concurrency (e.g., to 5-10) by changing ThreadPoolExecutor(max_workers=50) and add retry-with-backoff inside update_doc (introduce parameters like max_retries and base_delay), so update_doc catches transient errors (including HTTP 429 rate-limit responses), sleeps with exponential backoff plus jitter between attempts, and returns success/failure only after retries are exhausted; ensure existing counters (success_count, error_count) and the processed reporting remain unchanged and that update_doc logs retry attempts for observability.
backend/update_metadata.sql (1)
1-9: Consider wrapping the updates in a transaction
The UPDATE statements are not currently wrapped in a transaction, so only some of them may succeed. If atomicity across all updates is required, wrapping them in a transaction is recommended.
♻️ Transaction-wrapping example

```diff
 CREATE INDEX IF NOT EXISTS idx_documents_book_title ON documents ((metadata->'book_info'->>'title'));
 SET statement_timeout = '120s'; -- Increase timeout to be safe
+
+BEGIN;

 UPDATE documents
 SET metadata = metadata || '{"kr_title": "역설의 예산 1권", "thumbnail": "", "link": ""}'::jsonb
 WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Budget of Paradoxes Volume I';
```

Add COMMIT; at the end of the file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/update_metadata.sql` around lines 1 - 9, Wrap the UPDATE statements in an explicit transaction to ensure atomicity: start a transaction (e.g. BEGIN) before the CREATE INDEX / UPDATE documents statement block and finish with a COMMIT (or ROLLBACK on error) so either all metadata updates (the UPDATE documents that sets kr_title/thumbnail/link) succeed together or none do; locate the block around CREATE INDEX IF NOT EXISTS idx_documents_book_title and the UPDATE documents ... WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Budget of Paradoxes Volume I' and add the transaction boundaries accordingly.
backend/app/services/llm.py (1)
24-28: API-key parsing needs to handle edge cases
The current regex does not handle quotes around values or trailing comments. For example:
GEMINI_API_KEY="abc" → "abc" (quotes included)
GEMINI_API_KEY=abc # comment → abc # comment (comment included)
♻️ Proposed cleanup of key values

```diff
 for m in matches:
-    key = m.strip()
+    key = m.strip().strip('"').strip("'")
+    # Remove inline comments
+    if ' #' in key:
+        key = key.split(' #')[0].strip()
     if key and key not in keys:
         keys.append(key)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm.py` around lines 24 - 28, The current parsing loop that builds matches and assigns key is not stripping surrounding quotes or removing inline comments; update the regex or post-process the capture in the loop (the matches variable and the loop that sets key) so the extracted value excludes trailing inline comments and surrounding single/double quotes. For example, either change the pattern to capture quoted or unquoted values up to a comment (e.g., use groups for "([^"]*)" or '([^']*)' or ([^#\n]+)) and pick the non-empty group, or after getting m run m = re.split(r'\s+#', m, 1)[0].strip().strip('\'"') before appending to keys so values like GEMINI_API_KEY="abc" and GEMINI_API_KEY=abc # comment yield clean abc.
frontend/components/chat/MessageList.tsx (1)
77-77: Recommend extracting the dummy URL into a constant
The dummy URL string is hardcoded in several places. Extracting it into a constant improves maintainability.
♻️ Constant-extraction example

```tsx
const DUMMY_COVER_URL = "https://image.aladin.co.kr/product/dummy";
const DUMMY_BOOK_LINK = "https://www.aladin.co.kr/dummy-link";

// usage
const coverUrl = meta.thumbnail || (meta.book_info?.cover_url !== DUMMY_COVER_URL ? meta.book_info?.cover_url : "");
```

Also applies to: 102-102, 120-120
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/components/chat/MessageList.tsx` at line 77, Multiple instances use the hardcoded dummy URL string when computing coverUrl (e.g., the expression meta.book_info?.cover_url !== "https://image.aladin.co.kr/product/dummy"), so extract that literal into a shared constant (e.g., DUMMY_COVER_URL) and replace all occurrences (including where coverUrl is computed and the other similar spots) to improve maintainability; update relevant imports/exports if needed and ensure comparisons use the new DUMMY_COVER_URL constant instead of the inline string.backend/scripts/generate_sql_updates.py (1)
18-24: Recommend improving file-extension handling
original_file[:-4] assumes a four-character .txt suffix. Files with other extensions would produce unintended results.
♻️ Safer extension handling with Path

```diff
+from pathlib import Path
+
 def generate_sql():
     # ...
     for book in mapping_data:
         original_file = book.get("original_file", "")
-        name_without_ext = original_file[:-4]
+        name_without_ext = Path(original_file).stem
         parts = name_without_ext.rsplit(" by ", 1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/generate_sql_updates.py` around lines 18 - 24, The code currently trims the extension with original_file[:-4], which assumes a four-character suffix like ".txt"; replace this with os.path.splitext(original_file)[0] to safely get the filename without extension (ensure you import os), handle the case original_file == "" gracefully (e.g., default to ""), and then continue to split name_without_ext with rsplit(" by ", 1) to derive title as done now; update occurrences of original_file, name_without_ext, parts, and title accordingly.
66-68: Broad Exception catches weaken failure classification and recovery policy
Lines 66, 103, 143, and 197 use broad catches, so network errors, parsing errors, and logic errors are all handled identically. At minimum, separating the error types and applying different retry/abort policies per type is safer.
Also applies to: 103-110, 143-145, 197-199
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/generate_book_mapping.py` around lines 66 - 68, Replace broad except Exception blocks that print "Kyobo fetch error: {e}" (and the similar handlers at the other noted sites) with targeted exception handling: catch network-related errors (e.g., requests.exceptions.RequestException) separately from parsing/format errors (e.g., json.JSONDecodeError, ValueError) and unexpected attribute/type errors (AttributeError, TypeError), log the exception with context, and apply different recovery actions (retry/backoff for network errors, abort/raise for parsing or logic errors). Update the handlers around the Kyobo fetch code paths (the blocks that currently print the Kyobo fetch error) to re-raise non-recoverable exceptions after logging and implement a simple retry loop only for RequestException with a limited retry count. Ensure you reference and modify the existing except blocks that contain the "Kyobo fetch error" print so the change is localized.
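As a hedged sketch of the split described above — retrying only transient network failures while letting parse errors surface immediately. The helper name, the use of `urllib` instead of `requests`, and the `data:` URL demo are illustrative assumptions, not the script's actual code:

```python
import json
import time
import urllib.error
import urllib.request

def fetch_json(url: str, retries: int = 3, delay: float = 0.5) -> dict:
    """Illustrative fetch: network errors are retried with backoff,
    while malformed responses are raised to the caller unretried."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read().decode("utf-8")
        except urllib.error.URLError as e:
            # Transient network problem: log, back off, retry.
            if attempt == retries - 1:
                raise
            print(f"fetch error (attempt {attempt + 1}): {e}")
            time.sleep(delay * (attempt + 1))
            continue
        # Parse outside the retry path: a bad payload will not improve
        # on retry, so json.JSONDecodeError propagates immediately.
        return json.loads(body)
    raise RuntimeError("unreachable")

# data: URLs let the sketch run without a network.
print(fetch_json('data:application/json,{"title":"Treatise"}'))
```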
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/scripts/generate_book_mapping.py`:
- Line 33: The single-line if statements (e.g., "if k: api_keys.append(k)" and
the similar one later) trigger Ruff E701; replace each one-line form with a
standard multi-line if block: make the condition and the body on separate lines
and use indentation so the append call is on the next line under the if; update
both occurrences (the one using variable k and api_keys and the other similar
instance around line 71) to the expanded form to satisfy the linter.
- Around line 64-65: The urlopen call has no timeout, so an external request can block indefinitely. Change both call sites in generate_book_mapping.py (the places currently calling urllib.request.urlopen(req) with the req variable) to urllib.request.urlopen(req, timeout=...) with a reasonable timeout (e.g., 5-15 seconds), and add exception handling at the call sites (e.g., in those functions) to catch socket.timeout/urllib.error.URLError, log the failure, and either skip the failed item or handle it with retry logic; apply the same change to both urlopen calls to guarantee consistent behavior.
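The bounded-fetch behavior described above can be sketched as follows (standalone illustration; the helper name, User-Agent header, and `data:` URL demo are assumptions, not the script's code):

```python
import socket
import urllib.error
import urllib.request

def fetch_page(url: str, timeout_s: float = 10.0):
    """Bounded fetch: returns the body, or None when the request
    times out or fails, so the caller can skip the item."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except socket.timeout:
        print(f"Timed out after {timeout_s}s: {url}")
    except urllib.error.URLError as e:
        print(f"Fetch failed, skipping {url}: {e.reason}")
    return None

print(fetch_page("data:text/plain,hello"))  # data: URL, no network needed
```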
- Around line 192-193: The code currently writes mapping directly to
MAPPING_FILE (json.dump(mapping, ...)) which can corrupt the file on
interruption; modify the write to use an atomic replace: write the JSON to a
temporary file in the same directory (e.g., via
tempfile.NamedTemporaryFile(delete=False) or similar), flush and fsync the temp
file, close it, then atomically replace MAPPING_FILE with os.replace(temp_path,
MAPPING_FILE); ensure you remove the temp file on error; apply this change
around the block that opens MAPPING_FILE and serializes "mapping".
- Around line 24-29: The current env-file parsing block (open(env_path, ...), for line in f) matches even commented-out lines (# GEMINI_API_KEY=...) via substring search and does not strip quotes or inline comments, so invalid keys slip in; to fix it, strip leading/trailing whitespace from each line, ignore lines starting with '#', only handle "GEMINI_API_KEY=" on non-comment lines, and after extracting the value remove any inline comment (everything after '#') and surrounding quotes (' or ") before appending non-empty values to the api_keys list (references: the env_path file-read loop, the key variable, the api_keys list).
In `@backend/scripts/generate_sql_updates.py`:
- Around line 34-37: Remove the unnecessary escaped variables kr_title_esc,
thumbnail_esc, and link_esc from generate_sql_updates.py since the code uses
json.dumps and jsonb_payload_esc to handle escaping; keep title_to_match_esc if
it is actually used elsewhere. Search for the declarations of kr_title_esc,
thumbnail_esc, and link_esc and delete those lines plus any immediate unused
references, leaving the jsonb_payload_esc path intact.
---
Nitpick comments:
In `@backend/app/services/llm.py`:
- Around line 24-28: The current parsing loop that builds matches and assigns
key is not stripping surrounding quotes or removing inline comments; update the
regex or post-process the capture in the loop (the matches variable and the loop
that sets key) so the extracted value excludes trailing inline comments and
surrounding single/double quotes. For example, either change the pattern to
capture quoted or unquoted values up to a comment (e.g., use groups for
"([^"]*)" or '([^']*)' or ([^#\n]+)) and pick the non-empty group, or after
getting m run m = re.split(r'\s+#', m, 1)[0].strip().strip('\'"') before
appending to keys so values like GEMINI_API_KEY="abc" and GEMINI_API_KEY=abc #
comment yield clean abc.
In `@backend/scripts/generate_book_mapping.py`:
- Around line 66-68: Replace broad except Exception blocks that print "Kyobo
fetch error: {e}" (and the similar handlers at the other noted sites) with
targeted exception handling: catch network-related errors (e.g.,
requests.exceptions.RequestException) separately from parsing/format errors
(e.g., json.JSONDecodeError, ValueError) and unexpected attribute/type errors
(AttributeError, TypeError), log the exception with context, and apply different
recovery actions (retry/backoff for network errors, abort/raise for parsing or
logic errors). Update the handlers around the Kyobo fetch code paths (the blocks
that currently print the Kyobo fetch error) to re-raise non-recoverable
exceptions after logging and implement a simple retry loop only for
RequestException with a limited retry count. Ensure you reference and modify the
existing except blocks that contain the "Kyobo fetch error" print so the change
is localized.
In `@backend/scripts/generate_sql_updates.py`:
- Around line 18-24: The code currently trims the extension with
original_file[:-4], which assumes a four-character suffix like ".txt";
replace this with os.path.splitext(original_file)[0] to safely get the filename
without extension (ensure you import os), handle the case original_file == ""
gracefully (e.g., default to ""), and then continue to split name_without_ext
with rsplit(" by ", 1) to derive title as done now; update occurrences of
original_file, name_without_ext, parts, and title accordingly.
In `@backend/scripts/update_db_metadata.py`:
- Line 20: The function update_database() is declared async but never uses await
and the Supabase client is synchronous; remove the unnecessary async keyword and
make update_database a regular function, update any places that call it to stop
awaiting it (remove await or adjust to call synchronously), and run tests to
ensure no callers expect a coroutine; reference the update_database() definition
when making the change.
- Line 68: The slice original_file[:-4] assumes a 4-character extension; replace
it with Path(original_file).stem to safely get the filename without extension.
Update the code in update_db_metadata.py where name_without_ext is assigned
(currently name_without_ext = original_file[:-4]) to use Path(...).stem, and add
"from pathlib import Path" at the top if not already present so other code using
name_without_ext continues to work with the correct base name.
- Around line 121-133: The current concurrent update loop uses
ThreadPoolExecutor(max_workers=50) and lacks retry logic, which risks hitting
Supabase rate limits and losing updates; reduce concurrency (e.g., to 5-10) by
changing ThreadPoolExecutor(max_workers=50) and add retry-with-backoff inside
update_doc (introduce parameters like max_retries and base_delay), so update_doc
catches transient errors (including HTTP 429 rate-limit responses), sleeps with
exponential backoff plus jitter between attempts, and returns success/failure
only after retries are exhausted; ensure existing counters (success_count,
error_count) and the processed reporting remain unchanged and that update_doc
logs retry attempts for observability.
In `@backend/update_metadata.sql`:
- Around line 1-9: Wrap the UPDATE statements in an explicit transaction to
ensure atomicity: start a transaction (e.g. BEGIN) before the CREATE INDEX /
UPDATE documents statement block and finish with a COMMIT (or ROLLBACK on error)
so either all metadata updates (the UPDATE documents that sets
kr_title/thumbnail/link) succeed together or none do; locate the block around
CREATE INDEX IF NOT EXISTS idx_documents_book_title and the UPDATE documents ...
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Budget of
Paradoxes Volume I' and add the transaction boundaries accordingly.
In `@frontend/components/chat/MessageList.tsx`:
- Line 77: Multiple instances use the hardcoded dummy URL string when computing
coverUrl (e.g., the expression meta.book_info?.cover_url !==
"https://image.aladin.co.kr/product/dummy"), so extract that literal into a
shared constant (e.g., DUMMY_COVER_URL) and replace all occurrences (including
where coverUrl is computed and the other similar spots) to improve
maintainability; update relevant imports/exports if needed and ensure
comparisons use the new DUMMY_COVER_URL constant instead of the inline string.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
backend/app/core/config.py
backend/app/services/llm.py
backend/data/books_mapping.json
backend/scripts/generate_book_mapping.py
backend/scripts/generate_sql_updates.py
backend/scripts/update_db_metadata.py
backend/update_metadata.sql
frontend/components/chat/MessageList.tsx
frontend/types/chat.ts
Actionable comments posted: 5
♻️ Duplicate comments (1)
backend/scripts/generate_book_mapping.py (1)
24-37: ⚠️ Potential issue | 🟠 Major
The environment-variable fallback does not run even when .env key parsing fails. With the structure at lines 24-37, the os.getenv("GEMINI_API_KEY") fallback is skipped whenever the .env file merely exists, so the key list can end up empty and the LLM path is never taken.
🔧 Proposed fix
```diff
 if env_path.exists():
     with open(env_path, 'r', encoding='utf-8') as f:
         for line in f:
             m = re.match(r'^\s*GEMINI_API_KEY\s*=\s*(.+?)\s*(?:#.*)?$', line)
             if not m:
                 continue
             key = m.group(1).strip().strip('"').strip("'")
             if key and key not in api_keys:
                 api_keys.append(key)
-else:
-    # Fallback to os.environ if .env missing
-    k = os.getenv("GEMINI_API_KEY")
-    if k:
-        api_keys.append(k)
+
+# Fallback to os.environ when parsing produced no key
+if not api_keys:
+    k = os.getenv("GEMINI_API_KEY")
+    if k and k not in api_keys:
+        api_keys.append(k)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/generate_book_mapping.py` around lines 24 - 37, The current logic in the env parsing block (around env_path.exists(), the loop that populates api_keys, and the GEMINI_API_KEY fallback using os.getenv) skips the environment-variable fallback whenever the .env file exists even if parsing produced no keys; change the flow so that after processing the .env file you check if api_keys is empty and only then call os.getenv("GEMINI_API_KEY") to append a fallback key, i.e., keep the existing parsing loop for env_path and if api_keys == [] (or len(api_keys) == 0) then read os.getenv("GEMINI_API_KEY") and append if present (referencing env_path, api_keys, and GEMINI_API_KEY).
🧹 Nitpick comments (4)
backend/scripts/generate_book_mapping.py (1)
39-41: Hardcoded test-only key rotation should be split out. The rotation logic at lines 39-41 changes the default key priority even in production runs. Making it opt-in via an environment-variable flag or a separate CLI option is better for maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/generate_book_mapping.py` around lines 39 - 41, The hardcoded test-key rotation currently mutates api_keys (the api_keys variable) for all runs; change this into an opt-in behavior by moving the rotation logic behind a feature flag or CLI option (e.g., check an environment variable like ENABLE_TEST_KEY_ROTATION or a CLI flag parsed in the script) so the default path leaves api_keys unchanged; ensure the check encloses the existing rotation block (the api_keys = api_keys[3:] + api_keys[:3] line) and only performs rotation when the flag/arg is set, preserving current default ordering otherwise.
backend/update_metadata.sql (1)
1-1: For large tables, it is safer to separate the index-creation strategy. The CREATE INDEX at line 1 can block writes while it runs. If online application on a production DB is required, split CREATE INDEX CONCURRENTLY into a separate step.
Verify each finding against the current code and only fix it if needed. In `@backend/update_metadata.sql` at line 1, The statement currently creates the index immediately (CREATE INDEX idx_documents_book_title on documents ...), which can block writes when run against a large table; for a production DB, split it into a separate step using CREATE INDEX CONCURRENTLY: change it to the form CREATE INDEX CONCURRENTLY idx_documents_book_title ON documents ((metadata->'book_info'->>'title')); and apply it as a separate migration/deploy step, run outside a transaction in the deploy script (CONCURRENTLY cannot run inside a transaction).
backend/scripts/check_models.py (1)
10-12: Please return validation failures via the exit code. Lines 10-12 exit with success (0) even when no API key is found, which makes failures hard to detect in automation.
🔧 Proposed fix

```diff
 api_key = os.getenv("GEMINI_API_KEY")
 if not api_key:
     print("No API key found!")
+    raise SystemExit(1)
 else:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/check_models.py` around lines 10 - 12, The current check uses "if not api_key: print('No API key found!') else:" which prints an error but still exits with success; change this to terminate with a non-zero exit code so CI detects failure: import sys at the top of backend/scripts/check_models.py (if not already), replace the print-only branch to call sys.exit(1) (or raise SystemExit(1)) when api_key is falsy, and keep the successful path returning/allowing exit code 0; reference the api_key check in that file to locate where to add sys.exit(1).pr_commet.txt (1)
1-360: Review-log artifacts are best excluded from the repository.
pr_commet.txt is a review byproduct rather than executable code or a documented spec, so it adds maintenance noise and exposes unnecessary information. Recommend excluding it from the PR (or adding it to .gitignore).
Verify each finding against the current code and only fix it if needed. In `@pr_commet.txt` around lines 1 - 360, The review-output file pr_commet.txt should be removed from the repository and prevented from being committed again: delete pr_commet.txt from the branch (remove it from the index/history if already committed), add an entry for pr_commet.txt (or a pattern covering review artifacts) to .gitignore, and commit the removal plus .gitignore change so future review logs are not tracked; ensure any CI or deploy steps that might rely on this file are updated or validated before pushing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/llm.py`:
- Around line 24-31: The regex currently captures commented-out lines like "#
GEMINI_API_KEY=...", so update the parsing in the function that builds `matches`
to only match active assignments: use a multiline-anchored pattern that requires
the line not be commented, e.g. change the re.findall to something like
re.findall(r'^\s*GEMINI_API_KEY\s*=\s*(.+)', content, re.MULTILINE) (or use a
negative lookahead to exclude lines that start with '#' ) and keep the existing
inline-comment stripping and quoting logic for the captured `m` before adding to
`keys`.
In `@backend/scripts/generate_sql_updates.py`:
- Around line 12-15: The SQL generation lacks transactional boundaries; update
the sql_statements list in generate_sql_updates.py (the variable named
sql_statements) to wrap the DDL/DML lines with an explicit transaction by
inserting "BEGIN;" (or "BEGIN TRANSACTION;") as the first entry and "COMMIT;" as
the last entry so the CREATE INDEX and SET statement_timeout are applied
atomically; also make the same change where a similar sql_statements block
appears around lines 49-55 to ensure those generated statements are similarly
wrapped.
- Around line 6-8: The code uses relative paths (e.g., mapping_path variable
reading "data/books_mapping.json" and the logic that writes
"update_metadata.sql") which break when the script is run from a different CWD;
change path construction to be relative to the script file by using the script
directory (os.path.dirname(__file__)) as the base and join it with
"data/books_mapping.json" for reading and the desired output filename for
writing so both the open(...) that loads mapping_data and the file open/write
that creates update_metadata.sql use absolute paths derived from the script
location.
In `@backend/scripts/update_db_metadata.py`:
- Around line 86-90: The batch can crash when a record's metadata is None
because the code calls metadata.get(...) directly; update the loop that reads
doc['metadata'] to defensively handle None by checking if metadata is truthy
(e.g., if not metadata: continue or skip this doc) before computing db_title, so
references to metadata.get('book_info', {}).get('title', '') are only executed
when metadata is a dict (look for variables doc, metadata, db_title in
update_db_metadata.py and add a guard to skip or default for records with
metadata is None).
In `@backend/update_metadata.sql`:
- Around line 17-22: The stored JSON contains HTML-escaped ampersands ("&amp;")
in the "link" field which breaks real query parameters; for the two UPDATE
statements that set metadata -> 'link' (the rows filtered by
metadata->'book_info'->>'title' = 'Korean Translation of A Treatise of Human
Nature' and 'Korean Translation of A Vindication of the Rights of Woman'),
unescape HTML entities before writing: replace "&amp;" with "&" (or run an
HTML-unescape routine) when constructing the JSONB value, and prefer using
jsonb_build_object/jsonb_set or programmatic JSON generation instead of raw
string literals to ensure the link is inserted unescaped and safely escaped for
SQL.
---
Duplicate comments:
In `@backend/scripts/generate_book_mapping.py`:
- Around line 24-37: The current logic in the env parsing block (around
env_path.exists(), the loop that populates api_keys, and the GEMINI_API_KEY
fallback using os.getenv) skips the environment-variable fallback whenever the
.env file exists even if parsing produced no keys; change the flow so that after
processing the .env file you check if api_keys is empty and only then call
os.getenv("GEMINI_API_KEY") to append a fallback key, i.e., keep the existing
parsing loop for env_path and if api_keys == [] (or len(api_keys) == 0) then
read os.getenv("GEMINI_API_KEY") and append if present (referencing env_path,
api_keys, and GEMINI_API_KEY).
---
Nitpick comments:
In `@backend/scripts/check_models.py`:
- Around line 10-12: The current check uses "if not api_key: print('No API key
found!') else:" which prints an error but still exits with success; change this
to terminate with a non-zero exit code so CI detects failure: import sys at the
top of backend/scripts/check_models.py (if not already), replace the print-only
branch to call sys.exit(1) (or raise SystemExit(1)) when api_key is falsy, and
keep the successful path returning/allowing exit code 0; reference the api_key
check in that file to locate where to add sys.exit(1).
In `@backend/scripts/generate_book_mapping.py`:
- Around line 39-41: The hardcoded test-key rotation currently mutates api_keys
(the api_keys variable) for all runs; change this into an opt-in behavior by
moving the rotation logic behind a feature flag or CLI option (e.g., check an
environment variable like ENABLE_TEST_KEY_ROTATION or a CLI flag parsed in the
script) so the default path leaves api_keys unchanged; ensure the check encloses
the existing rotation block (the api_keys = api_keys[3:] + api_keys[:3] line)
and only performs rotation when the flag/arg is set, preserving current default
ordering otherwise.
In `@backend/update_metadata.sql`:
- Line 1: The statement currently creates the index immediately (CREATE INDEX idx_documents_book_title on documents ...), which can block writes when run against a large table; for a production DB, split it into a separate step using CREATE INDEX CONCURRENTLY: change the plan to the form CREATE INDEX CONCURRENTLY idx_documents_book_title ON documents ((metadata->'book_info'->>'title')); and have the deploy script apply it as a separate migration/deploy step, outside a transaction (CONCURRENTLY cannot run inside a transaction).
In `@pr_commet.txt`:
- Around line 1-360: The review-output file pr_commet.txt should be removed from
the repository and prevented from being committed again: delete pr_commet.txt
from the branch (remove it from the index/history if already committed), add an
entry for pr_commet.txt (or a pattern covering review artifacts) to .gitignore,
and commit the removal plus .gitignore change so future review logs are not
tracked; ensure any CI or deploy steps that might rely on this file are updated
or validated before pushing.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
backend/app/services/llm.py
backend/scripts/check_models.py
backend/scripts/generate_book_mapping.py
backend/scripts/generate_sql_updates.py
backend/scripts/update_db_metadata.py
backend/update_metadata.sql
frontend/components/chat/MessageList.tsx
pr_commet.txt
```python
metadata = doc['metadata']

# The DB stores the title we want to match inside metadata->book_info->title
db_title = metadata.get('book_info', {}).get('title', '')
```
The batch can abort on records whose metadata is null.
Line 89 calls metadata.get(...) directly, so if metadata is None the exception aborts the entire update. Defensively handling this per record and skipping is safer.
🔧 Proposed fix
```diff
 for doc in all_docs:
     doc_id = doc['id']
-    metadata = doc['metadata']
+    metadata = doc.get('metadata') or {}
+    if not isinstance(metadata, dict):
+        print(f"Skipping {doc_id}: metadata is not an object")
+        continue
     # The DB stores the title we want to match inside metadata->book_info->title
     db_title = metadata.get('book_info', {}).get('title', '')
```

📝 Committable suggestion
db_title = metadata.get('book_info', {}).get('title', '')📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
metadata = doc.get('metadata') or {}
if not isinstance(metadata, dict):
    print(f"Skipping {doc_id}: metadata is not an object")
    continue
# The DB stores the title we want to match inside metadata->book_info->title
db_title = metadata.get('book_info', {}).get('title', '')
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/scripts/update_db_metadata.py` around lines 86 - 90, The batch can
crash when a record's metadata is None because the code calls metadata.get(...)
directly; update the loop that reads doc['metadata'] to defensively handle None
by checking if metadata is truthy (e.g., if not metadata: continue or skip this
doc) before computing db_title, so references to metadata.get('book_info',
{}).get('title', '') are only executed when metadata is a dict (look for
variables doc, metadata, db_title in update_db_metadata.py and add a guard to
skip or default for records with metadata is None).
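A standalone sketch of that guard, using hypothetical records shaped like the documents-table rows:

```python
def safe_titles(all_docs):
    """Collect book titles, skipping records whose metadata is not an object."""
    titles = []
    for doc in all_docs:
        doc_id = doc["id"]
        metadata = doc.get("metadata") or {}
        if not isinstance(metadata, dict):
            print(f"Skipping {doc_id}: metadata is not an object")
            continue
        titles.append(metadata.get("book_info", {}).get("title", ""))
    return titles

docs = [
    {"id": 1, "metadata": {"book_info": {"title": "A Treatise of Human Nature"}}},
    {"id": 2, "metadata": None},    # would have crashed metadata.get(...) before
    {"id": 3, "metadata": "oops"},  # not an object: skipped with a notice
]
print(safe_titles(docs))  # ['A Treatise of Human Nature', '']
```

Note that a None metadata is coerced to `{}` and yields an empty title rather than being skipped, matching the suggested diff's behavior.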
```sql
SET metadata = metadata || '{"kr_title": "인간 본성론", "thumbnail": "https://image.aladin.co.kr/product/435/90/coversum/8949705206_1.jpg", "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&amp;partner=openAPI&amp;start=api"}'::jsonb
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Treatise of Human Nature';

UPDATE documents
SET metadata = metadata || '{"kr_title": "여권 옹호", "thumbnail": "https://image.aladin.co.kr/product/4569/0/coversum/8994054596_1.jpg", "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=45690064&amp;partner=openAPI&amp;start=api"}'::jsonb
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Vindication of the Rights of Woman';
```
HTML entities (&amp;) stored in the URLs break the links.
When a link contains &amp;, as on Line 17 and Line 21, the browser misparses the actual query parameters. It is safer to HTML-unescape at generation time and regenerate the SQL.
🔧 Example (one row)
```diff
- "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&amp;partner=openAPI&amp;start=api"
+ "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&partner=openAPI&start=api"
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
SET metadata = metadata || '{"kr_title": "인간 본성론", "thumbnail": "https://image.aladin.co.kr/product/435/90/coversum/8949705206_1.jpg", "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&partner=openAPI&start=api"}'::jsonb
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Treatise of Human Nature';

UPDATE documents
SET metadata = metadata || '{"kr_title": "여권 옹호", "thumbnail": "https://image.aladin.co.kr/product/4569/0/coversum/8994054596_1.jpg", "link": "https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=45690064&partner=openAPI&start=api"}'::jsonb
WHERE metadata->'book_info'->>'title' = 'Korean Translation of A Vindication of the Rights of Woman';
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/update_metadata.sql` around lines 17 - 22, The stored JSON contains
HTML-escaped ampersands ("&amp;") in the "link" field which breaks real query
parameters; for the two UPDATE statements that set metadata -> 'link' (the rows
filtered by metadata->'book_info'->>'title' = 'Korean Translation of A Treatise
of Human Nature' and 'Korean Translation of A Vindication of the Rights of
Woman'), unescape HTML entities before writing: replace "&amp;" with "&" (or run
an HTML-unescape routine) when constructing the JSONB value, and prefer using
jsonb_build_object/jsonb_set or programmatic JSON generation instead of raw
string literals to ensure the link is inserted unescaped and safely escaped for
SQL.
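The unescape step that prompt describes is available in the standard library; a minimal sketch of cleaning a link before the SQL is regenerated:

```python
import html

def clean_link(url: str) -> str:
    """Undo HTML escaping so query parameters are parsed correctly (&amp; -> &)."""
    return html.unescape(url)

raw = ("https://www.aladin.co.kr/shop/wproduct.aspx"
       "?ItemId=4359030&amp;partner=openAPI&amp;start=api")
print(clean_link(raw))
# https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=4359030&partner=openAPI&start=api
```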
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/llm.py`:
- Around line 15-16: The docstring for get_all_gemini_keys() incorrectly states
it reads "commented" GEMINI_API_KEYs while the current regex in that function
(the pattern used to parse .env lines) does not match commented lines; either
update the docstring to say it only reads active (uncommented) keys, or modify
the parsing regex logic to also capture commented lines (and strip comment
markers) if you intend to support commented keys—locate get_all_gemini_keys and
the parsing block (around the existing regex and loop that processes lines
24-28) and apply the chosen change so docstring and implementation match.
In `@backend/scripts/check_models.py`:
- Around line 11-23: The top-level script currently runs API-key checks and
exits during import (api_key, genai.configure, genai.list_models), which can
break imports/tests; refactor by moving the API-key lookup, genai.configure
call, and the model-listing logic into a main() function (e.g., def main(): ...)
and protect execution with if __name__ == "__main__": main() so importing the
module no longer causes sys.exit or prints to stdout, while keeping
genai.configure and the try/except around genai.list_models inside main().
In `@backend/scripts/generate_book_mapping.py`:
- Around line 191-192: The current comprehension building existing_files uses
item["original_file"] which can raise KeyError for malformed entries in mapping;
update the logic in the existing_files creation to safely access the key (e.g.,
filter mapping items that contain a non-null "original_file" via
item.get("original_file") or an "original_file" in item check) so malformed
entries are skipped and files_to_process = [f for f in txt_files if f.name not
in existing_files] continues without crashing; reference the existing_files
variable, the mapping iterable, item["original_file"] access, and
files_to_process/txt_files when making the change.
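The `main()` guard the check_models.py prompt asks for can be sketched like this; the actual `genai` calls are elided so the sketch stays import-safe and runnable on its own:

```python
import os
import sys

def main() -> int:
    """Look up the API key and (in the real script) list models; return an exit code."""
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("GEMINI_API_KEY is not set", file=sys.stderr)
        return 1
    # genai.configure(api_key=api_key)              # real calls live here
    # for m in genai.list_models(): print(m.name)
    return 0

if __name__ == "__main__":  # importing this module no longer prints or exits
    main()
```

In the real script the guard can be `sys.exit(main())` so the return value becomes the process exit code.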
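The defensive comprehension for generate_book_mapping.py can look like the sketch below; the helper name and sample data are made up for illustration:

```python
from pathlib import Path

def pending_files(txt_files, mapping):
    """Files not yet mapped; tolerate mapping entries lacking 'original_file'."""
    existing_files = {
        item["original_file"]
        for item in mapping
        if isinstance(item, dict) and item.get("original_file")
    }
    return [f for f in txt_files if f.name not in existing_files]

mapping = [
    {"original_file": "hume_treatise.txt"},
    {"kr_title": "여권 옹호"},  # malformed entry: no 'original_file', must not crash
]
txt_files = [Path("hume_treatise.txt"), Path("wollstonecraft_rights.txt")]
print([f.name for f in pending_files(txt_files, mapping)])  # ['wollstonecraft_rights.txt']
```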
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.gitignore
backend/app/services/llm.py
backend/scripts/check_models.py
backend/scripts/generate_book_mapping.py
backend/scripts/generate_sql_updates.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/services/llm.py (1)
15-40: Consider consolidating the `.env` key-parsing logic into a shared utility. The parsing logic at lines 15-40 also exists
almost identically in backend/scripts/generate_book_mapping.py. If the parsing rules change (comment handling, inline comments, quote handling), there is a risk of fixing only one copy; extracting a shared function would improve maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm.py` around lines 15 - 40, The GEMINI .env parsing logic duplicated in get_all_gemini_keys and in backend/scripts/generate_book_mapping.py should be refactored into a single shared utility (e.g., parse_env_api_keys or parse_env_key_assignments) that encapsulates the regex, inline comment/quote stripping, and the behavior of including settings.GEMINI_API_KEY; replace the body of get_all_gemini_keys to call that shared utility and update the generate_book_mapping code to call it as well so both use the same implementation and preserve existing behavior (reading root .env, deduping keys, and prepending settings.GEMINI_API_KEY when present).
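A sketch of the shared utility that prompt suggests; the regex and stripping rules here are illustrative, not the exact ones in llm.py:

```python
import re

_KEY_RE = re.compile(r"^\s*#?\s*(GEMINI_API_KEY\w*)\s*=\s*(.+?)\s*$")

def parse_env_api_keys(env_text, primary=None):
    """Collect Gemini keys from .env text: strip quotes and inline comments,
    dedupe, and put the primary settings key first when given."""
    keys = []
    if primary:
        keys.append(primary)
    for line in env_text.splitlines():
        m = _KEY_RE.match(line)
        if not m:
            continue
        value = m.group(2).split("#", 1)[0].strip().strip('"').strip("'")
        if value and value not in keys:
            keys.append(value)
    return keys

env = 'GEMINI_API_KEY=aaa\n# GEMINI_API_KEY_2="bbb"  # spare\nOTHER=x\n'
print(parse_env_api_keys(env, primary="from-settings"))
# ['from-settings', 'aaa', 'bbb']
```

Both llm.py and generate_book_mapping.py could then call this one function, so a change to the parsing rules lands in a single place.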
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/scripts/generate_book_mapping.py`:
- Around line 131-133: The Aladin API request URL in generate_book_mapping.py is
using plain HTTP and exposes ALADIN_API_KEY; update the URL construction for the
variable url to use HTTPS (change "http://" to "https://") so the key and query
are sent securely, and optionally refactor the request-building to use
urllib.parse.urlencode with a params dict to safely encode query parameters
(e.g., reference the url variable and the query construction around f"{title}
{author}" to locate the change).
---
Nitpick comments:
In `@backend/app/services/llm.py`:
- Around line 15-40: The GEMINI .env parsing logic duplicated in
get_all_gemini_keys and in backend/scripts/generate_book_mapping.py should be
refactored into a single shared utility (e.g., parse_env_api_keys or
parse_env_key_assignments) that encapsulates the regex, inline comment/quote
stripping, and the behavior of including settings.GEMINI_API_KEY; replace the
body of get_all_gemini_keys to call that shared utility and update the
generate_book_mapping code to call it as well so both use the same
implementation and preserve existing behavior (reading root .env, deduping keys,
and prepending settings.GEMINI_API_KEY when present).
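The HTTPS-plus-urlencode change for the Aladin request can be sketched as below; the endpoint path and parameter names mirror typical Aladin ItemSearch calls but are assumptions here, and `DUMMY_KEY` stands in for ALADIN_API_KEY:

```python
from urllib.parse import urlencode

ALADIN_SEARCH = "https://www.aladin.co.kr/ttb/api/ItemSearch.aspx"  # https, not http

def build_aladin_url(api_key, title, author):
    """Build the search URL with safely percent-encoded query parameters."""
    params = {
        "ttbkey": api_key,
        "Query": f"{title} {author}",
        "QueryType": "Keyword",
        "Output": "JS",
    }
    return ALADIN_SEARCH + "?" + urlencode(params)

url = build_aladin_url("DUMMY_KEY", "인간 본성론", "데이비드 흄")
print(url)
```

`urlencode` percent-encodes the Korean query text and spaces, so the key and query survive intact over TLS.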
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/services/llm.py
backend/scripts/check_models.py
backend/scripts/generate_book_mapping.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/scripts/generate_book_mapping.py`:
- Around line 100-107: The except block handling LLM translation errors stops
the key-rotation loop prematurely because it uses break; when non-quota/auth
errors occur the code falls back to Kyobo instead of trying remaining keys.
Change the control flow so that on non-quota exceptions you skip the current
attempt but continue trying other keys (use continue/advance logic instead of
break), and only fall back to Kyobo when current_key_idx has advanced past all
entries in api_keys (check current_key_idx >= len(api_keys) before triggering
the fallback for file_name). Ensure references to current_key_idx and api_keys
are updated consistently in generate_book_mapping.py so key rotation tries all
keys before fallback.
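The continue-instead-of-break flow that prompt describes can be sketched with injected callables (the names here are hypothetical, not the script's actual functions):

```python
def translate_with_rotation(text, api_keys, call_llm, kyobo_fallback):
    """Try every key in order; fall back to Kyobo only after all keys failed."""
    for key in api_keys:
        try:
            return call_llm(key, text)
        except Exception as exc:   # quota, auth, or transient errors alike
            print(f"key ...{key[-4:]} failed ({exc}); trying next key")
            continue               # keep rotating instead of breaking out early
    return kyobo_fallback(text)    # reached only once every key was tried

calls = []
def flaky(key, text):
    calls.append(key)
    if key != "k3":
        raise RuntimeError("quota exceeded")
    return f"{text} (translated)"

print(translate_with_rotation("A Treatise of Human Nature",
                              ["k1", "k2", "k3"], flaky,
                              lambda t: f"{t} (kyobo)"))
```

Because the loop uses `continue`, a non-quota error on one key no longer short-circuits straight to the Kyobo fallback.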
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/core/env_utils.py
backend/app/services/llm.py
backend/scripts/generate_book_mapping.py
Summary by CodeRabbit