[naver] Fix EUC-KR encoding issue in old image URLs
Around October 2010, the image server changed its URL format and switched
file name encoding from EUC-KR to UTF-8.
Detect the old URL format and decode those image URLs as EUC-KR.
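
The detection relies on how `urllib.parse.unquote` behaves by default: it decodes percent-escapes as UTF-8 with `errors="replace"`, so byte sequences that are not valid UTF-8 turn into U+FFFD instead of raising. A minimal sketch of that behavior, using a made-up file name (`name` is an assumption for illustration, not a value from the site):

```python
from urllib.parse import quote, unquote

# Hypothetical file name, used only for illustration.
name = "사진.jpg"

# Percent-escape the same name the way an old (EUC-KR) and a new (UTF-8)
# image URL would carry it.
old_style = quote(name, encoding="euc-kr")
new_style = quote(name, encoding="utf-8")

# Default unquote(): UTF-8 with errors="replace".
old_as_utf8 = unquote(old_style)
new_as_utf8 = unquote(new_style)

# True for the EUC-KR escapes here, False for the UTF-8 ones.
old_looks_legacy = "\ufffd" in old_as_utf8
new_looks_legacy = "\ufffd" in new_as_utf8

# Decoding the old-style escapes as EUC-KR recovers the original name.
recovered = unquote(old_style, encoding="euc-kr")
```

The check is a heuristic: some EUC-KR byte pairs also happen to form valid UTF-8, so a legacy URL made up only of such pairs would not be flagged.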

- (lint with flake8) Customize conditions
  Wrap lines to at most 79 characters

- (lint with flake8) Customize conditions (2nd try)
  - One import per line
  - Indent on consecutive lines

- (lint with flake8) Customize conditions (3rd try)
  - E128 continuation line under-indented for visual indent
  - E123 closing bracket does not match indentation of opening bracket's line

- Update naver.py
  Check encoding for all image URLs
9CB797FF-9380-45F2-BB88-BB86CA0E32BF authored and mikf committed Mar 5, 2024
1 parent 22647c2 commit 009322a
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion gallery_dl/extractor/naver.py
@@ -10,6 +10,7 @@

 from .common import GalleryExtractor, Extractor, Message
 from .. import text
+from urllib.parse import unquote


 class NaverBase():
@@ -63,7 +64,13 @@ def metadata(self, page):

     def images(self, page):
         return [
-            (url.replace("://post", "://blog", 1).partition("?")[0], None)
+            (unquote(url, encoding="EUC-KR")
+             .replace("://post", "://blog", 1)
+             .partition("?")[0], None)
+            if "\ufffd" in unquote(url)
+            else
+            (url.replace("://post", "://blog", 1)
+             .partition("?")[0], None)
             for url in text.extract_iter(page, 'data-lazy-src="', '"')
         ]
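
For readers who find the conditional expression in the hunk above hard to follow, the same per-URL logic can be sketched as a standalone function. `normalize_image_url` and the `example.invalid` host are hypothetical names introduced here for illustration; they are not part of the commit or of gallery-dl.

```python
from urllib.parse import quote, unquote


def normalize_image_url(url):
    """Hypothetical helper mirroring the per-URL expression in the diff."""
    # Old-style URLs carry EUC-KR percent-escapes; decoding them as UTF-8
    # (the unquote() default, with errors="replace") yields U+FFFD.
    if "\ufffd" in unquote(url):
        url = unquote(url, encoding="EUC-KR")
    # Same post-processing as before the change: rewrite the first
    # "://post" to "://blog" and drop the query string.
    return url.replace("://post", "://blog", 1).partition("?")[0]


# Made-up URLs on a placeholder host, one per encoding.
base = "https://postfiles.example.invalid/"
old_url = base + quote("사진.jpg", encoding="euc-kr") + "?type=w2"
new_url = base + quote("사진.jpg", encoding="utf-8") + "?type=w2"

old_result = normalize_image_url(old_url)
new_result = normalize_image_url(new_url)
```

As in the commit, only URLs flagged as old-style are unquoted; UTF-8 URLs keep their percent-escapes.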
