GH-351: release 03 new datasets (#378)
rain1024 committed Dec 27, 2020
1 parent ca041f9 commit 9c313e5
Showing 3 changed files with 28 additions and 2 deletions.
3 changes: 2 additions & 1 deletion underthesea/data_fetcher.py
@@ -76,7 +76,8 @@ def download_data(data, url):
     zip_datasets = [
         "VNTC", "VLSP2013-WTK", "VLSP2013-POS", "VTB-CHUNK",
         "VLSP2016-NER", "VLSP2018-NER", "AIVIVN2019-SA",
-        "VLSP2016-SA", "VLSP2018-SA", "UTS2017-BANK"
+        "VLSP2016-SA", "VLSP2018-SA", "UTS2017-BANK",
+        "DI_Vietnamese-UVD", "CP_Vietnamese-UNC", "SE_Vietnamese-UBS"
     ]
     if data in set(zip_datasets):
         if repo_data["license"] == "Close":
24 changes: 24 additions & 0 deletions underthesea/datasets.yaml
@@ -94,3 +94,27 @@ VNTC:
   filepath: ''
   url: https://www.dropbox.com/s/4iw3xtnkd74h3pj/VNTC.zip?dl=1
   url_filename: VNTC.zip?dl=1
+CP_Vietnamese-UNC:
+  cache_dir: datasets/CP_Vietnamese-UNC
+  type: Plaintext
+  license: Open
+  year: 2020
+  filepath: ''
+  url: https://github.com/undertheseanlp/resources/releases/download/1.3.x/CP_Vietnamese-UNC.zip
+  url_filename: CP_Vietnamese-UNC.zip
+DI_Vietnamese-UVD:
+  cache_dir: datasets/DI_Vietnamese-UVD
+  type: Dictionary
+  license: Open
+  year: 2020
+  filepath: ''
+  url: https://github.com/undertheseanlp/resources/releases/download/1.3.x/DI_Vietnamese-UVD.zip
+  url_filename: DI_Vietnamese-UVD.zip
+SE_Vietnamese-UBS:
+  cache_dir: datasets/SE_Vietnamese-UBS
+  type: Dictionary
+  license: Open
+  year: 2020
+  filepath: ''
+  url: https://github.com/undertheseanlp/resources/releases/download/1.3.x/SE_Vietnamese-UBS.zip
+  url_filename: SE_Vietnamese-UBS.zip
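
Note on the entries above: in every case shown, url_filename equals the final path segment of url, query string included (hence VNTC.zip?dl=1 for the Dropbox link). The sketch below only illustrates that relationship. The helper name url_filename_from_url is hypothetical and not part of underthesea, which may store or derive the value differently.

```python
def url_filename_from_url(url: str) -> str:
    """Return everything after the last '/' in the URL.

    Hypothetical helper for illustration only. The query string is kept on
    purpose, which matches the 'VNTC.zip?dl=1' value in datasets.yaml.
    """
    return url.rsplit("/", 1)[-1]


# URLs copied from the datasets.yaml hunk above.
dropbox_url = "https://www.dropbox.com/s/4iw3xtnkd74h3pj/VNTC.zip?dl=1"
github_url = (
    "https://github.com/undertheseanlp/resources/releases/download/"
    "1.3.x/CP_Vietnamese-UNC.zip"
)

print(url_filename_from_url(dropbox_url))
print(url_filename_from_url(github_url))
```

A consistency check of this kind could be run over the registry to catch a url and url_filename that drift apart when new datasets are added.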
3 changes: 2 additions & 1 deletion underthesea/file_utils.py
@@ -59,7 +59,8 @@ def get_from_cache(url: str, cache_dir: Path = None) -> Path:
     # make HEAD request to check ETag
     response = requests.head(url)
 
-    if response.status_code != 200:
+    # (anhv: 27/12/2020) github release assets return 302
+    if response.status_code not in [200, 302]:
         if "www.dropbox.com" in url:
             # dropbox return code 301, so we ignore this error
             pass
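
Note on the status check: requests.head does not follow redirects by default, so a URL that redirects to the actual file reports the redirect status itself. Per the commit comment, GitHub release assets answer with 302, which is why that code is now accepted alongside 200. The sketch below restates the visible branch logic as a pure function so it can be checked without network access. The function name head_status_is_acceptable is hypothetical, and rejecting all other responses is an assumption, since the hunk ends before the else branch.

```python
ACCEPTED_STATUS_CODES = (200, 302)  # 302 added for GitHub release assets


def head_status_is_acceptable(status_code: int, url: str) -> bool:
    """Mirror the branch structure visible in the get_from_cache hunk.

    Hypothetical helper for illustration; not part of underthesea.
    """
    if status_code in ACCEPTED_STATUS_CODES:
        return True
    # The original code ignores the status for Dropbox URLs (its comment
    # mentions 301). As written there, any status passes for Dropbox.
    if "www.dropbox.com" in url:
        return True
    # Assumption: everything else is treated as a failure. The diff does
    # not show what the original does here.
    return False


print(head_status_is_acceptable(302, "https://github.com/undertheseanlp/resources/releases/download/1.3.x/DI_Vietnamese-UVD.zip"))
print(head_status_is_acceptable(301, "https://www.dropbox.com/s/4iw3xtnkd74h3pj/VNTC.zip?dl=1"))
print(head_status_is_acceptable(404, "https://example.com/missing.zip"))
```

An alternative design would be to pass allow_redirects=True to requests.head and check only the final status, but that changes which response's headers (including any ETag) are read, so the commit's narrower change is the less invasive one.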