Releases: J535D165/datahugger

v0.8

14 Sep 16:47
2972fca

What's Changed

  • Improve resolve speed and prevent hitting re3data.org servers by @J535D165 in #52
  • Add extensible support for handle systems and metadata by @J535D165 in #56
  • Auto unzip option #53 by @davetromp in #55
  • Fix datahugger errors for CrossRef DOIs by @J535D165 in #58

New Contributors

Full Changelog: v0.7...v0.8

Coverage report

The following benchmark was applied to 500 randomly selected records from Datacite.

Percentages

Percentage of datasets supported: 22.4%
Percentage of datasets not supported: 60.6%
Percentage of datasets with error: 17.0%
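
As a rough illustration of how such a breakdown can be derived, the sketch below tallies one outcome label per benchmarked record and converts the counts to percentages. The function name and the outcome labels are hypothetical and are not taken from the datahugger code base; the counts (112, 303, 85) are simply those implied by the percentages above for a sample of 500 records.

```python
from collections import Counter


def coverage_percentages(outcomes):
    """Return each outcome label's share as a percentage, rounded to one decimal."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical labels; counts are implied by the reported percentages
# for 500 records (112 + 303 + 85 = 500).
outcomes = (
    ["supported"] * 112
    + ["not supported"] * 303
    + ["with error"] * 85
)

report = coverage_percentages(outcomes)
for label, pct in report.items():
    print(f"Percentage of datasets {label}: {pct}%")
```

Because every record receives exactly one label, the three percentages sum to 100% up to rounding.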

Table with unexpected errors

row | id | type | url | service | error
7 | 10.60516/au6956100 | dois | https://pid.geoscience.gov.au/sample/AU6956100 | nan | 'unknown'
9 | 10.48448/kgfs-s492 | dois | https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus | nan | 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
12 | 10.26197/ala.f34c149a-4578-47c5-83fc-52ba63e37cad | dois | https://doi.ala.org.au/doi/f34c149a-4578-47c5-83fc-52ba63e37cad | nan | 'other'
13 | 10.60494/4013-gm34 | dois | https://pid.geoscience.gov.au/sample/AU9420041 | nan | 'unknown'
18 | 10.60516/au4510089 | dois | https://pid.geoscience.gov.au/sample/AU4510089 | nan | 'unknown'
20 | 10.60516/au4857683 | dois | https://pid.geoscience.gov.au/sample/AU4857683 | nan | 'unknown'
52 | 10.18730/v7c2= | dois | https://glis.fao.org/glis/doi/10.18730/V7C2= | nan | '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
53 | 10.60494/ksh4-h631 | dois | https://pid.geoscience.gov.au/sample/AU8303563 | nan | 'unknown'
55 | 10.13145/bacdive5076.20230509.8 | dois | https://bacdive.dsmz.de/index.php?site=pdf_view&id=5076&doi=doi:10.13145/bacdive5076.20230509.8 | nan | 'unknown'
57 | 10.60516/au1215414 | dois | https://pid.geoscience.gov.au/sample/AU1215414 | nan | 'unknown'
68 | 10.1594/pangaea.318586 | dois | https://doi.pangaea.de/10.1594/PANGAEA.318586 | nan | 'other'
73 | 10.20345/digitue.1029.61 | dois | http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 | nan | 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
75 | 10.60516/au5331689 | dois | https://pid.geoscience.gov.au/sample/AU5331689 | nan | 'unknown'
...

v0.7

10 Sep 06:56
e0f44da

What's Changed

  • Enhance errors for repositories that throw 403 errors by @J535D165 in #50
  • Add support for DOIs pointing to single files by @J535D165 in #51

Full Changelog: v0.6...v0.7

Coverage report

The following benchmark was applied to 500 randomly selected records from Datacite.

Percentages

Percentage of datasets supported: 18.6%
Percentage of datasets not supported: 75.0%
Percentage of datasets with error: 6.4%

Table with unexpected errors

row | id | type | url | service | error
9 | 10.48448/kgfs-s492 | dois | https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus | nan | 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
64 | 10.7910/dvn/ghcv1g/bbucjs | dois | https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/GHCV1G/BBUCJS | nan | Failed to parse URL 'https://dataverse.harvard.edu/loginpage.xhtml;jsessionid=de4d68eca12a3479d7a636cd6d83?redirectPage=%2Ffile.xhtml%3FpersistentId%3Ddoi%3A10.7910%2FDVN%2FGHCV1G%2FBBUCJS'
73 | 10.20345/digitue.1029.61 | dois | http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 | nan | 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 | 10.7916/d8-qcx3-yp94 | dois | https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 | nan | 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
96 | 10.17876/plate/dr.2/plates/201_33742 | dois | https://www.plate-archive.org/objects/dr.2/plates/201_33742 | nan | 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
128 | 10.25560/78890 | dois | http://spiral.imperial.ac.uk/handle/10044/1/78890 | nan | 404 Client Error: for url: https://spiral.imperial.ac.uk/rest/handle/10044/1
146 | 10.15496/publikation-32226 | dois | https://publikationen.uni-tuebingen.de/xmlui/handle/10900/90845 | nan | 403 Client Error: Forbidden for url: https://publikationen.uni-tuebingen.de/rest/handle/10900/90845
163 | 10.34755/irok.2022.72.26.033 | dois | https://www.elibrary.ru/item.asp?id=48800309&pff=1 | nan | ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
170 | 10.25673/opendata2-168870 | dois | https://opendata2.uni-halle.de//handle/1516514412012/168894 | nan | 404 Client Error: for url: https://opendata2.uni-halle.de/rest/handle/1516514412012/168894
200 | 10.23725/akhp-6959 | dois | https://ors.datacite.org/doi:/10.23725/akhp-6959 | nan | HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f1549b33710>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
202 | 10.48370/ofd/clo2tl/a0ndzw | dois | https://dataverse.openforestdata.pl/file.xhtml?persistentId=doi:10.48370/OFD/CLO2TL/A0NDZW | nan | list index out of range
252 | 10.14469/ch/129258 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
257 | 10.25673/opendata2-91140 | dois | https://opendata2.uni-halle.de//handle/1516514412012/91154 | nan | 404 Client Error: for url: https://opendata2.uni-halle.de/rest/handle/1516514412012/91154
258 | 10.14469/ch/41814 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
296 | 10.14457/cmu.the.2009.132 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=3)
297 | 10.6085/aa/lop001_026mtbd004r00_20100911.40.1 | dois | https://data.piscoweb.org/catalog/d1/mn/v1/object/doi:10.6085/AA/LOP001_026MTBD004R00_20100911.40.1 | nan | Failed to parse URL 'https://data.piscoweb.org/metacat/d1/mn/v1/object/doi:10.6085/AA/LOP001_026MTBD004R00_20100911.40.1'
...

v0.6

09 Sep 20:45
8e8dab4

What's Changed

Full Changelog: v0.5...v0.6

Coverage report

The following benchmark was applied to 500 randomly selected records from Datacite.

Percentages

Percentage of datasets supported: 17.2%
Percentage of datasets not supported: 71.2%
Percentage of datasets with error: 11.6%

Table with unexpected errors

row | id | type | url | service | error
9 | 10.48448/kgfs-s492 | dois | https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus | nan | 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
33 | 10.13140/rg.2.2.34874.93125 | dois | http://rgdoi.net/10.13140/RG.2.2.34874.93125 | nan | 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/355480338?channel=doi&linkId=617372013c987366c3cf6014&showFulltext=true
62 | 10.16902/ethz-a-000027472 | dois | https://biosys.e-pics.ethz.ch/catalog/ETH.BIOSYS/r/27935 | nan | 403 Client Error: Forbidden for url: https://biosys.e-pics.ethz.ch/index.jspx?deeplink=%7B%22RecordItemCollection-%7B506ffea7-e574-4c3d-8ccb-fbafce360b70%7D%22%3A%7B%22viewMode%22%3A+%22infoview%22%2C%22previousViewMode%22%3A%22galleryview%22%2C%22previousMultiPageViewMode%22%3A%22galleryview%22%2C%22itemStartIdx%22%3A0%2C%22itemID%22%3A%227%3A27935%22%7D%7D
64 | 10.7910/dvn/ghcv1g/bbucjs | dois | https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/GHCV1G/BBUCJS | nan | Failed to parse record identifier from URL 'https://dataverse.harvard.edu/loginpage.xhtml;jsessionid=bb615fb668e59ba5e1f442e6e0b0?redirectPage=%2Ffile.xhtml%3FpersistentId%3Ddoi%3A10.7910%2FDVN%2FGHCV1G%2FBBUCJS'
73 | 10.20345/digitue.1029.61 | dois | http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 | nan | 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 | 10.7916/d8-qcx3-yp94 | dois | https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 | nan | 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
82 | 10.7910/dvn/vzp5cg/y4kxrr | dois | https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/VZP5CG/Y4KXRR | nan | Failed to parse record identifier from URL 'https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/VZP5CG/Y4KXRR'
91 | 10.13140/rg.2.2.15104.25602 | dois | http://rgdoi.net/10.13140/RG.2.2.15104.25602 | nan | 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/347486438?channel=doi&linkId=5fddbdc492851c13fe9c7051&showFulltext=true
96 | 10.17876/plate/dr.2/plates/201_33742 | dois | https://www.plate-archive.org/objects/dr.2/plates/201_33742 | nan | 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
110 | 10.7910/dvn/6oieqe/ndjzuo | dois | https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/6OIEQE/NDJZUO | nan | Failed to parse record identifier from URL 'https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/6OIEQE/NDJZUO'
128 | 10.25560/78890 | dois | http://spiral.imperial.ac.uk/handle/10044/1/78890 | nan | 404 Client Error: for url: https://spiral.imperial.ac.uk/rest/handle/10044/1
146 | 10.15496/publikation-32226 | dois | https://publikationen.uni-tuebingen.de/xmlui/handle/10900/90845 | nan | 403 Client Error: Forbidden for url: https://publikationen.uni-tuebingen.de/rest/handle/10900/90845
149 | 10.7910/dvn/hzbyg7/rq26h2 | dois | https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/HZBYG7/RQ26H2 | nan | Failed to parse record identifier from URL 'https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/HZBYG7/RQ26H2'
154 | 10.7910/dvn/sgeesj/1sradn | dois | https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/SGEESJ/1SRADN | nan | Failed to parse record identifier from URL 'https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/SGEESJ/1SRADN'
...

v0.6a7

09 Sep 20:26
a49626c
Pre-release
Improve report on benchmark datasets

v0.6a6

09 Sep 20:19
48e497c
Pre-release
Bump update release workflow action

v0.6a5

09 Sep 19:44
48e497c
Pre-release
Bump update release workflow action

v0.6a4

09 Sep 19:36
cf270dd
Pre-release

Full Changelog: v0.6a3...v0.6a4

v0.6a3

09 Sep 19:28
Pre-release

Full Changelog: v0.6a2...v0.6a3

v0.6a2

09 Sep 18:49
Pre-release
Fix requirements for benchmark workflow

v0.6a1

09 Sep 18:41
dadbd30
Pre-release

What's Changed

  • Datahugger supports ~20.0% of the DOIs in Datacite [v0.5] by @J535D165 in #44

Full Changelog: v0.6a0...v0.6a1