New govcan_dl_resources() methods #10

Merged
merged 11 commits into from
Nov 30, 2020

Conversation

KevCaz
Contributor

@KevCaz KevCaz commented Oct 17, 2020

Hi @VLucet ,

I took some time to rewrite govcan_dl_resources(). In a nutshell:

  • there are now methods for ckan_package_stack, ckan_package, ckan_resource, ckan_resource_stack and character.
  • there are two arguments to filter the files to be downloaded.
  • there is an argument (id_as_filename) to use the resource id as the file name; this was needed because different resources sometimes share the same name.
  • there is a clear message for every download attempt: a check mark is appended if the file is downloaded successfully; otherwise a warning explains why the attempt was skipped.
  • a data frame (that also has the class tibble) is returned to identify which files were downloaded (see the example below) and to gather all URLs.
  • this function has been tested.
  • the CI environment for Linux has been updated and Travis CI has been dropped (and the corresponding badge removed).
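
To illustrate the idea of per-class methods and of naming files by resource id, here is a minimal sketch in base R. It is not the code from this PR: dl_sketch(), the demo_resource and demo_resource_stack classes, and the example.org URLs are all hypothetical stand-ins, and nothing is actually downloaded.

```r
# Hypothetical generic (NOT the package's govcan_dl_resources()).
dl_sketch <- function(x, ...) UseMethod("dl_sketch")

# Leaf case: one resource. Only the target file name is computed here.
dl_sketch.demo_resource <- function(x, id_as_filename = FALSE, ...) {
  file <- if (id_as_filename) {
    paste0(x$id, ".", x$format)   # unique, because resource ids are unique
  } else {
    basename(x$url)               # may collide across resources
  }
  data.frame(id = x$id, url = x$url, file = file, stringsAsFactors = FALSE)
}

# Stack case: call the leaf method on every element and bind the rows.
dl_sketch.demo_resource_stack <- function(x, ...) {
  do.call(rbind, lapply(unclass(x), dl_sketch, ...))
}

# Hypothetical constructor used only to build the demo objects.
res <- function(id, url, format) {
  structure(list(id = id, url = url, format = format), class = "demo_resource")
}
stack <- structure(
  list(
    res("a1", "http://example.org/x/map.zip", "zip"),
    res("b2", "http://example.org/y/map.zip", "zip")
  ),
  class = "demo_resource_stack"
)

plain <- dl_sketch(stack)                         # both files named map.zip
out   <- dl_sketch(stack, id_as_filename = TRUE)  # one distinct name per id
```

In this sketch the two resources share the base name map.zip, so the second download would overwrite the first unless the id is used as the file name.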

Note that documents such as html or wms are not downloaded (and for the moment I honestly think it is for the best). Also, because of the difficulties with "session" in ckan_fetch(), I think it makes more sense to just store files locally, so I always use store = "disk", but this can always be changed in a future release.

Let me know what you think, but I think that with a little more work on this you will be close to a nice first release on CRAN.
I can (of course) give you a hand with the docs and with adding more tests.

Example with landuse as the keyword:
R> govcan_dl_resources(govcan_search("landuse"), path = "tmp")
ℹ Searching the Open Portal for records matching: landuse
CKAN query: 79 records found for keywords: landuse
79 matching records were found, 10 records were returned
Searching for dataset with id: 85b6ef22-d013-52e4-87fd-26bb57899499
Record found: "Ottawa and Toronto"
Download the English JPG through HTTP (jpg) ⚠ skipped (already downloaded).
Download the English PDF through HTTP (pdf) ⚠ skipped (already downloaded).
Download the French JPG through HTTP (jpg) ⚠ skipped (already downloaded).
Download the French PDF through HTTP (pdf) ⚠ skipped (already downloaded).
Searching for dataset with id: 4934cd4e-088f-51ce-84bc-0dba8551248a
Record found: "Quebec City and Montreal"
Download the English JPG through HTTP (jpg) ✔
ℹ Download the English PDF through HTTP (pdf) ✔
ℹ Download the French JPG through HTTP (jpg) ⚠ skipped (already downloaded).
Download the French PDF through HTTP (pdf) ⚠ skipped (already downloaded).
Searching for dataset with id: 43675fb6-6510-4d90-80d0-71f7e96b9604
Record found: "Annual Decay Rates - Prince Edward Island"
Annual Decay Rates - Prince Edward Island (csv) ⚠ skipped (not supported).
Annual Decay Rates - Prince Edward Island - Data Dictionary (csv) ⚠ skipped (not supported).
Searching for dataset with id: 60260b59-b81d-47b0-bb80-5ca9d6b5131f
Record found: "Land-use Framework Planning Regions"
Land-use Framework Planning Regions (esri rest) ⚠ skipped (not supported).
Land-use Framework Planning Regions (esri rest) ⚠ skipped (not supported).
Alberta Geoportal (html) ⚠ skipped (not supported).
Searching for dataset with id: 2012a482-fd0d-47c3-ba33-35bbb33201dc
Record found: "2M Base Map plus Land-use Framework Planning Regions, Treaty Boundary - Provincial Base Map Series"
Alberta Geoportal (html) ⚠ skipped (not supported).
2MLUFRegTreatyBdy.zip (other) ✔
Searching for dataset with id: cf7f0363-b899-4d8d-a973-f5b66f4e1fe8
Record found: "2M Base Map plus Land-use Framework Planning Regions, Municipalities, Green/White Area - Provincial Base Map Series"
Alberta Geoportal (html) ⚠ skipped (not supported).
2MLUFRegMunicGreenArea.zip (other) ✔
Searching for dataset with id: 9d672147-0584-43b1-b30d-f60db1762a67
Record found: "750K Base Map plus Land-use Framework Regions / Green and White Areas - Provincial Base Map Series"
Alberta Geoportal (html) ⚠ skipped (not supported).
750kLUFRegGreenArea.zip (other) ✔
Searching for dataset with id: 39c973f4-808f-4fa5-b555-4a97ce039050
Record found: "2M Base Map plus Land-use Framework Planning Regions, Green/White - Provincial Base Map Series"
Alberta Geoportal (html) ⚠ skipped (not supported).
2MLUFRegGreenArea.zip (other) ✔
Searching for dataset with id: b7ca71fa-6265-46e7-a73c-344ded9212b0
Record found: "Legal Planning Objectives - Current - Point"
KML Network Link (kml) ✔
ℹ Legal Planning Objectives - Current - Point (wms) ⚠ skipped (not supported).
Legal Planning Objectives - Current - Point (wms) ⚠ skipped (not supported).
Data Dictionaries for Strategic Land and Resource Plans (other) ⚠ skipped (ftp not supported yet).
BC Geographic Warehouse Custom Download (other) ⚠ skipped (not supported).
British Columbia Geoportal (html) ⚠ skipped (not supported).
Searching for dataset with id: 5d859a89-f173-4006-82f9-16254de2c1fc
Record found: "Non Legal Planning Features - Current - Polygon"
KML Network Link (kml) ✔
ℹ Non Legal Planning Features - Current - Polygon (wms) ⚠ skipped (not supported).
Non Legal Planning Features - Current - Polygon (wms) ⚠ skipped (not supported).
Data Dictionaries for Strategic Land and Resource Plans (other) ⚠ skipped (ftp not supported yet).
BC Geographic Warehouse Custom Download (other) ⚠ skipped (not supported).
British Columbia Geoportal (html) ⚠ skipped (not supported).
# A tibble: 33 x 7
   id             package_id         url                        path   fmt   store data 
   <chr>          <chr>              <chr>                      <chr>  <chr> <chr> <lgl>
 1 11fb314a-8d59… 85b6ef22-d013-52e… http://ftp.geogratis.gc.c… tmp/1… jpg   disk  NA   
 2 5de82de5-6490… 85b6ef22-d013-52e… http://ftp.geogratis.gc.c… tmp/1… pdf   disk  NA   
 3 e92d5f8f-b7bc… 85b6ef22-d013-52e… http://ftp.geogratis.gc.c… tmp/1… jpg   disk  NA   
 4 40dbfff0-ca19… 85b6ef22-d013-52e… http://ftp.geogratis.gc.c… tmp/1… pdf   disk  NA   
 5 a24951eb-bb56… 4934cd4e-088f-51c… http://ftp.geogratis.gc.c… tmp/1… jpg   disk  NA   
 6 7ea56d7f-4996… 4934cd4e-088f-51c… http://ftp.geogratis.gc.c… tmp/1… pdf   disk  NA   
 7 4f6cfb27-a5ca… 4934cd4e-088f-51c… http://ftp.geogratis.gc.c… tmp/1… jpg   disk  NA   
 8 abb1013b-ee08… 4934cd4e-088f-51c… http://ftp.geogratis.gc.c… tmp/1… pdf   disk  NA   
 9 0c1cc3d1-29c8… 43675fb6-6510-4d9… https://124gc.sharepoint.… NA     csv   NA    NA   
10 689c67bf-e3dc… 43675fb6-6510-4d9… https://124gc.sharepoint.… NA     csv   NA    NA   
# … with 23 more rows

@KevCaz
Contributor Author

KevCaz commented Oct 17, 2020

There is an issue on Windows with a specific test. I'm investigating (which is harder without a Windows machine, though).

@KevCaz
Contributor Author

KevCaz commented Oct 17, 2020

I had a lot of trouble with Windows. It turns out part of the problem was with the virtual environment; see actions/runner-images#712. The second solution there solves this!

@campersau

@KevCaz To avoid all these set-env warnings, you can use the new syntax: echo "TEMP=$env:USERPROFILE\AppData\Local\Temp" >> $env:GITHUB_ENV

@KevCaz
Contributor Author

KevCaz commented Oct 21, 2020

Yep, that's what I ended up doing!

@VLucet
Owner

VLucet commented Nov 26, 2020

This looks absolutely awesome. Will review this on Monday.

@@ -20,7 +20,7 @@ print.ckan_package_stack <- function(x, ...) {
   } else {
     cat(" Packages: \n")
     cli::cat_line()
-    purrr::map(x[1:dim(x)], print_ckan_package_custom)
+    purrr::map(x[seq_len(dim(x))], print_ckan_package_custom)
Owner


Good point, I always forget that using seq_len is best practice.
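
For context, a small base R sketch of why seq_len() is the safer choice when the length can be zero. The variable names are illustrative only and are not taken from the package.

```r
n <- 0L                 # e.g. the size of an empty stack

bad  <- 1:n             # 1:0 counts DOWN, giving the two indices 1 and 0
good <- seq_len(n)      # an empty integer vector, so nothing is indexed

x <- list()             # an empty collection

visits_bad <- 0L
for (i in 1:length(x)) visits_bad <- visits_bad + 1L          # loop body runs twice

visits_good <- 0L
for (i in seq_len(length(x))) visits_good <- visits_good + 1L # loop body never runs
```

With a non-empty input both forms give the same indices, so the change only matters in the empty case. seq_along(x) is the usual shorthand for seq_len(length(x)).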

Owner

@VLucet VLucet left a comment


This is great. Thank you for the thorough testing and the much cleaner methods. I just asked for a clarification on a line.

R/govcan_dl_resources.R
@VLucet
Owner

VLucet commented Nov 30, 2020

Also, I agree with what you said about downloads and html files.

@VLucet VLucet merged commit e2d84d1 into VLucet:master Nov 30, 2020