
Submission: rmangal #332

Closed
11 of 25 tasks
SteveViss opened this issue Jul 30, 2019 · 24 comments

Comments

@SteveViss
Member

SteveViss commented Jul 30, 2019

Submitting Author: Steve Vissault (@SteveViss)
Repository: https://github.com/mangal-wg/rmangal
Version submitted: 1.9.1
Editor: @noamross
Reviewer 1: @thomasp85
Reviewer 2: @arw36
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: rmangal
Type: Package
Title: Mangal Client
Version: 1.9.1
Authors@R: c(
    person(given = "Steve", family = "Vissault", comment = c(ORCID = "0000-0002-0866-4376"),
      email = "[email protected]", role = c("aut", "cre")),
    person(given = "Kevin", family = "Cazelles", comment = c(ORCID = "0000-0001-6619-9874"),
        email = "[email protected]", role = c("aut","ctb")),
    person(given = "Gabriel", family = "Bergeron",
      email = "[email protected]", role = c("aut","ctb")),
    person(given = "Benjamin", family = "Mercier",
      email = "[email protected]", role = c("aut","ctb")),
    person(given = "Clément", family = "Violet",
      email = "[email protected]", role = c("aut","ctb")),
    person(given = "Dominique", family = "Gravel",
      email = "[email protected]", role = c("aut")),
    person(given = "Timothée", family = "Poisot",
      email = "[email protected]", role = c("aut"))
      )
Description: An interface to the Mangal database - a collection of ecological networks
<http://poisotlab.biol.umontreal.ca/#/>. 
This package includes functions to work with the Mangal RESTFul API methods 
(<https://mangal-wg.github.io/mangal-api/#api-documentation>).
URL: http://poisotlab.biol.umontreal.ca/#/, https://github.com/mangal-wg/rmangal
BugReports: https://github.com/mangal-wg/rmangal/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
    httr (>= 1.3.1),
    jsonlite (>= 1.5),
    igraph,
    purrr,
    sf
RoxygenNote: 6.1.1
Suggests:
    knitr,
    magrittr,
    mapview,
    rmarkdown,
    taxize,
    testthat,
    tibble,
    USAboundaries
VignetteBuilder: knitr
Roxygen: list(markdown = TRUE)

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • reproducibility
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences).

The rmangal package provides all the functions needed to retrieve ecological networks stored in the Mangal interactions database v2. The rmangal package makes it easier to integrate the data acquisition step for ecological networks into a reproducible workflow.

Please note any areas you are unsure of: reproducibility

  • Who is the target audience and what are scientific applications of this package?

This package mostly targets ecologists interested in the structure of interactions among species,
individuals and populations at different scales. By giving access to a large collection of networks from around the globe, Mangal opens the path to studying ecological network properties (connectance, degree, trophic structure, etc.) at the global scale in a more reproducible way.

The only package with a potential overlap is rglobi. rmangal serves a different purpose by giving access to (already published) ecological networks in a standardized manner, whereas GloBI aggregates pairwise interactions among species and does not keep track of the study context (e.g. sampling date and location). Mangal 1.0 was a data provider for the GloBI infrastructure, and we plan to do the same with Mangal 2.0.

  • Any other questions or issues we should be aware of?

No


Accepted presubmission by @noamross in #318


Technical checks

Confirm each of the following by checking the box. This package:

Publication options

Yes, we are currently writing an abstract describing this package. In this paper, we would also like to introduce the Julia client in order to streamline references to the Mangal clients. That being said, we are not quite sure what this means on your end; we believe it implies that the Julia client will need to be reviewed independently.

JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct


Best,
Steve Vissault
on behalf of all authors / contributors (@KevCaz, @gabrielbouleau, @BenMerSci, @clementviolet, @tpoisot).

@noamross
Contributor

noamross commented Aug 4, 2019

Thanks for your submission, @SteveViss and co-authors! I am now seeking reviewers.

Regarding JOSS, from our end this simply means that we pass our reviews to the JOSS editors for them to use in place of new reviews. As you have multiple clients and would like to include them in a single submission to JOSS, I suggest you submit to JOSS and request they review the Julia client, and we will pass forward the reviews for the R client. This could wait until this review is finished or you could submit pre-emptively so that the Julia review starts, as well (as you may want to address review comments jointly should they affect common/upstream logic).

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

goodpractice::gp() tests pass with flying colors.


Reviewers:
Due date:

@noamross
Contributor

noamross commented Aug 6, 2019

Reviewers assigned:
Reviewer 1: @thomasp85
Reviewer 2: @arw36
Due date: 2019-08-26

@tpoisot

tpoisot commented Aug 9, 2019

@noamross we will submit to JOSS asap -- the Julia code is really tight (under 700 sloc), so review should be fast.

@noamross
Contributor

noamross commented Aug 9, 2019

Great, please tag me in the issue there to track it.

@thomasp85

Hi All

Please find my review below. All in all a very good job with this package 🙂

best
Thomas

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The rmangal package provides a clean and consistent API for interacting with the
Mangal database. It is in general very well done and signals high quality to the
user. Still, a few issues need to be addressed before approval but all of these
are pretty minor and can easily be handled by the development team. My specific
comments are below:

Website

The website is good, and as it is hosted under the main service it wraps, it
gives credibility to the whole package. One issue that could be addressed is the
reference index, where a bit of care could be used to group functions and give
each group a short description. This will help new users a lot when navigating
the website.

Dependencies

The hard dependency on sf could be lifted, as it doesn't appear to be used
outside of a single functionality and sf has some very strong system
dependencies. I suggest moving sf to Suggests and testing for its availability
when needed, along with an error message about installing it if missing.
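
A minimal sketch of such a guard, assuming the check sits at the top of each sf-dependent function (the message wording is illustrative):

# Hypothetical guard for sf-dependent functionality
if (!requireNamespace("sf", quietly = TRUE)) {
  stop(
    "The sf package is required for this feature; ",
    "please install it with install.packages('sf').",
    call. = FALSE
  )
}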

Check

R CMD check reports a range of examples with run time > 5sec and a total runtime
of examples of 53 sec. This is likely to be flagged by CRAN during submission
and it might as well be fixed beforehand:

✔  checking examples (53.4s)
   Examples with CPU or elapsed time > 5s
                        user system elapsed
   get_collection      4.812  0.069  22.876
   combine_mgNetworks  1.129  0.056   6.831
   search_interactions 0.718  0.010   6.440
   search_datasets     0.369  0.008   5.359

Readme

The README is a tad short. It does include installation instructions and some
very useful links, but there is no motivating text for why this package may be
of interest and what it tries to solve (besides being an API wrapper for a
service). It should at least contain descriptions of why one might want to use
the mangal database and an example of a network retrieval to show the strength
of the API

Vignette

This is a very good introduction to the package and shows off all the important
features in an easy-to-understand way. I would personally include a short
section on visualising the networks, and maybe drag in tidygraph and ggraph
(disclaimer: author of both) for a more familiar interface to network analysis
and visualisation than igraph (at least to the tidyverse crowd). I will not
insist on this, though, as package preferences are subjective.

Documentation

A few of the smaller functions lack examples, such as avail_type() and
clear_cache_rmangal(). These are both functions without arguments, so it is not
clear that any example would add to the understanding, but in the interest of
completeness I'll just mention this.
Most of the documentation entries have very short, subtitle-like descriptions, with
most of the text relegated to the Details section. This generally results in
documentation entries where one has to read quite a bit before the function is
justified; the experience could be improved by having a proper paragraph in
the Description section.
Cross referencing could be improved. All functions that work with mgNetwork or
mgNetworkCollection objects should link to or describe how one might obtain such
an object. The same is true for functions working with mgSearch* objects. The
key search_*() functions could also benefit from better cross-linking by using the
@family keyword from roxygen2. There are documentation entries for all methods of
generics (pointing to the same doc); this makes the documentation index a bit messy,
so if this could be solved it would add to the quality. Small issues: combine_mgNetworks has a stray #' in the `...` argument.
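
A hedged sketch of the @family cross-linking mentioned above (the function name, signature, and family label are illustrative, not taken from the package):

# Hypothetical roxygen2 tagging: giving every search_*() function the same
# @family adds automatic "See also" cross-links among their help pages.
#' Search datasets in the Mangal database
#' @family search functions
#' @export
search_datasets <- function(query, verbose = TRUE) {
  # ... existing implementation unchanged ...
}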

Functionality

The igraph integration is nice. With very little modification you can make it
work directly with tidygraph and ggraph.

  1. Either provide an explicit as.igraph method for mgNetwork objects, which
    could simply be an alias for mg_to_igraph (a sketch of this option follows
    the list). Tidygraph and ggraph will look for such a method if no other
    exists and use that.
  2. Provide an as_tbl_graph method directly. This can again just be a wrapper
    around mg_to_igraph, e.g.
    as_tbl_graph.mgNetwork <- function(x, ...) {
      as_tbl_graph(mg_to_igraph(x))
    }
    The obvious benefit of 1) is that you wouldn't need to list tidygraph as a
    dependency, while conversely the benefit of 2) is that you make the support
    explicit.
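
A minimal sketch of option 1, assuming mg_to_igraph() keeps its current signature:

# Hypothetical alias: expose mg_to_igraph() through the as.igraph() generic so
# that igraph (and, via its fallback, tidygraph and ggraph) can dispatch on
# mgNetwork objects without tidygraph becoming a dependency.
#' @export
as.igraph.mgNetwork <- function(x, ...) {
  mg_to_igraph(x)
}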

Unit testing

The package has very good unit test coverage on all aspects of the code. The
only area lacking is testing of error handling, as the unit tests only tend to cover
whether functions behave as expected with proper input. Tests for whether
functions fail as expected and fail gracefully in the face of problems or bad
input would be icing on the cake.
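
For illustration, a hypothetical testthat sketch of such failure-mode tests (the specific inputs, and whether they actually raise errors in rmangal, are assumptions):

library(testthat)
library(rmangal)

test_that("search and get functions fail gracefully on bad input", {
  # assumed failure modes, for illustration only
  expect_error(search_datasets(1:10))            # query is neither a string nor a named list
  expect_error(get_network_by_id("not-an-id"))   # id cannot be interpreted as a network id
})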

@noamross
Contributor

Thank you, @thomasp85! @SteveViss, I suggest waiting for @arw36's review so you can address them together.

@arw36

arw36 commented Aug 26, 2019

Hi all,

Here is my review. Apologies for coming in right at the deadline; this is my first rOpenSci review...

Best,
Anna

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).


Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

rmangal provides access to and navigation of the Mangal database of ecological networks. The ease of use of this package, combined with the valuable information stored in Mangal, makes it an exciting and timely contribution to ecological research, and it really expands the scale of between-network comparisons that are possible. I am particularly impressed with the integration of various taxonomic authorities and citation databases, and with the addition of spatial information to published networks, which will greatly enhance the flexibility of the collated dataset. I also really like that the % progress of each dataset is shown as it loads. Overall, a promising package with good coverage and usability. Here are some things that could improve the package.

Minor Comments

  • It would be easy to prepopulate standard metrics. For the network datatable this would include # of nodes and interactions, potentially a modularity factor, and the interactions database with degree and eigen.centrality.
  • Unclear what the df_net_unknown <- search_interactions(list(attr_id = 100), verbose = FALSE) example is meant to accomplish in the search_interactions help file. This returned an empty dataframe for me.
  • May want to add get_collection() to the mg_insect example for search_networks documentation
  • Suggest you add California border to the plot(networks_in_area) example
  • I would like to see more metadata on variables including definitions for attribute.id and associated columns (name, description, unit) as well as user_id.
  • For the date in the edges table, it is unclear whether this is extracted as a network-level trait, is a trait specific to each edge, or is NA. For instance, for edges of network_id = 19 it appears to be extrapolated from the network trait, despite the dataset_id (1) giving a range of dates (1897-09 to 1899-07).
  • Each data table starts with an id column; it may be worth clarifying these as network_id, edge_id, node_id, etc. in the main table. This will allow easy linking across tables, as ids are referenced this way in other tables (e.g. the network_id column in the nodes table).
  • I suggest you make all your functions plural (currently search_reference is the only singular one).
  • I suggest you rename the author column in the reference table to be first_author.
  • I was able to retrieve complete data tables for network, dataset, node, interactions using search_*(""). This did not work for the reference table. Ease of full database download would be nice, perhaps with warnings about download times and file sizes.
  • I think there should be greater curation/spell checks done in the interaction table, method column. What does biblio mean? Standardize capitalization.
  • I find the categorization of interaction types to be non-standard. For instance, I would standardize null, NA, and none of interactions$attribute.unit to be the same (or clarify NA vs. none).
  • I think there is some confusion between edges/interactions. I wish the use of interaction was standardized in output text.
  • Currently, you can only search_taxonomy by the identified taxon rank. It would be really nice for broader rank taxon searches at the order or phylum level to retrieve all entries, even those that have a smaller scale taxonomy. For instance, as of now search_taxonomy("Arthropoda") returns no entries, despite there being lots of arthropods in the datasets.
  • To conform with ROpenSci packaging guidelines, you will need to add brief demonstration usage, a description of how similar packages (i.e. rglobi) relate to this one, and citation information.
  • In general, the README is short. Keep in mind this README may be the first point of entry, so it could be fleshed out a bit more with material you already have in the vignettes/website.
  • Do you expect users to upload networks to this database like rglobi? May be worth saying something about contributing new data or identifying new papers.

Typos

  • In the get_network_by_id.R file identifiera should be identifiers.
  • In the rmangal.Rmd file. embbeded should be embedded.
  • In the “Get started with rmangal” page, section Data Structures, nodes - should be “individual”. This also comes up in the search_nodes help page and rmangal.Rmd.
  • Line 197 of rmangal.Rmd should be lagoon instead of laggoon.
  • Line 379 of rmangal.Rmd should be the instead of tha
  • Line 3 of NEWS.md submited should be submitted.
  • Line 3 of search_reference.Rd. key wor should be keyword

@SteveViss
Member Author

Thanks @arw36 & @thomasp85 for your amazing and constructive reviews. We will link each individual comment to issues & commits (as much as possible). That way, you will be able to easily track modifications to the code.

@noamross
Contributor

Thank you @arw36! @SteveViss, yes, do tag and track as you incorporate your changes, but when you are done please provide a response to the reviewers here that summarizes your changes.

@SteveViss
Member Author

We would like to thank again both reviewers for their constructive comments. Below, we first highlight the major changes we made in the package before addressing all comments individually.

NB: Appveyor is currently failing because binaries for ggraph 2.0.0 are not available yet.

Mangal database

Some of the comments made by one of the reviewers suggested changes to the database. We considered and applied some of these changes but refrained from incorporating others that implied major changes to the data model and would consequently trigger downstream changes to the API and the website. However, we kept track of these recommendations and may integrate them in a future version of the infrastructure. Speaking of the future of Mangal, and to answer one comment, opening the database to publication is now our priority. We are currently thinking about a publication / submission process for new ecological networks using the mgNetwork class and a robust suite of tests to ensure data quality and reduce curation time.

Computing a priori network properties

For the reason explained above, we did not prepopulate metrics (such as degree, centrality, connectance, etc.) within the networks datatable. This would require extensive development (for few functionalities) on the infrastructure backend: we would have to change the model definition (i.e. add one virtual / auto-generated field per metric) and compute metrics that are sometimes based on two other datatables: nodes and interactions.

We are also concerned about computing too many network descriptors / properties because several are sensitive to the algorithm selected. We basically recommend that users do their own analyses through igraph (for instance), which is well suited for this purpose. We have however decided to provide a summary method that prints a very simple set of network properties for mgNetwork objects (see commit 7872b50).
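
As a hedged illustration of that workflow (assuming net is a single mgNetwork object retrieved beforehand):

library(rmangal)
library(igraph)

g <- as.igraph(net)       # uses the as.igraph() method described below
igraph::degree(g)         # node degrees
igraph::edge_density(g)   # graph density, often used as connectance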

Package features

summary method

As mentioned above, we added a summary method for mgNetwork objects.

mg_to_igraph becomes as.igraph

As suggested by one reviewer, instead of using mg_to_igraph() we extended the as.igraph() method for mgNetwork objects (see 1d47483). This makes the conversion to igraph and tbl_graph objects seamless.
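
A brief, hedged usage sketch (again assuming net is a single mgNetwork object):

library(rmangal)
tg <- tidygraph::as_tbl_graph(net)  # dispatches through the new as.igraph() method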

sf dependencies

sf has been moved to Suggests. Moreover, sf features are only used in search_networks_sf() and when argument as_sf is set to TRUE, see #89.

search_references()

search_references() has been rewritten, see #85.

Documentation

README

The README now includes much more detail about rmangal, to better describe the goals of the package and to conform with the rOpenSci packaging guidelines; see commits 275f30a, 0ec0789, b1c0d83.

API

We enhanced the API documentation by providing the definition and type of all fields for each datatable. We link to this explicitly in each R function's documentation (see the @references sections, be50421).

Functions documentation

We moved some text from the details section to the description as suggested by one reviewer.

Vignette

We now provide an example of network manipulation with tidygraph and an example of visualization with ggraph. See 46e4a63. We also expanded on how to use sf with rmangal (see [#89]).


Website

The website has been rebuilt and is now deployed by Travis CI. We also broke the reference page down into four categories of functions (https://mangal-wg.github.io/rmangal/reference/index.html).

Tests

New features have all been covered in the unit testing and we added a little bit of icing on the cake!

Review by @thomasp85

Website

One issue that could be addressed is the
reference index, where a bit of care could be used to group functions and give
each group a short description. This will help new users a lot when navigating
the website.

This has been done in https://mangal.io/doc/r/reference/index.html

Dependencies

The hard dependency on sf could be lifted, as it doesn't appear to be used
outside of a single functionality and sf has some very strong system
dependencies. I suggest moving sf to Suggests and testing for its availability
when needed, along with an error message about installing it if missing.

We have considered the recommendations (see this issue, and 0264bec).

Check

R CMD check reports a range of examples with run time > 5sec and a total runtime
of examples of 53 sec. This is likely to be flagged by CRAN during submission
and it might as well be fixed beforehand:

checking examples (53.4s)
Examples with CPU or elapsed time > 5s
user system elapsed
get_collection 4.812 0.069 22.876
combine_mgNetworks 1.129 0.056 6.831
search_interactions 0.718 0.010 6.440
search_datasets 0.369 0.008 5.359

We have now addressed this issue in commits f31417c, 621dcb8, ebc4cf0 and 75dee5a by retrieving smaller networks and removing some examples. While improving the examples, I noticed that their run times are likely to be inconsistent because API responses are bandwidth dependent. If some issues are flagged by CRAN, we will use \donttest.
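
Should CRAN flag the remaining examples, a hedged sketch of the roxygen2 markup we would use (the example call is illustrative):

#' @examples
#' \donttest{
#' # hits the remote Mangal API, so excluded from routine checks
#' net_19 <- get_collection(search_networks(list(id = 19)))
#' }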

Readme

The README is a tad short. It does include installation instructions and some
very useful links, but there is no motivating text for why this package may be
of interest and what it tries to solve (besides being an API wrapper for a
service). It should at least contain a description of why one might want to use
the Mangal database and an example of a network retrieval to show the strength
of the API.

This comment has been addressed above (see section Readme).

Vignette

This is a very good introduction to the package and shows off all the important
features in an easy-to-understand way. I would personally include a short
section on visualising the networks, and maybe drag in tidygraph and ggraph
(disclaimer: author of both) for a more familiar interface to network analysis
and visualisation than igraph (at least to the tidyverse crowd). I will not
insist on this, though, as package preferences are subjective.

See section Vignette above.

Documentation

A few of the smaller functions lack examples, such as avail_type() and
clear_cache_rmangal(). These are both functions without arguments, so it is not
clear that any example would add to the understanding, but in the interest of
completeness I'll just mention this.

avail_type() enumerates the list of interaction types accessible from the database. clear_cache_rmangal() resets the memoise cache wrapped around get_gen() & get_singleton(). Both functions perform actions that do not require arguments. Because of that, we don't think any example would improve the functions' usability.

Most of the documentation entries have very short, subtitle-like descriptions,
with most of the text relegated to the Details section. This generally results
in documentation entries where one has to read quite a bit before the function
is justified; the experience could be improved by having a proper paragraph in
the Description section.

We have improved the function descriptions in 4271bf8.

Cross referencing could be improved. All functions that work with mgNetwork or
mgNetworkCollection objects should link to or describe how one might obtain such
an object. The same is true for functions working with mgSearch* objects.

This has been done in 0ec0789.

The key search_*() functions could also benefit from better cross-linking by
using the @family keyword from roxygen2. There are documentation entries for all
methods of generics (pointing to the same doc); this makes the documentation
index a bit messy, so if this could be solved it would add to the quality.

Small issues:
combine_mgNetworks has a stray #' in the `...` argument.

This typo has been fixed.

Functionality

The igraph integration is nice. With very little modification you can make it
work directly with tidygraph and ggraph.

Either provide an explicit as.igraph method for mgNetwork objects, which
could simply be an alias for mg_to_igraph. Tidygraph and ggraph will look
for such a method if no other exists and use that.
Provide an as_tbl_graph method directly. This can again just be a wrapper
around mg_to_igraph, e.g.

as_tbl_graph.mgNetwork <- function(x, ...) {
  as_tbl_graph(mg_to_igraph(x))
}

The obvious benefit of 1) is that you wouldn't need to list tidygraph as a
dependency, while conversely the benefit of 2) is that you make the support
explicit.

This comment has been addressed (see section Package features).

Unit testing

The package has very good unit test coverage on all aspects of the code. The
only area lacking is testing of error handling, as the unit tests only tend to
cover whether functions behave as expected with proper input. Tests for whether
functions fail as expected and fail gracefully in the face of problems or bad
input would be icing on the cake.

Addressed in section Tests.

Review by @arw36

It would be easy to prepopulate standard metrics. For the network datatable
this would include # of nodes and interactions, potentially a modularity factor,
and the interactions database with degree and eigen.centrality.

This comment has been answered in the section Computing a priori network properties above.

Unclear what the df_net_unknown <- search_interactions(list(attr_id = 100), verbose = FALSE) example is meant to accomplish in the search_interactions help
file. This returned an empty dataframe for me.

We removed this example.

May want to add get_collection() to the mg_insect example for
search_networks documentation

Done!

Suggest you add California border to the plot(networks_in_area) example

Done!

I would like to see more metadata on variables including definitions for
attribute.id and associated columns (name, description, unit) as well as
user_id.

This has been answered in the Documentation section above.

For the date in the edges table, it is unclear whether this is extracted as a
network-level trait, is a trait specific to each edge, or is NA.

We removed the date field from the dataset table to avoid redundancies. We kept the date in the network and interaction datatables. In most cases, the dates documented in edges and networks should be identical, but a network / sampling location could be visited several times (e.g. researchers may want to cover the flowering period). In that case, we keep track of the observation period through the date field in the interaction datatable: the network datatable contains the median date of the period and the interaction datatable every observation date.

For instance,
for edges of network_id = 19 it appears to be extrapolated from the network
trait, despite the dataset_id (1) giving a range of dates (1897-09 to 1899-07).

Regarding dataset #1, all the information (dataset, network and interactions) points to the same date: 1899-07-01. We are not able to reproduce the range of dates given by the reviewer (see the code below).

library(rmangal)
ds_1 <- search_datasets(list(id = 1))
net_19 <- get_collection(search_networks(list(id = 19)))
range(c(net_19$interactions$date, ds_1$date, net_19$network$date))

Each data table starts with an id column; it may be worth clarifying these as
network_id, edge_id, node_id, etc. in the main table. This will allow
easy linking across tables, as ids are referenced this way in other tables (e.g.
the network_id column in the nodes table).

We applied this suggestion to mgNetwork objects (see c6cd9c0). However, the SQL side hasn't changed, as it allows us to distinguish between primary and foreign keys.
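
As a hedged illustration of the renamed keys (the element and column names below are assumed from the discussion above):

# `net` is assumed to be a mgNetwork object; with the prefixed id columns,
# its tables can be joined directly on their shared keys.
nodes_with_network <- merge(net$nodes, net$network, by = "network_id")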

I suggest you make all your functions plural (currently search_reference
is the only singular one).

search_reference is now renamed as search_references and mgSearchReference
as mgSearchReferences (see a98475b).

I suggest you rename the author column in the reference table to be
first_author

This suggestion has been taken into consideration.

I was able to retrieve complete data tables for network, dataset, node,
interactions using search_*(""). This did not work for the reference table.
Ease of full database download would be nice, perhaps with warnings about
download times and file sizes.

search_references("") is now working (see cc297c84).

I think there should be greater curation/spell checks done in the
interaction table, method column. What does biblio mean? Standardize
capitalization.

We standardized capitalization and replaced biblio with literature.

I find the categorization of interaction types to be non-standard. For
instance, I would standardize null, NA, and none of interactions$attribute.unit
to be the same (or clarify NA vs. none).

null, NA and none are now true SQL NULL.

I think there is some confusion between edges/interactions. I wish the use of
interaction was standardized in output text.

We replaced edges with interactions throughout the code (see cd1430d).

Currently, you can only search_taxonomy by the identified taxon rank. It
would be really nice for broader rank taxon searches at the order or phylum
level to retrieve all entries, even those that have a smaller scale taxonomy.
For instance, as of now search_taxonomy("Arthropoda") returns no entries,
despite there being lots of arthropods in the datasets.

Addressed in section Mangal database above.

To conform with ROpenSci packaging guidelines, you will need to add brief
demonstration usage, a description of how similar packages (i.e. rglobi)
relate to this one, and citation information.

Usage, a description, and the rglobi comparison have been added to the README, except for the citation, which can be found at https://mangal-wg.github.io/rmangal/authors.html or with citation('rmangal').

In general, the README is short. Keep in mind this README may be the first point
of entry, so it could be fleshed out a bit more with material you already have in
the vignettes/website.

See section Readme above.

Do you expect users to upload networks to this database like rglobi? May
be worth saying something about contributing new data or identifying new papers.

This comment has been discussed in section Mangal database above.

@noamross
Contributor

noamross commented Sep 5, 2019

Thank you for your comprehensive response, @SteveViss! @thomasp85 and @arw36 please review and let us know if the changes and comments satisfy your concerns.

@thomasp85

All my concerns have been addressed, but I will note (and urge that a fix is made) that testing for the existence of sf should be done like this:

if (!requireNamespace("sf", quietly = TRUE)) {
  stop("sf should be installed to use this functionality", call. = FALSE) # or whatever wording you prefer
}

instead of the current call to installed.packages()

@arw36

arw36 commented Sep 5, 2019

Thanks for all the edits, all my concerns were covered. I particularly like the new summary feature and agree that with different metric algorithms this was the most practical solution.

Just a note about my confusion over the date for dataset 1. I retrieved that date range by going back to the original reference material (roberson_1929), not by using the package. As that critique is of the underlying data within Mangal, rather than the functionality of rmangal, I'm not concerned about this edit for rOpenSci publication. I would encourage you to mention somewhere, whether on the website or in the README, how to comment on or suggest edits for data cleaning issues, as these are not package bugs per se. Currently, contributions seem limited to network structure and analysis functionality. rmangal may not be the best place to host data cleaning concerns; alternatively, an "email us" note on the Mangal database would be better. I really don't mind the issue of not including the date range from roberson_1929 in particular (though 12 years of data collection tells something very different than one day of data collection); this was just an example of a data-cleaning issue that may occur elsewhere in Mangal. There are always some human extraction errors in databases of datasets like this, so an avenue to address them would benefit both users and the authors.

@noamross
Contributor

noamross commented Sep 8, 2019

Thank you @SteveViss et al., @thomasp85 and @arw36, for an excellent review process! rmangal is approved. 🎉

To-dos:

  • Add the review badge to your README:
[![](https://badges.ropensci.org/332_status.svg)](https://github.com/ropensci/software-review/issues/332)
  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do and you can add any team members you wish.
  • Update any links in badges for CI and coverage to point to the ropensci URL, and package. Also point documentation urls to https://docs.ropensci.org/{package}. Docs should be built within 24 hours after transfer.

If/when you submit a combined paper to JOSS, make a note in the pre-review thread that the R client has already been reviewed, linking to this thread, and tag @ropensci/editors. That way JOSS editors can limit their review to the Julia client.

Our developer guide has a section for ongoing maintenance after acceptance. Welcome aboard!

This client, and also the combined mangal.io ecosystem, would be a great topic for a post on our blog. If you are interested in writing something up, please let @stefaniebutland know here. The timing is flexible and can work with the other pieces of the system you are putting in place.

@stefaniebutland
Member

Congratulations on approval @SteveViss!
Given Noam's 👍🏼 for a blog post, here's some further information. Happy to answer any questions.

This tag gets you (diverse) examples of blog posts by authors of peer-reviewed packages: https://ropensci.org/tags/software-peer-review/.

Here are some technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post. Publication date is flexible. I like to get a draft via pull request a week before the planned publication date so I can review.

@SteveViss
Member Author

SteveViss commented Sep 9, 2019

We're grateful to the @ropensci team for this fantastic and constructive experience. Thank you @noamross, @thomasp85 & @arw36. We are doing the final checklist.

@arw36, we fully agree with your last comment. In the short term, we are tracking all data issues in https://github.com/mangal-wg/mangal-api/issues. GitHub is well suited for developers but not for all network ecologists, so in the long term we may consider opening a Discourse platform, which would be more inclusive and hopefully encourage users to flag potential data issues.

Hi @stefaniebutland! We're interested in posting to the blog. Mangal.io will be presented during the BiodiversityNext poster session (end of October). If possible, we propose to submit the pull request 2 weeks before that event (same for the JOSS paper).

@stefaniebutland
Member

Do you prefer Tues Oct 15 or Tues Oct 22 blog publication date? (I saw pre-conf and main-conf dates). Once your pull request is submitted, I can give you a link to the post for your poster :-)

We can also coordinate this if you like: If you tweet to invite folks to your poster #, rOpenSci can quote tweet with a link to your post for more info and to bring more attention to your poster.

@SteveViss
Member Author

Tues Oct 22 for the main conf. So our deadline should be Tue Oct 15, right? Let me know if you need more time.

We can also coordinate this if you like: If you tweet to invite folks to your poster #, rOpenSci can quote tweet with a link to your post for more info and to bring more attention to your poster.

That would be great. Thanks @stefaniebutland!

@stefaniebutland
Member

So our deadline should be Tue Oct 15, right?

Correct!

@stefaniebutland
Member

@SteveViss Checking in to confirm you'll submit a draft post on or before Tues Oct 15 for publication on 22nd. I'll have time to review it on Wed 16th.

On track?

@KevCaz
Member

KevCaz commented Oct 9, 2019

@stefaniebutland I will be the one leading the blog post; we should be able to write it by Tuesday!

@stefaniebutland
Member

Got it @KevCaz. Note that I'll be on vacation and mostly unavailable Fri Oct 18 - Mon Oct 21.

@KevCaz
Member

KevCaz commented Oct 9, 2019

Noted!

@SteveViss
Member Author

Post submitted via pull request: ropensci-archive/roweb2#525

noamross closed this as completed Nov 5, 2019