Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JOSS REVIEW] Suggestions for Documentation #5

Closed
xin-huang opened this issue Jan 9, 2024 · 7 comments
Closed

[JOSS REVIEW] Suggestions for Documentation #5

xin-huang opened this issue Jan 9, 2024 · 7 comments

Comments

@xin-huang
Copy link

This is a part of the JOSS review (openjournals/joss-reviews#6160)

  1. A statement of need

The opening section of the README seems inadequate for a comprehensive statement of need.

`autoStreamTree` is a Python package meant for analysing genomic variant (e.g., SNPs, microhaplotypes) differentiation in geospatial networks such as riverscapes.
An input geospatial network (as shapefile, geodatabase, or .gpkg) is first used to compute a minimally representative graph structure for all least-cost paths between your sampling locations. Pairwise genomic distances (e.g., Fst) are then 'fitted' to this graph by calculating the contribution of each graph segment, using the least-squares algorithm from Kalinowsky et al. (2008).
`autoStreamTree` automates what was previously a manual process, while also integrating new functionality such as per-locus or per-site fitted distances, which can be used for downstream outlier analysis.

I recommend that the authors enhance it by incorporating the statement of need from their paper. An introduction section specifically tailored for the statement of need in the README would be beneficial.

  1. Installation instructions

There appears to be a redundancy in the installation instructions. The content found here:

autostreamtree/README.md

Lines 53 to 60 in e86d8af

Simple installation of the development version may be done simply as:
```
git clone https://github.com/tkchafin/autostreamtree.git
cd autostreamtree
pip install .
```

is repeated later:

autostreamtree/README.md

Lines 78 to 85 in e86d8af

If you require the development branch, you can then install from GitHub as \
well:
```
git clone https://github.com/tkchafin/autostreamtree.git
cd autostreamtree
pip install .
```

I suggest removing the first instance to avoid duplication.

Regarding ARM Mac installations, I am unable to test them personally. However, I noticed the commands for mamba installation:

autostreamtree/README.md

Lines 125 to 128 in e86d8af

conda create -n streamtree
conda activate streamtree
conda config --env --set subdir osx-64
conda install python=3.10 mamba

According to the latest guidelines, installing mamba via conda is not recommended and should only be installed in the base environment.

Additionally, mamba now supports ARM64, so updating the installation instructions for mamba would be appropriate.

  1. Example usage

When attempting the example analysis with this command:

autostreamtree -s autostreamtree/data/test.shp -i autostreamtree/data/test.coords -v autostreamtree/data/test_sub100.vcf.gz -p autostreamtree/data/test.popmap -r ALL --reachid_col "HYRIV_ID" --length_col "LENGTH_KM" -o test

I encountered the following error:

Traceback (most recent call last):
  File "/home/xinhuang/software/mambaforge/envs/autostreamtree-dev/bin/autostreamtree", line 8, in <module>
    sys.exit(main())
  File "/home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/autostreamtree/cli.py", line 28, in main
    np.random.seed(clock_seed)
  File "numpy/random/mtrand.pyx", line 4806, in numpy.random.mtrand.seed
  File "numpy/random/mtrand.pyx", line 250, in numpy.random.mtrand.RandomState.seed
  File "_mt19937.pyx", line 168, in numpy.random._mt19937.MT19937._legacy_seeding
  File "_mt19937.pyx", line 182, in numpy.random._mt19937.MT19937._legacy_seeding
ValueError: Seed must be between 0 and 2**32 - 1

This issue may need to be addressed for successful example replication.

  1. Functionality documentation

I was unable to locate any API documentation. It's important for the authors to add this as it is a requirement of the journal.

  1. Automated tests

Running pytest resulted in several warnings

tests/test_cli.py::test_autostreamtree_outputs
tests/test_cli.py::test_autostreamtree_output_shp
tests/test_cli.py::test_autostreamtree_outputs_gendist
tests/test_cli.py::test_autostreamtree_outputs_sdist
tests/test_functions.py::test_read_network
  /home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/momepy/utils.py:252: UserWarning: Geometry is in a geographic CRS. Results from 'length' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

    gdf_network[length] = gdf_network.geometry.length

tests/test_cli.py::test_autostreamtree_outputs
tests/test_cli.py::test_autostreamtree_outputs
tests/test_cli.py::test_autostreamtree_output_shp
tests/test_cli.py::test_autostreamtree_output_shp
  /home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/mantel/_test.py:217: DeprecationWarning: `np.math` is a deprecated alias for the standard library `math` module (Deprecated Numpy 1.25). Replace usages of `np.math` with `math`
    n = np.math.factorial(m)  # number of possible matrix permutations

tests/test_cli.py::test_autostreamtree_output_shp
  /home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/pyogrio/raw.py:530: RuntimeWarning: Creating a 256th field, but some DBF readers might only support 255 fields
    ogr_write(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Addressing these warnings could enhance the software's reliability.

Additionally, I recommend integrating code coverage reporting with a service like codecov. This would provide a clearer understanding of the extent to which the unit tests cover the codebase.

@xin-huang
Copy link
Author

Furthermore, considering that Apple Silicon (actions/runner-images#8439) is available in GitHub Actions now, it would be beneficial for the authors to include builds for OSX arm64 in the ci.yml. This addition would aid in verifying the effectiveness of the installation instructions for OSX arm64.

@tkchafin
Copy link
Owner

tkchafin commented Jan 9, 2024

Thanks @xin-huang for your input, I'm traveling for a conference this week but I'll get on with addressing all of these next week!

@radeva
Copy link

radeva commented Jan 18, 2024

it would be beneficial for the authors to include builds for OSX arm64 in the ci.yml.

@tkchafin Feel free to use FlyCI's M1 and M2 runners. Our runners are on average 2x faster and 2x cheaper than GitHub's AND we have a free tier for OSS projects (see below).

Install Instructions

  1. Install FlyCI app and
  2. Use a FlyCI runner in your workflow:
jobs:
 ci:
-    runs-on: macos-latest
+    runs-on: flyci-macos-large-latest-m1
   steps:
   - name: 👀 Checkout repo
     uses: actions/checkout@v4

500 mins/month Free for Public Repos

Since your repo is public, FlyCI offers 500 mins/month of free M1 runner usage with the flyci-macos-large-latest-m1 runner for public projects.

Don't hеsitate to contact us in case the free tier doesn't suit your needs or you experience any issues with the runners. Our team is here to support you!

Best Regards,
Veselina Radeva
Product Manager at FlyCI

tkchafin added a commit that referenced this issue Jan 23, 2024
tkchafin added a commit that referenced this issue Jan 23, 2024
@tkchafin
Copy link
Owner

Thanks @xin-huang, I've fixed all of these now. I sent a pull request with a fix for the warning coming from pymantel which has now been merged, so that one will clear once the conda installation has been updated to the newest release. The other warnings can be ignored (in the context of the unit tests!) so I have suppressed them.

@xin-huang
Copy link
Author

Hello @tkchafin

Thank you for the update.

With the current version in this repository, I can proceed with the example analysis. However, when I used the command listed in the README to install autostreamtree from the ecoevoinfo channel:

mamba install -c ecoevoinfo -c conda-forge -c bioconda autostreamtree

I still got the same error: ValueError: Seed must be between 0 and 2**32 - 1.

To resolve this, I had to explicitly specify the version with the following command:

mamba install -c ecoevoinfo -c conda-forge -c bioconda autostreamtree==v1.1.3

This suggests that the conda channel might need an update to automatically provide the latest version to users.

Besides, I observed that the version on the conda channel is v1.1.3, whereas the setup.py in this repository lists version 1.0.1.

version='1.0.1',

It might be beneficial to synchronize the version in setup.py with the conda channel to avoid confusion.

@tkchafin tkchafin reopened this Feb 7, 2024
@tkchafin
Copy link
Owner

tkchafin commented Feb 9, 2024

Versioning issues should be fixed now

@tkchafin tkchafin closed this as completed Feb 9, 2024
@xin-huang
Copy link
Author

yes, it works now, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants