Make picopili portable #56

Merged: rkwalters merged 49 commits into dev from portable on Mar 20, 2017

rkwalters

Fully removes dependency on ricopili (https://github.com/Nealelab/ricopili), and attempts to make picopili more easily portable to other cluster environments.

Primary structural changes

Accomplishing the separation involves some major reworks:

  • Creating a stand-alone configuration file, ~/picopili.conf, separate from ricopili.conf (see Create separate config file for picopili #35). The general format is kept very consistent with ricopili, but includes new dependencies (e.g. Admixture, Primus).
  • Provide a setup script to install this new configuration file: config_pico.pl. This is directly derived from rp_config, including providing default locations where available from previous ricopili installations, but with some streamlining to hopefully require fewer restarts to complete configuration (see below about changing perl dependency structure).
  • Provide access to the few reference files from ricopili that are still required (for guessing genotyping platform and genome build) but not distributed from some existing public source. Due to file sizes they aren't being distributed directly with picopili, but get_refs.sh has been added to manage either linking to local copies (for environments with existing ricopili installs) or downloading them from a hosted copy.
  • Rewrite the cluster interface code, with blueprint.py replacing blueprint_pico.pl previously migrated from ricopili. A partial command line interface for ricopili-style job submission is maintained for the sake of the imp_prep.pl scripts, but most submission now operates through a Python function call. Compared to blueprint_pico.pl, the code is substantially streamlined to avoid any hard-coding of job submission mechanics. Instead, cluster configuration is managed through a config file and a template job script placed in ./cluster_templates/. Substantial attention has been given to trying to be a good citizen of clusters that dispatch parallel jobs per-machine rather than per-CPU (e.g. Lisa). Example configurations are provided for Broad and Lisa. More documentation of this system for adding additional clusters is pending.
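
To illustrate the template-driven approach described in the last item, here is a minimal sketch of filling a job-script template from cluster settings and packing per-CPU tasks into whole-node jobs. It is not the code in blueprint.py: the function names, the settings keys (`cpus_per_node`, `walltime`), and the PBS-style header are all assumptions made for illustration, and the real files in ./cluster_templates/ may look quite different.

```python
from string import Template

# Hypothetical template text; a PBS-style header is used purely as an example.
JOB_TEMPLATE = Template(
    "#!/usr/bin/env bash\n"
    "#PBS -N ${job_name}\n"
    "#PBS -l nodes=1:ppn=${cpus}\n"
    "#PBS -l walltime=${walltime}\n"
    "${commands}\n"
)


def pack_tasks(tasks, cpus_per_node):
    """Group single-CPU tasks into batches that each fill one machine."""
    return [tasks[i:i + cpus_per_node]
            for i in range(0, len(tasks), cpus_per_node)]


def render_jobs(job_name, tasks, cluster):
    """Render one job script per machine from a cluster settings dict."""
    scripts = []
    batches = pack_tasks(tasks, cluster["cpus_per_node"])
    for n, batch in enumerate(batches, start=1):
        # Start every task in the batch in the background, then wait for all.
        body = "\n".join(cmd + " &" for cmd in batch) + "\nwait"
        scripts.append(JOB_TEMPLATE.substitute(
            job_name="%s.%d" % (job_name, n),
            cpus=cluster["cpus_per_node"],
            walltime=cluster["walltime"],
            commands=body,
        ))
    return scripts


if __name__ == "__main__":
    demo_cluster = {"cpus_per_node": 2, "walltime": "01:00:00"}
    demo_tasks = ["echo task%d" % i for i in range(5)]
    for script in render_jobs("demo", demo_tasks, demo_cluster):
        print(script)
```

Keeping the scheduler syntax in the template and the numbers in a settings dict means that supporting another cluster would, under these assumptions, only require a new template and settings entry rather than changes to the submission code.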

Features

Unrelated to these portability/structural changes, there are two noteworthy feature updates:

  • Add support in admix_rel.py for Admixture 1.3's more principled projection of admixture solutions from the unrelated subset to the full sample, rather than selecting population "exemplars" from the unrelateds to run a supervised solution. This behavior is now default, but the old approach is still accessible with --use-exemplars.
  • admix_rel.py also now allows starting from an existing admixture solution for unrelated individuals (supplied with --admix-p, specifying the output .P file). This should help when resubmitting jobs that crash downstream, or when an initial admixture solution has already been run in some other context.
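
As a rough illustration of how the two options above could interact, the sketch below parses `--use-exemplars` and `--admix-p` and reports which steps would run. Only the two flag names come from this pull request; the parser setup, the step labels, and the assumption that the two flags can be combined are hypothetical and do not reflect the actual logic in admix_rel.py.

```python
import argparse


def build_parser():
    """Build a parser with only the two options discussed above."""
    parser = argparse.ArgumentParser(prog="admix_rel.py")
    parser.add_argument(
        "--use-exemplars", action="store_true",
        help="use the older exemplar-based supervised approach")
    parser.add_argument(
        "--admix-p", metavar="FILE", default=None,
        help="existing .P file from an admixture run on unrelated individuals")
    return parser


def plan_steps(args):
    """Return hypothetical step labels for the chosen options."""
    steps = []
    if args.admix_p is None:
        steps.append("fit unrelated subset")
    else:
        steps.append("reuse solution from " + args.admix_p)
    if args.use_exemplars:
        steps.append("select exemplars and run supervised solution")
    else:
        steps.append("project onto full sample")
    return steps


if __name__ == "__main__":
    print(plan_steps(build_parser().parse_args()))
```

With no arguments this sketch reports the projection path, matching the new default described above; passing `--admix-p` skips the initial fit on the unrelated subset.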

Minor changes

Additional minor changes accompanying the structural changes include:

  • Porting the addition of Utils.pm from ricopili, and moving perl dependencies from an environment variable (previously rp_perlpackages) to the config.
  • Additional documentation of python and external software dependencies, added in ./docs/.
  • Slightly more adaptive resource requests on jobs being resubmitted after failing initially. There's still a lot of room for improvement in this area, but there's a first pass at creating the machinery to better track successive re-submissions (using pickled submission info) and it's at least a bit more responsive than needing to manually edit the scripts involved.
  • Making email notifications optional (in the few places they currently exist; see Allow disabling email #53). This is done by providing an invalid email address in picopili.conf (implemented as a naive check for an @); the invalid address is intentionally caught so that the mail system doesn't get invoked with it.
  • Setting imp_prep.pl, the primary legacy code kept from ricopili, to use local log files rather than interacting with ricopili logs.
  • Stop using greedy import statements in .py scripts, to hopefully reduce update bugs caused by omitting required imports/variable initialization.
  • Remove code referencing reference files on any particular platform, including some lingering references to fall-back default files in dead code (primarily in .pl files) or argparse defaults.
  • Adjust some code documentation to make it clear that any recent mistakes in the code ported from ricopili are entirely mine, and credit for the original code goes to Stephan and the ricopili project.
  • Minor logging changes. A bigger restructure is on the to-do list, but minor tweaks continue to be made until then.
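
The resubmission tracking mentioned above (pickled submission info driving more adaptive resource requests) might look something like the following sketch. The record fields, default values, and the doubling rule are assumptions chosen for illustration; the actual machinery in picopili may track different information and scale requests differently.

```python
import os
import pickle


def load_submission(path):
    """Return saved submission info, or a fresh record for a first attempt."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    # Hypothetical starting request for a job that has never been submitted.
    return {"attempts": 0, "mem_gb": 4}


def next_submission(path, mem_factor=2, max_mem_gb=32):
    """Record another attempt, raising the memory request after a failure."""
    info = load_submission(path)
    if info["attempts"] > 0:
        # A previous attempt exists, so assume it failed and ask for more.
        info["mem_gb"] = min(info["mem_gb"] * mem_factor, max_mem_gb)
    info["attempts"] += 1
    with open(path, "wb") as handle:
        pickle.dump(info, handle)
    return info
```

Each call to `next_submission` stands in for one (re)submission of the same job: the first call uses the starting request, and later calls scale the memory request up to a cap instead of requiring the script to be edited by hand.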

Final note on ricopili

There's no desire to abandon ricopili here, and the connection between the two programs remains clearly documented in ./docs/RICOPILI.md to recognize that we're building very directly on that previous work.

Removing the formal code dependence should make it easier to share and maintain this project, especially since that was a prerequisite to making picopili portable to environments that don't currently have fully supported ricopili installations.

The separation will probably make some tasks more difficult (e.g. incorporating future updates to ricopili, see #52) but it should be a net improvement, especially for usability.

@rkwalters

This has been the de facto main branch for a few months, and has performed well over that time. There have been a number of bug fixes along the way (as evident from the commit history on this pull), and it should be fairly stable at this point. At minimum, it is at least as stable as the current master on Broad, and has a lot of added benefits.

As such, merging this upward to master. If there's a need for the last Broad-specific version before this transition, it's commit dcf4a8f.

@rkwalters rkwalters merged commit 39f35f4 into dev Mar 20, 2017
rkwalters added a commit that referenced this pull request Mar 20, 2017
Update install info to reflect move to portable version (#56). Add contact info.
@rkwalters rkwalters deleted the portable branch May 17, 2021 21:53