Create a data manager to retrieve liftOver files from UCSC #1904

jennaj · 2016-03-10T21:47:23Z

Suggested options:

List all UCSC reference genomes already loaded on the instance, by dbkey and name (UCSC genomes only, and only dbkeys that actually have fasta data loaded, not everything in the builds list), so that several can be checked off and queued as one launched job. Spawning an individual output dataset per genome is probably a good idea for tracking what was and was not pulled. A "Select all / none" header above the checkboxes would be helpful.
Make the DM "smart" so that it will get new liftOver data only and ignore data already loaded (no duplicates). Ideally, the tool would check for pre-existing dbkeys that have liftOver content loaded (loc), compare existing files against all files available at UCSC, then load whatever is new. This is somewhat important since genomes often have liftOver data added as time passes. A way to update this data with a DM, without creating duplicates, along with retrieving all data for new genomes, in one tool, in batch, will make data admins happy.
Limit the tool so that data is retrieved for 1-3 genomes at a time: allow that batch to complete, then start the next 1-3, and repeat. UCSC will time out if too many connections are made at the same time, resulting in failed jobs. The jobs must be paced over time so that they do not trigger a block.
Other ideas to make this type of tool useful?
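
To make the "new data only" and "1-3 genomes at a time" options above more concrete, here is a rough sketch in Python. It is not an existing data manager, only an illustration: `new_files`, `batches`, `fetch_paced`, the `fetch` callable, and the 60-second default pause are all hypothetical names and values, and the listing of what UCSC offers versus what the instance already has is assumed to be gathered elsewhere.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List


def new_files(available: Iterable[str], loaded: Iterable[str]) -> List[str]:
    """Return file names offered upstream that are not loaded yet (sorted)."""
    return sorted(set(available) - set(loaded))


def batches(items: List[str], size: int = 3) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_paced(
    dbkeys: List[str],
    fetch: Callable[[str], object],       # hypothetical per-genome downloader
    size: int = 3,                        # assumed batch size (1-3 genomes)
    pause_seconds: float = 60.0,          # assumed pause, not a UCSC-documented value
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Run `fetch` for each dbkey in small groups, pausing between groups."""
    results: Dict[str, object] = {}
    groups = list(batches(dbkeys, size))
    for index, group in enumerate(groups):
        for dbkey in group:
            results[dbkey] = fetch(dbkey)
        # Pause only between groups, not after the last one.
        if index < len(groups) - 1:
            sleep(pause_seconds)
    return results
```

For example, `new_files(["hg19ToHg38.over.chain.gz", "hg19ToMm10.over.chain.gz"], ["hg19ToMm10.over.chain.gz"])` would leave only the hg19-to-hg38 file to download. Passing `sleep` as a parameter is just a convenience so the pacing can be tested without actually waiting.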

Should this request move to a tools repository? If so, galaxy or iuc?

jennaj added triage friendliness/intermediate area/tools area/admin labels Mar 10, 2016

martenson mentioned this issue Aug 10, 2017

The Roadmap #1928

Closed

jmchilton assigned davebx Aug 10, 2017

jennaj removed the triage label Feb 8, 2018

jennaj mentioned this issue Aug 12, 2019

Genome Additions Master Ticket galaxyproject/usegalaxy-playbook#242

Open

19 tasks

jxtx mentioned this issue Aug 15, 2019

Issues with main galaxyproject/usegalaxy-playbook#250

Open

20 tasks

martenson unassigned davebx Mar 12, 2024
