Add inputs.txt to handle internal file dependencies #24
I have created the inputs.txt files in 276cb0c. There should also be changes in The changes in
- # Copy and/or symlink input files to local /input/ directory
- # (Make sure this section is updated to pull in all needed input files!)
+ # Add symlink input files to local /input/ directory
+ # (Make sure inputs.txt is updated to pull in all needed input files!)
rm -rf "${MAKE_SCRIPT_DIR}/input"
mkdir -p "${MAKE_SCRIPT_DIR}/input"
- # cp my_source_files "${MAKE_SCRIPT_DIR}/input/"
+ if [[ -f "inputs.txt" ]]; then # check if inputs.txt exists
+ links_created=false
+ while IFS= read -r file_path; do
+ if [[ -n "$file_path" && "$file_path" != \#* ]]; then # skip empty or commented out lines
+ if [[ -f "$file_path" ]]; then # check if the file_path is valid
+ file_name=$(basename "$file_path")
+ abs_path=$(realpath "$file_path") # get absolute path
+ ln -sf "$abs_path" "${MAKE_SCRIPT_DIR}/input/$file_name" # create symlink in the input folder
+ links_created=true
+ else
+ echo "Error: $file_path does not exist or is not a valid file path." >&2
+ false # trigger error handler
+ fi
+ fi
+ done < "inputs.txt"
+ if [[ "$links_created" == true ]]; then
+ echo -e "\nAll input links were created!"
+ else
+ echo -e "\n\033[0;34mNote:\033[0m There were no input links to create."
+ fi
+ else
+ echo -e "\nError: No inputs.txt file found in the module." | tee -a "${LOGFILE}"
+ false # trigger error handler
+ fi
Note that this change will also require updating the template documentation and the instructions on how to run the example scripts.
I have also done some research regarding whether it would be possible to implement automated scanning of file dependencies. One alternative is using a DAG structure with software like SCons, Snakemake, or Nextflow, but this seems like a more complex step. We could also write ad hoc scripts to do it manually, but it would require some creative combination of regex and searching for functions such as "load" or "read_csv", which does not seem very practical and would increase the complexity of the template without large returns. My sense is that we can stick with manually adding input files for now, either with the |
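To make the trade-off concrete, here is a rough sketch of what an ad hoc regex scan might look like. The function name `scan_inputs` and the list of read functions are assumptions for illustration, not part of the template. It only catches paths written as quoted string literals directly inside the call, so it misses paths built from variables and can also produce false positives (for example, any function whose name ends in `load`).

```shell
#!/usr/bin/env bash
# Hypothetical heuristic (not part of the template): list quoted string
# literals passed directly to a few common read functions.
# Usage: scan_inputs <file-or-directory>...
scan_inputs() {
    # -r: recurse into directories, -h: omit file names,
    # -o: print only the matching part, -E: extended regex
    grep -rhoE "(read_csv|read\.csv|load)\([\"'][^\"']+[\"']" "$@" \
        | sed -E "s/^[^(]+\([\"']//; s/[\"']\$//" \
        | sort -u
}
```

Calling `scan_inputs source/` would print one candidate path per line, which would still need to be reviewed by hand before being used as the list of inputs.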
@lucamlouzada Thanks for the great work here. If we were to go this route, we'd want to abstract the code for scanning inputs.txt into a helper function rather than having it directly in More fundamentally, I'm a little worried looking at it that writing our own parser like this is going to create a potential source of errors/complexity down the line. E.g., cases where people get the syntax in Here are two possible alternatives. (1) (2)
These could be stored in |
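For reference, the helper-function idea could look roughly like the sketch below. The name `link_inputs` and its two arguments are assumptions chosen for illustration. The logic mirrors the loop in the diff above, except that it returns a non-zero status instead of calling `false`, and it also reads a final line that lacks a trailing newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and arguments are assumptions):
#   link_inputs <inputs_file> <input_dir>
# Reads one path per line from <inputs_file>, skips blank lines and lines
# starting with '#', and symlinks each listed file into <input_dir>.
link_inputs() {
    local inputs_file="$1"
    local input_dir="$2"
    local file_path
    local abs_path
    local links_created=false

    if [ ! -f "$inputs_file" ]; then
        echo "Error: $inputs_file not found." >&2
        return 1
    fi

    mkdir -p "$input_dir"

    # The '|| [ -n ... ]' part keeps a last line with no trailing newline.
    while IFS= read -r file_path || [ -n "$file_path" ]; do
        case "$file_path" in
            ''|'#'*) continue ;;  # skip empty or commented-out lines
        esac
        if [ ! -f "$file_path" ]; then
            echo "Error: $file_path does not exist or is not a regular file." >&2
            return 1
        fi
        abs_path=$(realpath "$file_path")  # resolve to an absolute path
        ln -sf "$abs_path" "$input_dir/$(basename "$file_path")"
        links_created=true
    done < "$inputs_file"

    if [ "$links_created" = true ]; then
        echo "All input links were created."
    else
        echo "Note: there were no input links to create."
    fi
}
```

With a helper like this, the section in the make script would reduce to a single call such as `link_inputs inputs.txt "${MAKE_SCRIPT_DIR}/input"`, and the script's existing error handler could react to the non-zero return status.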
Thanks @gentzkow. Agreed on making this a separate helper script if we go down this route.
What kind of other errors/complexity did you have in mind? The current approach implements error handling for the paths in the shell script, so that errors in the txt files will be shown to the users as error messages. That said, these alternatives are both good options; my sense is that the choice depends a bit on what our priorities will be.
By this, do you mean users would edit the shell file instead of the txt file? We could have a section in the shell script for users to edit (similarly to what is done in the current template version with the
This is indeed simpler and could be easy to implement. The downside is having users edit more than one file if inputs for different languages are not concentrated in the same script.
Just that it's another failure point. I know you've implemented some good error handling, but it's always possible that there's a gap in that or that a user does something unexpected.
I'm leaning toward this approach. It would remove one big layer of complexity. I think the case where a given module has scripts in multiple languages is relatively rare, and in those cases it's not obviously a bad thing to make clear which inputs are being used by the different categories of scripts. Let's proceed w/ this approach.
Hi @gentzkow, looking back into this now.
I realized that this approach would require users to add code on top of their scripts to source the paths from the Two potential solutions (let me know if you think of others; I can also keep looking for alternatives):
The problem with (1) is that it makes the I think (1) is preferable, but still don't know whether it is better than |
Thanks @lucamlouzada. I'm actually not strongly opposed to having those source commands at the top of the scripts. That's actually what I was imagining. One downside of this approach is that there's no way to tell from within a given This approach still seems more robust to me than either (1) or (2), since it preserves the property that someone can clone the repo and then run the individual |
This issue is part of an effort to implement substantive improvements to the lab template, as discussed in #16.
In this issue, the goal is to work on the way the template handles input files. The approach to be adopted per the decision in plans for next steps is:
- input.txt file in which users should list the files that are required as inputs
- make.sh will read through these files and add symlinks to the input folder

I am assigning myself to work on this. I will consider whether the creation of the symlinks should be done in make.sh itself or in an additional script stored in lib/shell. I will also investigate whether there is a way to scan over all scripts and automate the identification of paths.