Create framework to handle external inputs #20

Open
lucamlouzada opened this issue Sep 19, 2024 · 4 comments

@lucamlouzada
Collaborator

This issue is part of an effort to implement substantive improvements to the lab template, as discussed in #16.

The goal of this issue is to rework how the template handles external paths for inputs. Per the decisions recorded in the plans for next steps, the main points to address are:

  • Do not use .yaml; use shell scripts instead.
  • External paths are passed only via local_env.sh.
  • Users need to be able to work with the local_env.sh file even if they start out knowing nothing about it.
  • Users have the option to specify external paths in local_env.sh. The first time make.sh is run in a module, if any external paths are specified, it will create a folder with symlinks in the root of the module. The part that deals with externals can be a dedicated .sh file stored in lib and called only by make.sh (to keep make.sh shorter and simpler).

To achieve this, I will need to:

  1. Investigate how shell scripts can read local_env.sh files and recognize the paths in a robust way.
  2. See how bash handles complex objects, to understand how to assign global variables in a way similar to what .yaml allows.
  3. If the changes are feasible, implement them.
  4. Work on unit tests after implementation.
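
As a starting point for items 1 and 2, here is a minimal sketch of one way a script could read named paths from local_env.sh without relying on bash-only features. The variable name EXTERNAL_PATHS and the one-entry-per-line name=path layout are assumptions made for illustration, not the format adopted in local_env_template.sh, and the paths are placeholders.

```shell
#!/bin/sh
# Sketch: read named external paths from a local_env.sh-style file.
# EXTERNAL_PATHS and its "name=path" layout are assumptions for illustration.

demo_dir=$(mktemp -d)

# A stand-in for the user's local_env.sh (hypothetical contents).
cat > "$demo_dir/local_env.sh" <<'EOF'
EXTERNAL_PATHS='
raw_data=/data/shared/raw
census=/data/shared/census files
'
EOF

# Fail early with a readable message if the file is missing.
if [ ! -f "$demo_dir/local_env.sh" ]; then
    echo "local_env.sh not found in $demo_dir" >&2
    exit 1
fi

# "." is the POSIX spelling of "source".
. "$demo_dir/local_env.sh"

# Split each non-empty line at the first "=" into a name and a path.
# Spaces inside the path are preserved because only "=" is a separator.
parsed=""
while IFS='=' read -r name path; do
    [ -n "$name" ] || continue
    parsed="${parsed}${name} -> ${path};"
    echo "external '$name' points to '$path'"
done <<EOF
$EXTERNAL_PATHS
EOF

rm -rf "$demo_dir"
```

A plain string parsed line by line avoids associative arrays, which need bash 4 or later; the bash that ships with macOS is 3.2, so this choice may matter for users on that platform.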

I am assigning myself to work on this.

@lucamlouzada
Collaborator Author

lucamlouzada commented Sep 19, 2024

I have pushed two commits that introduce the framework for externals. It is quite straightforward:

  • In local_env.sh, the user should specify their external paths with names. I have changed local_env_template.sh in 219c6c4 to reflect how this specification should be done.
  • In 4a61543, I created a new file make_externals.sh in the lib that loops through the external paths in local_env.sh and creates an external folder with symlinks
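
To make the loop concrete, here is a rough sketch of what a make_externals-style helper could look like. This is not the code in 4a61543; the function name make_externals, the EXTERNAL_PATHS variable, and the name=path layout are all hypothetical.

```shell
#!/bin/sh
# Sketch of a make_externals-style helper. The function name, the
# EXTERNAL_PATHS variable and the "name=path" layout are assumptions.

make_externals() {
    # $1: newline-separated "name=path" entries; $2: directory for the links
    entries=$1
    target_dir=$2
    mkdir -p "$target_dir"
    while IFS='=' read -r name path; do
        [ -n "$name" ] || continue
        if [ ! -e "$path" ]; then
            echo "Warning: external path '$path' ($name) does not exist; skipping" >&2
            continue
        fi
        # Replace a stale link so that reruns are idempotent.
        if [ -L "$target_dir/$name" ]; then
            rm "$target_dir/$name"
        fi
        ln -s "$path" "$target_dir/$name"
    done <<EOF
$entries
EOF
}

# Demo in a throwaway directory standing in for a module.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/shared/raw" "$demo_dir/module"
EXTERNAL_PATHS="
raw_data=$demo_dir/shared/raw
missing=$demo_dir/shared/not_there
"
make_externals "$EXTERNAL_PATHS" "$demo_dir/module/external"
make_externals "$EXTERNAL_PATHS" "$demo_dir/module/external"   # second run is harmless

# Record which links exist, then clean up.
created=$(ls "$demo_dir/module/external")
echo "links created: $created"
rm -rf "$demo_dir"
```

Skipping missing paths with a warning, rather than aborting, is one possible design; failing hard may be preferable if a module cannot run without its externals.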

The idea is that make_externals.sh will be called by make.sh in each module that requires external links. It is therefore necessary to modify make.sh to reflect these changes. The changes in make.sh should look as follows:

+ # Optional: Create external symlinks
+ # Uncomment the following line if you need to handle external paths
+ # source "${REPO_ROOT}/lib/shell/make_externals.sh"

These three lines could be added right after the lines that copy other source files for inputs.

I have not pushed these modifications to make.sh yet because I want to avoid conflicts with the changes in make.sh made in #19.

If this framework is approved, I believe we would also have to change the local_env.sh file in the root of the template repository. This raises the question: should we even have such a file? I am wondering whether we should delete it from the GitHub template repository, since it is already created by check_setup.sh anyway.

@gentzkow
Owner

@lucamlouzada Thanks!

Comments:

  1. I agree that local_env.sh should not be committed to the repository. It should be ignored in .gitignore. We only want to have local_env_template.sh.

  2. What about if we create the /external/ subdirectory with the symlinks at the root of the repository rather than in the individual modules? The original reason for doing it in the modules was to be able to see easily which directories depended on which external resources. But the case where there are more than one or two external resources seems to be rare in practice. Since in your scheme we're always creating links to the full set of paths specified in local_env.sh, there wouldn't seem to be much loss from just creating the directory once at setup.

  3. This would be another reason to go back to having a setup.sh script that

  • Creates local_env.sh
  • Prompts the user to edit local_env.sh to fill in the relevant paths
  • Creates /external/ with the appropriate symlinks
  • Installs any R / Python / Stata dependencies we want to install in the "simple" repo case (no conda or Renv)
  • Runs check_setup.sh to make sure everything is kosher
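
The steps listed above could be outlined roughly as follows. This is a hypothetical sketch, not an agreed design: the function name setup_repo, the location of local_env_template.sh and check_setup.sh at the repository root, and the choice to stop after creating local_env.sh (instead of prompting interactively) are all assumptions. Dependency installation is left out.

```shell
#!/bin/sh
# Hypothetical setup.sh outline; everything beyond the file names
# mentioned in the thread is an assumption.

setup_repo() {
    repo_root=$1

    # 1. Create local_env.sh from the template, never overwriting user edits.
    if [ ! -f "$repo_root/local_env.sh" ]; then
        cp "$repo_root/local_env_template.sh" "$repo_root/local_env.sh"
        echo "Created local_env.sh; edit it to fill in your paths, then rerun setup."
        return 0
    fi

    # 2. Create /external/ at the repository root
    #    (symlink creation would be called from here).
    mkdir -p "$repo_root/external"

    # 3. Dependency installation is deliberately left out of this sketch.

    # 4. Run the existing checks if the script is present.
    if [ -f "$repo_root/check_setup.sh" ]; then
        sh "$repo_root/check_setup.sh"
    fi
}

# Demo against a throwaway directory standing in for the repository.
demo_root=$(mktemp -d)
echo '# external paths go here' > "$demo_root/local_env_template.sh"

first_run=$(setup_repo "$demo_root")   # creates local_env.sh and stops
setup_repo "$demo_root"                # second run creates external/

if [ -f "$demo_root/local_env.sh" ]; then has_env=yes; else has_env=no; fi
if [ -d "$demo_root/external" ]; then has_external=yes; else has_external=no; fi
echo "$first_run"
echo "local_env.sh present: $has_env, external/ present: $has_external"
rm -rf "$demo_root"
```

Using the absence of local_env.sh as the first-run signal keeps the script safe to rerun, since an existing file with user edits is never overwritten.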

@lucamlouzada
Collaborator Author

Thanks @gentzkow. For the first two points, agreed. Will implement these in this branch.

For the third, I agree it could be a good option. It goes back to the question of how many of these helper scripts we want to have. The only thing I am unsure about is installing dependencies for the "simple" repo case. I am not sure we should make that a default, since installing even the simplest packages without setting up a virtual environment could lead to version conflicts across projects. I think it might be better to just explain how to do that in the Wiki. In that case, we could consider whether it is better to create a new setup.sh script or to improve check_setup.sh (so that it can tell whether it is being run for the first time in a repo and perform setup actions as needed).

lucamlouzada added a commit that referenced this issue Oct 2, 2024
@lucamlouzada
Collaborator Author

Update:

> I agree that local_env.sh should not be committed to the repository. It should be ignored in .gitignore. We only want to have local_env_template.sh.

I have deleted local_env.sh (and it was already listed in .gitignore so I don't think we should have problems in future commits).
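
For anyone who wants to double-check this locally, a small sketch using standard git commands is below. It runs in a throwaway repository created for the demo, so the .gitignore contents here are a stand-in and not a copy of the template's file.

```shell
#!/bin/sh
# Sketch: confirm that local_env.sh is ignored and not tracked.
# Runs in a throwaway repository so it does not touch a real checkout.

if command -v git >/dev/null 2>&1; then
    demo_repo=$(mktemp -d)
    git -C "$demo_repo" init -q
    echo 'local_env.sh' > "$demo_repo/.gitignore"
    echo '# machine-specific paths' > "$demo_repo/local_env.sh"

    # check-ignore exits 0 when the path matches an ignore rule.
    if git -C "$demo_repo" check-ignore -q local_env.sh; then
        ignored=yes
    else
        ignored=no
    fi

    # ls-files --error-unmatch exits non-zero when the path is not tracked.
    if git -C "$demo_repo" ls-files --error-unmatch local_env.sh >/dev/null 2>&1; then
        tracked=yes
    else
        tracked=no
    fi

    status="ignored=$ignored tracked=$tracked"
    rm -rf "$demo_repo"
else
    status="git not available"
fi
echo "$status"
```

In a real checkout, if the file still shows up as tracked, `git rm --cached local_env.sh` removes it from the index while keeping the local copy; .gitignore alone does not untrack a file that was already committed.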

> What about if we create the /external/ subdirectory with the symlinks at the root of the repository rather than in the individual modules?

I will wait until we have discussed the creation of setup.sh to update make_externals.sh accordingly.
