To start, you need a project defined in the [standard Portable Encapsulated Project (PEP) format](http://pep.databio.org). Start by [creating a PEP](https://pep.databio.org/en/latest/simple_example/).

## 2. Specify the Sample Annotation

The sample annotation generally lives in a `project_config.yaml` file.

The simplest example:

```yaml
pep_version: 2.0.0
sample_table: sample_annotation.csv
```
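The `sample_table` key points to a CSV file of sample metadata. The PEP specification only requires a `sample_name` column; every other column becomes a sample attribute. As a rough sketch using only the Python standard library (the `protocol` column and the frog sample names are illustrative, not required by looper):

```python
import csv
import io

# A minimal sample table to pair with the config above. Only the
# "sample_name" column is required by the PEP spec; "protocol" is an
# illustrative extra column that becomes a sample attribute.
SAMPLE_TABLE = """\
sample_name,protocol
frog_1,RNA-seq
frog_2,RNA-seq
"""

# Parse the table the way any CSV reader would: one dict per sample.
samples = list(csv.DictReader(io.StringIO(SAMPLE_TABLE)))
for s in samples:
    print(s["sample_name"], "->", s["protocol"])
```

In practice you would save this content as `sample_annotation.csv` next to the project config and let a PEP-aware tool such as `peppy` parse it for you.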
|
17 | 16 |
|
18 |
| -### 2.2 Configure pipestat |
19 |
| -
|
20 |
| -*We recommend to read the [pipestat documentation](https://pipestat.databio.org) to learn more about the concepts described in this section* |
21 |
| -
|
22 |
| -Additionally, you may configure pipestat, the tool used to manage pipeline results. Pipestat provides lots of flexibility, so there are multiple configuration options that you can provide in `looper.pipestat.sample` or `looper.pipestat.project`, depending on the pipeline level you intend to run. |
23 |
| - |
24 |
| -Please note that all the configuration options listed below *do not* specify the values passed to pipestat *per se*, but rather `Project` or `Sample` attribute names that hold these values. This way the pipestat configuration can change with pipeline submitted for every `Sample` if the PEP `sample_modifiers` are used. |
25 |
| - |
26 |
| -- `results_file_attribute`: name of the `Sample` or `Project` attribute that indicates the path to the YAML results file that will be used to report results into. Default value: `pipestat_results_file`, so the path will be sourced from either `Sample.pipestat_results_file` or `Project.pipestat_results_file`. If the path provided this way is not absolute, looper will make it relative to `{looper.output_dir}`. |
27 |
| -- `namespace_attribute`: name of the `Sample` or `Project` attribute that indicates the namespace to report into. Default values: `sample_name` for sample-level pipelines `name` for project-level pipelines , so the path will be sourced from either `Sample.sample_name` or `Project.name`. |
28 |
| -- `config_attribute`: name of the `Sample` or `Project` attribute that indicates the path to the pipestat configuration file. It's not needed in case the intended pipestat backend is the YAML results file mentioned above. It's required if the intended pipestat backend is a PostgreSQL database, since this is the only way to provide the database login credentials. Default value: `pipestat_config`, so the path will be sourced from either `Sample.pipestat_config` or `Project.pipestat_config`. |
29 |
| - |
30 |
| -Non-configurable pipestat options: |
31 |
| - |
32 |
| -- `schema_path`: never specified here, since it's sourced from `{pipeline.output_schema}`, that is specified in the pipeline interface file |
33 |
| -- `record_identifier`: is automatically set to `{pipeline.pipeline_name}`, that is specified in the pipeline interface file |
34 |
| - |
| 17 | +A more complicated example taken from [PEPATAC](https://pepatac.databio.org/en/latest/): |
35 | 18 |
|
36 | 19 | ```yaml
|
37 |
| -name: "test123" |
38 |
| -pipestat_results_file: "project_pipestat_results.yaml" |
39 |
| -pipestat_config: "/path/to/project_pipestat_config.yaml" |
| 20 | +pep_version: 2.0.0 |
| 21 | +sample_table: tutorial.csv |
40 | 22 |
|
41 | 23 | sample_modifiers:
|
42 |
| - append: |
43 |
| - pipestat_config: "/path/to/pipestat_config.yaml" |
44 |
| - pipestat_results_file: "RESULTS_FILE_PLACEHOLDER" |
45 | 24 | derive:
|
46 |
| - attributes: ["pipestat_results_file"] |
| 25 | + attributes: [read1, read2] |
47 | 26 | sources:
|
48 |
| - RESULTS_FILE_PLACEHOLDER: "{sample_name}/pipestat_results.yaml" |
49 |
| -
|
50 |
| -looper: |
51 |
| - output_dir: "/path/to/output_dir" |
52 |
| - # pipestat configuration starts here |
53 |
| - # the values below are defaults, so they are not needed, but configurable |
54 |
| - pipestat: |
55 |
| - sample: |
56 |
| - results_file_attribute: "pipestat_results_file" |
57 |
| - config_attribute: "pipestat_config" |
58 |
| - namespace_attribute: "sample_name" |
59 |
| - project: |
60 |
| - results_file_attribute: "pipestat_results_file" |
61 |
| - config_attribute: "pipestat_config" |
62 |
| - namespace_attribute: "name" |
63 |
| -``` |
64 |
| -## 3. Link a pipeline to your project |
65 |
| - |
66 |
| -Next, you'll need to point the PEP to the *pipeline interface* file that describes the command you want looper to run. |
67 |
| - |
68 |
| -### Understanding pipeline interfaces |
69 |
| - |
70 |
| -Looper links projects to pipelines through a file called the *pipeline interface*. Any looper-compatible pipeline must provide a pipeline interface. To link the pipeline, you simply point each sample to the pipeline interfaces for any pipelines you want to run. |
71 |
| - |
72 |
| -Looper pipeline interfaces can describe two types of pipeline: sample-level pipelines or project-level pipelines. Briefly, a sample-level pipeline is executed with `looper run`, which runs individually on each sample. A project-level pipeline is executed with `looper runp`, which runs a single job *per pipeline* on an entire project. Typically, you'll first be interested in the sample-level pipelines. You can read in more detail in the [pipeline tiers documentation](pipeline-tiers.md). |
73 |
| - |
74 |
| -### Adding a sample-level pipeline interface |
75 |
| - |
76 |
| -Sample pipelines are linked by adding a sample attribute called `pipeline_interfaces`. There are 2 easy ways to do this: you can simply add a `pipeline_interfaces` column in the sample table, or you can use an *append* modifier, like this: |
77 |
| - |
78 |
| -```yaml |
79 |
| -sample_modifiers: |
80 |
| - append: |
81 |
| - pipeline_interfaces: "/path/to/pipeline_interface.yaml" |
82 |
| -``` |
83 |
| - |
84 |
| -The value for the `pipeline_interfaces` key should be the *absolute* path to the pipeline interface file. The paths may also contain environment variables. Once your PEP is linked to the pipeline, you just need to make sure your project provides any sample metadata required by the pipeline. |
85 |
| - |
86 |
| -### Adding a project-level pipeline interface |
87 |
| - |
88 |
| -Project pipelines are linked in the `looper` section of the project configuration file: |
89 |
| - |
90 |
| -``` |
91 |
| -looper: |
92 |
| - pipeline_interfaces: "/path/to/project_pipeline_interface.yaml" |
93 |
| -``` |
94 |
| - |
95 |
| -### How to link to multiple pipelines |
96 |
| - |
97 |
| -Looper decouples projects and pipelines, so you can have many projects using one pipeline, or many pipelines running on the same project. If you want to run more than one pipeline on a sample, you can simply add more than one pipeline interface, like this: |
98 |
| - |
99 |
| -```yaml |
100 |
| -sample_modifiers: |
101 |
| - append: |
102 |
| - pipeline_interfaces: ["/path/to/pipeline_interface.yaml", "/path/to/pipeline_interface2.yaml"] |
103 |
| -``` |
104 |
| - |
105 |
| -Looper will submit jobs for both of these pipelines. |
106 |
| - |
107 |
| -If you have a project that contains samples of different types, then you can use an `imply` modifier in your PEP to select which pipelines you want to run on which samples, like this: |
108 |
| - |
109 |
| - |
110 |
| -```yaml |
111 |
| -sample_modifiers: |
| 27 | + # Obtain tutorial data from http://big.databio.org/pepatac/ then set |
| 28 | + # path to your local saved files |
| 29 | + R1: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r1.fastq.gz" |
| 30 | + R2: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r2.fastq.gz" |
112 | 31 | imply:
|
113 |
| - - if: |
114 |
| - protocol: "RRBS" |
115 |
| - then: |
116 |
| - pipeline_interfaces: "/path/to/pipeline_interface.yaml" |
117 |
| - - if: |
118 |
| - protocol: "ATAC" |
119 |
| - then: |
120 |
| - pipeline_interfaces: "/path/to/pipeline_interface2.yaml" |
121 |
| -``` |
122 |
| - |
123 |
| - |
124 |
| -## 5. Customize looper |
125 |
| - |
126 |
| -That's all you need to get started linking your project to looper. But you can also customize things further. Under the `looper` section, you can provide a `cli` keyword to specify any command line (CLI) options from within the project config file. The subsections within this section direct the arguments to the respective `looper` subcommands. So, to specify, e.g. sample submission limit for a `looper run` command use: |
127 |
| - |
128 |
| -```yaml |
129 |
| -looper: |
130 |
| - output_dir: "/path/to/output_dir" |
131 |
| - cli: |
132 |
| - run: |
133 |
| - limit: 2 |
134 |
| -``` |
135 |
| - |
136 |
| -or, to pass this argument to any subcommand: |
137 |
| - |
138 |
| -```yaml |
139 |
| -looper: |
140 |
| - output_dir: "/path/to/output_dir" |
141 |
| - all: |
142 |
| - limit: 2 |
143 |
| -``` |
144 |
| - |
145 |
| -Keys in the `cli.<subcommand>` section *must* match the long argument parser option strings, so `command-extra`, `limit`, `dry-run` and so on. For more CLI options refer to the subcommands [usage](usage.md). |
| 32 | + - if: |
| 33 | + organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"] |
| 34 | + then: |
| 35 | + genome: hg38 |
| 36 | + prealignment_names: ["rCRSd"] |
| 37 | + deduplicator: samblaster # Default. [options: picard] |
| 38 | + trimmer: skewer # Default. [options: pyadapt, trimmomatic] |
| 39 | + peak_type: fixed # Default. [options: variable] |
| 40 | + extend: "250" # Default. For fixed-width peaks, extend this distance up- and down-stream. |
| 41 | + frip_ref_peaks: None # Default. Use an external reference set of peaks instead of the peaks called from this run |
| 42 | +``` |
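To see what `derive` and `imply` do to a sample, here is a plain-Python sketch of the resolution logic. This is an illustration of the behavior only, not looper's or peppy's actual implementation; the sample row and the `$TUTORIAL` value are hypothetical:

```python
import os

# Hypothetical sample, standing in for one row of tutorial.csv. The
# read1/read2 values name which derived source to use.
sample = {"sample_name": "tutorial1", "organism": "human",
          "read1": "R1", "read2": "R2"}

# "derive": each named source is a template, expanded first with
# environment variables, then with sample attributes.
os.environ.setdefault("TUTORIAL", "/home/user/tutorial")  # hypothetical path
sources = {
    "R1": "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r1.fastq.gz",
    "R2": "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r2.fastq.gz",
}
for attr in ("read1", "read2"):
    template = sources[sample[attr]]
    expanded = os.path.expandvars(template)   # ${TUTORIAL} -> env value
    sample[attr] = expanded.format(**sample)  # {sample_name} -> tutorial1

# "imply": when the condition matches, set the implied attributes.
if sample["organism"] in ["human", "Homo sapiens", "Human", "Homo_sapiens"]:
    sample.update({"genome": "hg38", "prealignment_names": ["rCRSd"]})

print(sample["read1"])   # path ending in .../data/tutorial1_r1.fastq.gz
print(sample["genome"])  # hg38
```

The payoff of this design is that the sample table itself stays small and portable: file paths and genome choices are computed per sample, so moving the project to another machine only requires changing the config, not the CSV.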