Cid store quick wins #5945

jorgee · 2025-04-04T11:29:54Z

Use Instant objects instead of strings in model objects
Unify nomenclature using taskRun and workflowRun instead of publishBy, runBy, etc.
Use the #outputs fragment on the run CID instead of a separate result CID
Use DataOutput in place of WorkflowOutput and TaskOutput
Include script checksum in TaskRun
WorkflowResults and TaskResults renamed to WorkflowOutputs and TaskOutputs
Clean up some unused code and unify duplicated code (encoding search results, ...)

Signed-off-by: jorgee <[email protected]>

jorgee · 2025-04-04T14:11:15Z

Missing parts discussed today:

Bugs:

Fix outputs path in source field in the Workflow's data outputs.
Add workflowRun in TaskRun

Other:

Task inputs and outputs model (suggested by @bentsherman)
find command to look up CIDs (not showing)

bentsherman · 2025-04-04T14:17:44Z

Based on our discussion today, let's make it so that you don't need to specify #outputs or /outputs to access a task output. That is more a workflow thing. Task outputs should simply be addressed as cid://<task-hash>/<relative-path>

Here is a proposed structure for task inputs and outputs:

{
  "inputs": {
    "val": { /* map of val names/values */ },
    "env": { /* map of env names/values */ },
    "file": [ /* list of file/path inputs */ ],
    "stdin": "<hash>/command.in" // or null if not specified
  }
}

{
  "outputs": {
    "env": { /* map of env names/values */ },
    "eval": { /* map of eval names/values */ },
    "file": [ /* list of file/path outputs */ ],
    "stdout": "<hash>/command.out" // or null if not specified
  }
}

This is how I modeled task inputs/outputs in #4553 (an experiment for static types) and it worked well. See for example ProcessInputs and ProcessOutputs.

Signed-off-by: jorgee <[email protected]>

Signed-off-by: Paolo Di Tommaso <[email protected]>

pditommaso · 2025-04-05T15:26:46Z

Let's move the open points into a new PR

pditommaso · 2025-04-06T08:50:08Z

Based on our discussion today, let's make it so that you don't need to specify #outputs or /outputs to access a task output.

I'm not sure I agree with this; there should be a convenient way to access outputs for both workflows and tasks.

Here is a proposed structure for task inputs and outputs

I think we should unify both inputs and outputs as lists of objects. The grouping may be a bit more readable for humans, but the real consumer will be the indexing subsystem. A flat, easily predictable scheme would likely simplify things.

bentsherman · 2025-04-06T20:28:09Z

@pditommaso I would not rush to try to fit tasks and workflows into the same model. They are similar but not exactly the same. Instead we should think about the best way to model task inputs and outputs on their own terms.

I think the structure I propose is actually easier to consume, both for users and for data provenance. Rather than searching through a list, you simply look up a value by name in a map.
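To make the map-versus-list trade-off concrete, here is a minimal sketch. It is not the CID store model: the flat-list field names (`name`, `type`, `value`), the sample values, and the helper `find_in_list` are assumptions invented for illustration.

```python
from typing import Any, Optional

# Hypothetical task outputs, grouped by kind as in the proposed structure.
grouped_outputs = {
    "env": {"MY_ENV_VAR": "42"},
    "eval": {"tool_version": "1.2.3"},
    "file": ["results/file.txt"],
    "stdout": None,  # null when stdout is not declared as an output
}

# The same information as a flat list of objects (assumed field names).
flat_outputs = [
    {"name": "MY_ENV_VAR", "type": "env", "value": "42"},
    {"name": "tool_version", "type": "eval", "value": "1.2.3"},
    {"name": None, "type": "file", "value": "results/file.txt"},
]


def find_in_list(items: list, type_: str, name: str) -> Optional[Any]:
    """Linear scan: return the value of the first entry matching type and name."""
    for item in items:
        if item["type"] == type_ and item["name"] == name:
            return item["value"]
    return None


# Map: direct lookup by kind and name.
from_map = grouped_outputs["env"]["MY_ENV_VAR"]

# List: scan entries until one matches.
from_list = find_in_list(flat_outputs, "env", "MY_ENV_VAR")
```

Both forms carry the same data; the difference is only whether a consumer indexes by key or filters a sequence.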

For a workflow run, #outputs makes sense because the output is a data structure. But the output of a task is a collection of files, environment variables, and stdout, so that probably requires a different syntax. Maybe something like:

cid://<hash>#file/path/to/file.txt
cid://<hash>#env/MY_ENV_VAR
cid://<hash>#stdout

But I don't know if all that is really needed. The simplest thing would be to just access output files as cid://<hash>/<path>
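As a rough sketch of how both addressing styles could be parsed, assuming the fragment forms listed above and the plain path form: the function `parse_task_output_ref`, its return shape, and the sample hash `abc123` are hypothetical, not part of the CID store code.

```python
from urllib.parse import urlsplit


def parse_task_output_ref(uri: str) -> dict:
    """Split a cid:// URI into the run hash, output kind, and name or path.

    Hypothetical helper. Handles the '#file/...', '#env/...' and '#stdout'
    fragment forms as well as the plain 'cid://<hash>/<path>' form.
    """
    parts = urlsplit(uri)
    if parts.scheme != "cid":
        raise ValueError(f"not a cid URI: {uri}")
    if parts.fragment:
        # Fragment form: '<kind>' or '<kind>/<name-or-path>'.
        kind, _, name = parts.fragment.partition("/")
        return {"hash": parts.netloc, "kind": kind, "name": name or None}
    # Plain form: everything after the hash is a relative file path.
    path = parts.path.lstrip("/")
    return {"hash": parts.netloc, "kind": "file", "name": path or None}


file_ref = parse_task_output_ref("cid://abc123#file/path/to/file.txt")
env_ref = parse_task_output_ref("cid://abc123#env/MY_ENV_VAR")
stdout_ref = parse_task_output_ref("cid://abc123#stdout")
plain_ref = parse_task_output_ref("cid://abc123/path/to/file.txt")
```

In this sketch the plain path form and the `#file/` form resolve to the same file reference, which is one way to read the "simplest thing" option.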

@jorgee if you understand the structure I proposed well enough, can you make an initial PR for it? It should be easier to discuss there.

pditommaso · 2025-04-07T08:53:39Z

I'm not sure it's worth it, at least in this milestone. It would be better to focus on simplicity and consistency, hence my suggestion to always use a collection for both inputs and outputs.

jorgee · 2025-04-07T08:53:58Z

I partially agree with both comments. I like maps, but I agree with @pditommaso that we should refer to outputs in the same way for workflows and tasks. So cid://<hash>#outputs should refer to the outputs, whether the hash is for a task or a workflow. I do not find it very intuitive to refer directly to a type, because that assumes it is an output.

Regarding the structure of the output, I was trying to model them as a list of Parameter objects in all cases (parameters, task inputs, task outputs, and workflow outputs), where each parameter has a name, type, and value. But I saw that, depending on the case, the name or type is missing. I suppose this will not happen in the future with static types.

Whether to write them as a list or a map depends on what we want to support. The reason I changed workflow outputs to a map is that it makes it very easy to refer to a single output (cid://<hash>#outputs.myOutput), and I think that could be a common use case for workflow outputs. If they were a list, you would have to refer to something like cid://<hash>#outputs?name=myOutput, because '.' is used to navigate through the object's parameters and '?' is used to filter a list. But this format has two problems: it is not a correct URI (the query should come before the fragment), and the '?' character has issues with the pattern expected by Channel.fromPath.
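The two reference styles can be compared with a small sketch. The helper `resolve_fragment`, the sample metadata, and the hash `abc123` are assumptions for illustration only; the URI parsing uses Python's standard `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def resolve_fragment(metadata: dict, fragment: str):
    """Walk a metadata object following a dot-separated fragment (hypothetical)."""
    value = metadata
    for key in fragment.split("."):
        value = value[key]
    return value


# Hypothetical workflow run metadata with outputs stored as a map.
workflow_run = {"outputs": {"myOutput": "results/summary.csv"}}

# Map form: 'cid://<hash>#outputs.myOutput' resolves with plain key navigation.
map_ref = urlsplit("cid://abc123#outputs.myOutput")
resolved = resolve_fragment(workflow_run, map_ref.fragment)

# List form: a standard URI parser treats everything after '#' as the fragment,
# so the '?name=myOutput' filter never reaches the query component.
list_ref = urlsplit("cid://abc123#outputs?name=myOutput")
query_component = list_ref.query
fragment_component = list_ref.fragment
```

With a generic parser, the list form leaves `query_component` empty and puts the whole `outputs?name=myOutput` string in the fragment, so any filtering would need custom fragment parsing.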

bentsherman · 2025-04-09T15:35:48Z

Should be a good topic to discuss on Friday. In my view, the correct model is also the simplest one in this case.

Add cid quick wins

a6424f5

Signed-off-by: jorgee <[email protected]>

jorgee requested a review from pditommaso April 4, 2025 11:30

jorgee and others added 4 commits April 4, 2025 17:52

fix outputs bug and add workflowRun

ee7506f

Signed-off-by: jorgee <[email protected]>

Render cid usage on missing sub-command [ci fast]

c3ced0d

Signed-off-by: Paolo Di Tommaso <[email protected]>

Fix warning message [ci fast]

e8538a7

Signed-off-by: Paolo Di Tommaso <[email protected]>

Improve cid history logs [ci fast]

1cc9d7d

Signed-off-by: Paolo Di Tommaso <[email protected]>

pditommaso merged commit 14e2841 into cid-store Apr 5, 2025
5 checks passed

pditommaso deleted the cid-store-m0-quick-wins branch April 5, 2025 15:26
