Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.41.0 Regression in sources behavior cause all files to be read even with method: timestamp #2029

Open
chrishoage opened this issue Jan 29, 2025 · 0 comments · May be fixed by #2031
Open
Labels
area: fingerprinting Changes related to checksums and caching. area: variables Changes related to variables.

Comments

@chrishoage
Copy link

Description

  • Running tasks with sources method timestamp causes all files to be fully read

This purple block in this graph is the data in sources being read
The yellow/green block in the graph is the actual task being run

Image

I noticed this behavior when upgrading to 3.41.0. Reverting to 3.40.1 resolves this issue completely.

Even with method: timestamp in the task definition it still reads all files.

If I had to guess it was regressed by #1872

It appears to set up both checkers regardless of method

task/variables.go

Lines 131 to 147 in f5121de

if len(origTask.Sources) > 0 {
timestampChecker := fingerprint.NewTimestampChecker(e.TempDir.Fingerprint, e.Dry)
checksumChecker := fingerprint.NewChecksumChecker(e.TempDir.Fingerprint, e.Dry)
for _, checker := range []fingerprint.SourcesCheckable{timestampChecker, checksumChecker} {
value, err := checker.Value(&new)
if err != nil {
return nil, err
}
vars.Set(strings.ToUpper(checker.Kind()), ast.Var{Live: value})
}
// Adding new variables, requires us to refresh the templaters
// cache of the the values manually
cache.ResetCache()
}

which ultimately calls

func (c *ChecksumChecker) checksum(t *ast.Task) (string, error) {
sources, err := Globs(t.Dir, t.Sources)
if err != nil {
return "", err
}
h := xxh3.New()
buf := make([]byte, 128*1024)
for _, f := range sources {
// also sum the filename, so checksum changes for renaming a file
if _, err := io.CopyBuffer(h, strings.NewReader(filepath.Base(f)), buf); err != nil {
return "", err
}
f, err := os.Open(f)
if err != nil {
return "", err
}
if _, err = io.CopyBuffer(h, f, buf); err != nil {
return "", err
}
f.Close()
}
hash := h.Sum128()
return fmt.Sprintf("%x%x", hash.Hi, hash.Lo), nil
}

I believe this behavior to be a bug as when declaring a method: timestamp that seems to me that the user expelictly does not want an expensive checksum done

To fix this I suppose the first referenced code block needs to check the method of the task and use this to set up the correct checker.

Version

3.41.0

Operating system

Alpine Linux (task is runing in a docker container)

Experiments Enabled

No response

Example Taskfile

version: "3"

  cache_sync:
    method: timestamp
    sources:
      - "{{ .CACHE_PATH }}/media/**/*"
    cmds:
      - |
        docker run \
          -v "$CACHE_PATH/media:/src" \
          -v "$BACKING_DATA_PATH/media:/dest" \
          -e "PUID=${PUID}" \
          -e "PGID=${PGID}" \
          --rm \
          --pull=never \
          --name=cache_sync \
          rsync
@task-bot task-bot added the state: needs triage Waiting to be triaged by a maintainer. label Jan 29, 2025
@vmaerten vmaerten added area: variables Changes related to variables. area: fingerprinting Changes related to checksums and caching. type: performance Issues with optimisation or performance. and removed state: needs triage Waiting to be triaged by a maintainer. labels Jan 29, 2025
@pd93 pd93 removed the type: performance Issues with optimisation or performance. label Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: fingerprinting Changes related to checksums and caching. area: variables Changes related to variables.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants