upload-artifact fails on Ubuntu jobs, likely due to artifact name collision #1383

Closed
RyanGlScott opened this issue Jul 15, 2022 · 0 comments · Fixed by #1384
Labels
CI Continuous integration

Comments

@RyanGlScott
Contributor

Recently, the Ubuntu jobs have been failing nondeterministically. Here is an example of the error:

##### Begin Diagnostic HTTP information #####
Status Code: 503
Status Message: Service Unavailable
Header Information: {
  "cache-control": "no-store,no-cache",
  "pragma": "no-cache",
  "content-length": "350",
  "content-type": "application/json; charset=utf-8",
  "strict-transport-security": "max-age=2592000",
  "x-tfs-processid": "cfd6d1a4-727d-417f-a00a-f2909b25aafe",
  "activityid": "37916b51-150b-4a1b-b337-132cba254678",
  "x-tfs-session": "37916b51-150b-4a1b-b337-132cba254678",
  "x-vss-e2eid": "37916b51-150b-4a1b-b337-132cba254678",
  "x-vss-senderdeploymentid": "d624195d-30e0-1768-06a5-b10a7879c7db",
  "x-frame-options": "SAMEORIGIN",
  "x-cache": "CONFIG_NOCACHE",
  "x-msedge-ref": "Ref A: 63AEA71D1445422FAD294415703061E1 Ref B: BN3EDGE0818 Ref C: 2022-07-08T21:06:38Z",
  "date": "Fri, 08 Jul 2022 21:06:38 GMT"
}
###### End Diagnostic HTTP information ######
Retry limit has been reached for chunk at offset 58720256 to https://pipelines.actions.githubusercontent.com/zYlJT5TUb9aMZSwKPCTReZcvRKmSeZNVv40BxwMUxavewh5n5l/_apis/resources/Containers/46140633?itemPath=cryptol-2.13.0.99-ubuntu-20.04-x86_64-with-solvers%2Fcryptol-2.13.0.99-ubuntu-20.04-x86_64-with-solvers.tar.gz
Warning: Aborting upload for /home/runner/work/cryptol/cryptol/cryptol-2.13.0.99-ubuntu-20.04-x86_64-with-solvers.tar.gz due to failure
Error: aborting artifact upload

Curiously, this only happens on the Ubuntu jobs. After searching the GitHub Actions upload-artifact issue tracker for a bit, I think I now know why this happens. Here is a relevant section of the upload-artifact README (which I discovered via actions/upload-artifact#171 (comment)):

Each artifact behaves as a file share. Uploading to the same artifact multiple times in the same workflow can overwrite and append already uploaded files:

I believe what is happening is that we are uploading artifacts from different jobs under the same name, which confuses upload-artifact and can result in various 5xx errors, such as the 503 seen above. Indeed, we have the following upload-artifact steps in the Cryptol CI:

      - uses: actions/upload-artifact@v2
        with:
          name: ${{ env.NAME }}
          path: "${{ env.NAME }}.tar.gz*"
          if-no-files-found: error
          retention-days: ${{ needs.config.outputs.retention-days }}
      - uses: actions/upload-artifact@v2
        with:
          name: ${{ env.NAME }}-with-solvers
          path: "${{ env.NAME }}-with-solvers.tar.gz*"
          if-no-files-found: error
          retention-days: ${{ needs.config.outputs.retention-days }}

Where NAME is defined as:

NAME="${{ needs.config.outputs.name }}-${{ matrix.os }}-x86_64"

Notably, NAME does not contain the GHC version. Because we have Ubuntu jobs that differ only in the GHC version they use, several Ubuntu jobs end up uploading artifacts under the same name, resulting in disaster.
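
To make the collision concrete, here is a sketch of how two matrix entries can end up with one artifact name. The matrix values are hypothetical (the second GHC version in particular is a placeholder, not necessarily one that the Cryptol CI builds with), and <name> stands for whatever needs.config.outputs.name evaluates to:

```yaml
# Hypothetical matrix, for illustration only. The GHC versions are
# placeholders and may not match the real Cryptol workflow.
strategy:
  matrix:
    os: [ubuntu-20.04]
    ghc: ["8.10.7", "9.2.2"]

# NAME is built from the project name and matrix.os only, so both jobs
# compute the same value:
#   os=ubuntu-20.04, ghc=8.10.7  ->  <name>-ubuntu-20.04-x86_64
#   os=ubuntu-20.04, ghc=9.2.2   ->  <name>-ubuntu-20.04-x86_64
```

Any two jobs that share an os value but differ in ghc would then write into the same artifact.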

I was curious why this error never arose in the CI jobs for, say, saw-script, and the above explains why. In saw-script, we have this upload-artifact step:

      - if: matrix.ghc == '8.10.7'
        uses: actions/upload-artifact@v2
        with:
          name: ${{ steps.config.outputs.name }} (GHC ${{ matrix.ghc }})
          path: "${{ steps.config.outputs.name }}-with-solvers.tar.gz*"
          if-no-files-found: error
          retention-days: ${{ needs.config.outputs.retention-days }}

Note that the name is distinguished by the ${{ matrix.ghc }} version. I think we can fix the Cryptol CI by doing the same thing.
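
A minimal sketch of what that could look like for the two Cryptol steps, assuming the Cryptol job matrix exposes the GHC version as matrix.ghc in the same way saw-script's does. The " (GHC ...)" suffix simply mirrors the saw-script naming and is illustrative, not necessarily the format the eventual fix uses:

```yaml
# Sketch only: assumes `matrix.ghc` is defined for these jobs.
# Only the artifact `name` changes; the uploaded paths stay the same.
- uses: actions/upload-artifact@v2
  with:
    name: ${{ env.NAME }} (GHC ${{ matrix.ghc }})
    path: "${{ env.NAME }}.tar.gz*"
    if-no-files-found: error
    retention-days: ${{ needs.config.outputs.retention-days }}
- uses: actions/upload-artifact@v2
  with:
    name: ${{ env.NAME }}-with-solvers (GHC ${{ matrix.ghc }})
    path: "${{ env.NAME }}-with-solvers.tar.gz*"
    if-no-files-found: error
    retention-days: ${{ needs.config.outputs.retention-days }}
```

With the GHC version in the name, each job in the matrix uploads to its own artifact, so no two jobs write to the same file share.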
