This repository has been archived by the owner on Feb 9, 2021. It is now read-only.

Prebuilt Ruby ADR #49

Open, wants to merge 5 commits into base: adrs

@bryanmacfarlane bryanmacfarlane commented Jan 12, 2020

👉 Rendered 👈

@bryanmacfarlane
Member Author

Just starting. Need to flesh this out more and dive a bit deeper into design specifics.

Contributor

@eregon eregon left a comment

Looks fine to me so far


Offering each version as an individual package or release provides queryable APIs and lets each version be marked as pre-release or not through the packaging and release features.

It is fairly straightforward to come up with a scheme to completely automate this with a workflow.
Contributor

It's not straightforward when a build fails, as explained in #48 (comment).
How do you plan to address that? Modify the create-release and upload-release-asset actions so they can be used to patch an existing release?

Member Author

I believe the runner runs a graph of jobs (for each arch), each job uploads an artifact and then there's a job that depends on all those which finally aggregates all and creates the release. If any job fails the job that creates the release doesn't run.

We could also easily take another approach: create the release as a pre-release and have each job modify it. Then either a final job node flips it, or another canary set of workflows validates all E2E functionality and flips it from pre-release to release (the runner does that).
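
A minimal sketch of the first approach, written in Python rather than workflow YAML. The names `run_build_jobs` and `create_release` are hypothetical, and the jobs run sequentially here for simplicity. The sketch only shows the gating: the release step runs only when every per-arch job has produced an artifact.

```python
from typing import Callable, Dict, Optional


def run_build_jobs(jobs: Dict[str, Callable[[], bytes]]) -> Optional[Dict[str, bytes]]:
    """Run one build job per architecture and collect the artifacts.

    Returns None as soon as any job fails, mirroring a release job that
    depends on every build job and is skipped when one of them fails.
    """
    artifacts: Dict[str, bytes] = {}
    for arch, build in jobs.items():
        try:
            artifacts[arch] = build()
        except Exception:
            return None
    return artifacts


def create_release(version: str, artifacts: Optional[Dict[str, bytes]]) -> Optional[dict]:
    """Aggregate all artifacts into one release record, or do nothing."""
    if artifacts is None:
        return None
    return {"tag": version, "assets": sorted(artifacts)}
```

With this shape a failed build never leaves a partially populated release behind, because nothing is published until all artifacts exist.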

Contributor

> each job uploads an artifact and then there's a job that depends on all those which finally aggregates all and creates the release.

Uploading release assets requires the release to be created already. It could be in draft mode, but there is no premade action under github/actions to turn a draft release into a public release AFAIK.
Using download-artifact/upload-artifact works around that at the cost of extra complexity.

I see, https://github.com/actions/runner/blob/a727194742dd7b28c265527f71e2a3669b71516a/.github/workflows/release.yml does something like you mention.
https://github.com/actions/github-script seems to be a way to work around the limitations of the create-release and upload-release-asset actions, but it's more verbose and it's JavaScript in a YAML string. It also includes the list of platforms/versions twice.

Anyway, since I guess this is a part GitHub will maintain, best to leave it to you.
Readability would still be good if the community wants to help building new versions or understand how all this works.


Option 1: GitHub GPR Universal Package

The `actions/build-ruby` repo offers packages, for [example](https://github.com/actions/setup-ruby/packages).
Contributor

What's the difference between packages and releases?

From https://github.com/features/packages#pricing I can see 500MB (total) could quickly become too tight for hosting all Ruby versions. Releases don't have such restrictions AFAIK (just no single file >2GB).

Member Author

I'm discussing with packaging folks. But essentially, packaging has full semver support without putting that burden on the action or some other index json or api.

I think if we went the packaging route, then each distribution, per version, per arch would be an independent package (how big is a discrete ruby? 10 - 20 MB?)

Contributor

See https://github.com/eregon/ruby-install-builder/releases/tag/builds-bundler1
So it's between 10MB and 150MB. ~35MB for Ruby on Linux.

I think Rubyists are BTW more used to 2.6 rather than 2.6.x. In use-ruby-action I just use startsWith on an ordered list of versions per engine, and it's very simple in the end:
https://github.com/eregon/use-ruby-action/blob/933f49684485836830e31e48a48835cba066dc55/index.js#L77-L79
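
To illustrate the prefix idea, here is a small Python sketch (the linked action is JavaScript). The version list and the name `resolve_by_prefix` are made up for the example. Matching on a dotted boundary is an extra precaution added in this sketch so that `2.1` cannot match `2.10.0`; it is not a claim about what the linked code does.

```python
from typing import List, Optional

# Hypothetical list for one engine, ordered oldest to newest.
RUBY_VERSIONS = ["2.5.7", "2.6.4", "2.6.5", "2.7.0"]


def resolve_by_prefix(requested: str, available: List[str]) -> Optional[str]:
    """Return the newest version equal to, or starting with, the request."""
    matches = [
        version
        for version in available
        if version == requested or version.startswith(requested + ".")
    ]
    # The list is ordered, so the last match is the newest one.
    return matches[-1] if matches else None
```

With the list above, `resolve_by_prefix("2.6", RUBY_VERSIONS)` picks `2.6.5`, and an unknown prefix such as `3.0` yields `None`.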


### Setup action

Repo releases are also queryable via an [HTTP API](https://developer.github.com/v3/repos/releases/). This allows the `actions/setup-ruby` action to query the versions, match the semver version spec against the list, and resolve the latest version matching the spec.
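
As a rough illustration of that resolution step, the Python sketch below assumes release tags are plain numeric versions such as `2.6.5` and that each release carries the `tag_name` and `prerelease` fields returned by the releases API. The helper names are hypothetical and the spec matching is deliberately simplified (only `x` wildcards), not a full semver implementation.

```python
from typing import Dict, List, Optional, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn "2.6.5" (or "v2.6.5") into (2, 6, 5) for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def matches(spec: str, version: Tuple[int, ...]) -> bool:
    """Match specs such as "2.6.x", "2.x", or an exact "2.6.5"."""
    for wanted, actual in zip(spec.split("."), version):
        if wanted not in ("x", "*") and int(wanted) != actual:
            return False
    return True


def resolve_latest(spec: str, releases: List[Dict]) -> Optional[str]:
    """Pick the newest non-prerelease tag that satisfies the spec."""
    candidates = [
        release["tag_name"]
        for release in releases
        if not release.get("prerelease")
        and matches(spec, parse_version(release["tag_name"]))
    ]
    return max(candidates, key=parse_version, default=None)
```

Comparing parsed tuples rather than strings matters here: a string sort would rank `2.6.9` above `2.6.10`.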
Contributor

Getting the list of releases via https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository might take some time, especially if there are many releases: the JSON responses are paginated, so multiple requests have to be issued.

Member Author

Yes, another reason to use packages. However, in practice I don't believe it will be a problem. Note that the tool-cache lib queries the cache first and the VM would populate the latest versions, so it short-circuits and never queries if the cache can satisfy the semver. Also, exact version specs don't need to query at all.
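
A sketch of that lookup order in Python, under stated assumptions: this is not the tool-cache lib's API, all function names are hypothetical, and `list_remote` stands in for whatever call would list releases or packages. It shows that the remote listing is only reached for a non-exact spec that the local cache cannot satisfy.

```python
from typing import Callable, List, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Sort key that compares versions numerically, so 2.6.10 > 2.6.9."""
    return tuple(int(part) for part in version.split("."))


def is_exact(spec: str) -> bool:
    """True for a fully pinned spec such as "2.4.9"."""
    parts = spec.split(".")
    return len(parts) == 3 and all(part.isdigit() for part in parts)


def satisfies(spec: str, version: str) -> bool:
    """Match "2.6.x" or "2.x" style specs against a concrete version."""
    pairs = zip(spec.split("."), version.split("."))
    return all(want in ("x", "*") or want == have for want, have in pairs)


def newest(versions: List[str]) -> Optional[str]:
    return max(versions, key=version_key, default=None)


def resolve(
    spec: str, cached: List[str], list_remote: Callable[[], List[str]]
) -> Tuple[Optional[str], str]:
    """Return (version, source) without listing remotely unless needed."""
    if is_exact(spec):
        # An exact spec already names the version, so no listing is needed.
        return spec, ("cache" if spec in cached else "download")
    hit = newest([v for v in cached if satisfies(spec, v)])
    if hit is not None:
        return hit, "cache"  # short-circuit: the cache satisfies the spec
    remote = newest([v for v in list_remote() if satisfies(spec, v)])
    return remote, ("download" if remote else "unresolved")
```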

Member Author

I forwarded this to our teams that gen images and also our packaging folks. There's other work going on in these areas so it would be great to drive to consensus with community ❤️

@MSP-Greg
Contributor

Comments from #55

> Note ruby-build is a build solution, not a trusted distribution.

My point was that, given a known build system, a repo with a collection of individuals along with possibly some GH staff, might be considered a 'trusted source' for binaries...

> I'm discussing with that team how we could consume the same store of binaries. The vm gen and this action should pull from the same trusted store.

Agreed. I may have missed it, but I don't think you've stated that before. It would certainly be the best solution (cache the latest teeny releases, but 'download-on-demand' for others).

@bryanmacfarlane
Member Author

Updated with a build and cache option


Other tools like Node.js offer a [distribution](https://nodejs.org/dist/) with a [queryable endpoint](https://nodejs.org/dist/index.json), which allows selecting the desired version [by a version spec which is semver](https://github.com/actions/setup-node#setup-node).

Other CI services like Travis support pulling a wide variety of [ruby versions offered here](http://rubies.travis-ci.org/).

Requirements:
- Offer prebuilt Ruby, JRuby, TruffleRuby versions
Contributor

Could you add TruffleRuby in the Goals above since it's removed here?

Member Author

I'm focusing this change request on (1) getting versions of Ruby at runtime and (2) self-hosted scenarios for on-prem and GHES (high priority right now). We can write another change request on adding variations. We still haven't done a good job with Ruby ...

**Cons**

Will not work with self-hosted scenarios
Will not work across all platforms the runner supports
Contributor

@eregon eregon Jan 24, 2020

Could you explain why it would not work on all platforms?

Member Author

It effectively won't work against all linux variants (rh, ubuntu, arch, arm etc.) times all versions (18.04, 18.10 etc) which our runner currently supports. From discussions, this is why Ruby only officially supports building from source - the environmental and dependency differences. That is unless you want to build an impossible matrix of platform, all versions, arch etc. prebuilt.
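
To make the growth of that matrix concrete, here is a tiny Python sketch. The lists are placeholder values chosen for the example, not a statement of what the runner supports, and `build_matrix` is a hypothetical helper.

```python
from itertools import product
from typing import List, Tuple


def build_matrix(
    ruby_versions: List[str], os_releases: List[str], architectures: List[str]
) -> List[Tuple[str, str, str]]:
    """Every (ruby, os release, arch) combination needs its own binary."""
    return list(product(ruby_versions, os_releases, architectures))


# Placeholder values only, not a list of what the runner supports.
matrix = build_matrix(
    ruby_versions=["2.5.7", "2.6.5", "2.7.0"],
    os_releases=["ubuntu-18.04", "ubuntu-18.10", "rhel-8"],
    architectures=["x64", "arm64"],
)
print(len(matrix))  # 3 * 3 * 2 = 18 prebuilt binaries
```

Each additional OS release or architecture multiplies the total, which is why the number of prebuilt binaries grows so quickly.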

Contributor

Ah right, I understood that as "all the platforms GitHub-hosted runners support".
So it's related to the point above about self-hosting and more platforms.


**Cons**

The first build on a miss will be slower but will not fail.
Contributor

That strategy would be similar to https://github.com/clupprich/ruby-build-action.
Yes, that will be very slow, like 5 minutes, and requires installing extra system dependencies at runtime: https://github.com/eregon/ruby-install-builder/runs/380224237

Contributor

It would also do so many duplicate builds of the same version in many repositories, i.e., I think it would waste resources significantly.

Member Author

See comment below. We need to look at the first build per org, and only on a cache miss (folks that are bound to a specific version). The mainline minor binding bypasses all of this with the VM cache of latest versions.

Member Author

@bryanmacfarlane bryanmacfarlane Jan 24, 2020

After the miss on hosted (single use), it would be cached in the same data center for future builds and very fast downloads. On self-hosted, it would live past that job, sit in the machine's cache, and be usable instantly.

Member Author

Remember that the hosted VMs are caching the latest (and possibly n-1) of every minor version, so this is only the first build for folks that pin back to an old specific version.
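
A minimal Python sketch of the build-on-miss behaviour described in this thread. `BuildOnMissCache` and its `build` callback are hypothetical, and where the cache lives (VM image, data center, or self-hosted machine) is out of scope here. It only shows that the expensive build happens once per version and later requests reuse the result.

```python
from typing import Callable, Dict


class BuildOnMissCache:
    """Cache of installed Ruby paths that builds only on a miss."""

    def __init__(self, build: Callable[[str], str]) -> None:
        self._build = build
        self._store: Dict[str, str] = {}
        self.builds = 0  # how many slow builds were actually triggered

    def get(self, version: str) -> str:
        if version not in self._store:
            # Cache miss: pay the slow build once, then remember the result.
            self.builds += 1
            self._store[version] = self._build(version)
        return self._store[version]
```

Pre-populating the store with the latest versions would model the VM cache, leaving builds only for older pinned versions.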


**Pros**

On a toolkit cache miss, the first build is slower but not that slow since it's pre-built.
Contributor

It downloads in less than 5 seconds, I don't think any strategy would be faster (as long as there are not all versions in the base image).

Member Author

It’s only the first build that’s slower. And it’s ultimately not about pure speed. It’s about the huge breadth of platforms, self-hosted and GHES scenarios this affords. In addition, it aligns with the only officially supported dist model. This is early and we need numbers and analysis. Too early to make a call.

@thejoebourneidentity thejoebourneidentity left a comment

What is the currently recommended approach between the two? Compatibility with self-hosted seems interesting. Is that the recommendation?

@ethomson
Contributor

ethomson commented Apr 1, 2020

👋 Following up here: @eregon, we've set up all the necessary bits to use third-party actions in our starter workflow templates. Would you want to open a PR in https://github.com/actions/starter-workflows with a template that uses your action?

@eregon
Contributor

eregon commented Apr 1, 2020

@ethomson That sounds great! PR created: actions/starter-workflows#448
