The current binary packages provided by the Trino project suffer from a number of issues.
- RPM packaging is barely used and tested, but creates significant work and has a negative impact on the build time.
- The tarball and the Docker container are large and contain all plugins. Typical usage, however, only involves a small subset of the plugins.
- The large artifacts result in longer download and installation times, which also affect deployments.
- Occasionally plugins may contain dependencies with reported security issues, and therefore Trino might be flagged during security analysis by users (rightly or wrongly so).
- Custom package creation is not really supported or documented.
These issues will get worse in the near future as more connectors are merged and as more native binaries for multiple operating systems and processor architectures are required and included.
This roadmap item collects a number of related work tasks that we want to work on. Numerous discussions took place prior to filing this issue: on Slack, in smaller conversations, and at the Trino Contributor Calls and Congregation.
The following sections detail tasks and ideas. Work on these can be done in parallel.
Pull out RPM ✅
The RPM is rarely used by now, and we agree on removing it from the core trino repo.
Since it is built from the tarball, however, it is possible to pull the RPM packaging aspects out of the core repo into a separate repository that users can use to build an RPM for any Trino version.
The repository https://github.com/simpligility/trino-packages is a first PoC implementation of this approach. The naming is generic since it can also be used for other package creation in separate modules.
Tasks to implement the removal are:
- Update and move trino-packages repo into trinodb org
- Update docs in repo to explain how to create RPM
- Update Trino website to remove RPM download and just link to docs
- Update docs to link to the repo and explain that you must build the RPM yourself before using it
- In the longer run we can maybe even remove the RPM docs from the Trino docs completely and just rely on the docs in the trino-packages repo
Figure out plugins for different packages ✅
We need to determine what different packages we want to offer and what each package should contain. The following is a first idea:
- Minimal
  - Only contains what is necessary to start Trino and use it
  - So no plugins, or maybe just memory to allow some initial testing
- Default
  - Only contains commonly used plugins
  - Definitely remove the following: atop, blackhole, example-http, exchange-filesystem, exchange-hdfs, geospatial, http-event-listener, http-server-event-listener, local-file, ml, mysql-event-listener, openlineage, raptor-legacy, teradata-functions, thrift
  - Probably add all lakehouse plugins, so hive, hudi, delta-lake, iceberg
  - Some others, but which ones is still open
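The proposal above can be sketched as plain set arithmetic. This is only an illustration: the function name `plugins_for` and the variant names are hypothetical, and "default" is modeled as everything minus the exclusion list, which is an upper bound since the final selection is still open. The exclusion and lakehouse lists are taken from the bullets above.

```python
# Plugins proposed for removal from the default package (from the list above).
EXCLUDED_FROM_DEFAULT = frozenset({
    "atop", "blackhole", "example-http", "exchange-filesystem", "exchange-hdfs",
    "geospatial", "http-event-listener", "http-server-event-listener",
    "local-file", "ml", "mysql-event-listener", "openlineage", "raptor-legacy",
    "teradata-functions", "thrift",
})

# Lakehouse plugins that would probably stay in the default package.
LAKEHOUSE = frozenset({"hive", "hudi", "delta-lake", "iceberg"})


def plugins_for(variant, all_plugins):
    """Return the plugin names a package variant would contain.

    Hypothetical helper: 'minimal' keeps at most the memory plugin,
    'default' keeps everything that is not on the exclusion list.
    """
    available = frozenset(all_plugins)
    if variant == "minimal":
        return available & {"memory"}
    if variant == "default":
        return available - EXCLUDED_FROM_DEFAULT
    raise ValueError(f"unknown package variant: {variant}")
```

For example, with an illustrative plugin list of `memory`, `hive`, `atop`, and `postgresql`, the minimal variant keeps only `memory`, while the default variant drops only `atop`.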
Also create a docs page in the installation section that explains how to add and remove plugins.
Different tarballs ✅
Once we figure out the desired plugins and archive variants, we should adjust the build and publishing process to publish these and update the docs as well.
We then also need to add docs on how to download and add additional plugins.
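As a rough sketch of what such docs might describe, adding or removing a plugin in an unpacked tarball comes down to managing directories. The code below assumes the usual layout where each plugin is a subdirectory of `plugin/` inside the installation directory. The helper names `prune_plugins` and `add_plugin` are hypothetical and not part of any Trino tooling.

```python
import shutil
from pathlib import Path


def prune_plugins(install_dir, keep):
    """Delete every plugin directory whose name is not in `keep`.

    Assumes plugins live in <install_dir>/plugin/<name>/.
    Returns the sorted names of the removed plugins.
    """
    plugin_dir = Path(install_dir) / "plugin"
    removed = []
    for entry in sorted(plugin_dir.iterdir()):
        if entry.is_dir() and entry.name not in keep:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def add_plugin(install_dir, plugin_source):
    """Copy an already downloaded and unpacked plugin directory into place.

    Raises FileExistsError if a plugin with the same name is already installed.
    """
    source = Path(plugin_source)
    target = Path(install_dir) / "plugin" / source.name
    shutil.copytree(source, target)
    return target
```

A changed plugin set would presumably only take effect after restarting the server, since plugins are loaded at startup.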
Different container images ✅
Once we figure out the desired plugins and archive variants, we should adjust the build and publishing process to publish these and update the docs as well.
We then also need to add docs for:
- How to download and add additional plugins.
- How to add other packages (not via the package manager, since that is removed).
- Possibly how to create a Docker container with a different base OS from scratch (this could be an example in the packaging repo).
Plugin loading
Over time it might be even better to be able to give a running system a URL or similar pointer to a plugin, and then have it load that plugin onto the servers and run it. Of course, security concerns and other aspects need to be figured out. The API for these operations could (or maybe even should) be a SQL command, similar to the dynamic catalog management features.