1 change: 0 additions & 1 deletion docs/src/main/sphinx/appendix.rst
@@ -5,7 +5,6 @@ Appendix
.. toctree::
:maxdepth: 1

appendix/glossary
appendix/from-hive
appendix/legal-notices

86 changes: 0 additions & 86 deletions docs/src/main/sphinx/appendix/glossary.rst

This file was deleted.

203 changes: 203 additions & 0 deletions docs/src/main/sphinx/glossary.rst
@@ -0,0 +1,203 @@
========
Glossary
========

The glossary contains a list of key Trino terms and definitions.

.. _glossCatalog:

Catalog
Catalogs define and name a configuration for connecting to a data source,
allowing users to query the connected data. Each catalog's configuration
specifies a :ref:`connector <glossConnector>` to define which data source
the catalog connects to. For more information about catalogs, see
:ref:`trino-concept-catalog`.

.. _glossCA:

Certificate Authority (CA)
A trusted organization that signs and issues certificates. Its signatures
can be used to verify the validity of :ref:`certificates <glossCert>`.

.. _glossCert:

Certificate
A public key `certificate
<https://en.wikipedia.org/wiki/Public_key_certificate>`_ issued by a
:ref:`CA <glossCA>`, sometimes abbreviated as cert, that verifies the
ownership of a server's private keys. Certificate format is specified in the
`X.509 <https://en.wikipedia.org/wiki/X.509>`_ standard.

Cluster
A Trino cluster provides the resources to run queries against numerous data
sources. Clusters define the number of nodes, the configuration for the JVM
runtime, configured data sources, and other aspects. For more information,
see :ref:`trino-concept-cluster`.

.. _glossConnector:

Connector
Translates data from a data source into Trino schemas, tables, columns,
rows, and data types. A :doc:`connector </connector>` is specific to a data
source, and is used in :ref:`catalog <glossCatalog>` configurations to
define what data source the catalog connects to. A connector is one of many
types of :ref:`plugins <glossPlugin>`.

Container
A lightweight virtual package of software that contains libraries, binaries,
code, configuration files, and other dependencies needed to deploy an
application. A running container does not include an operating system,
instead using the operating system of the host machine. To learn more,
read about `containers <https://kubernetes.io/docs/concepts/containers/>`_
in the Kubernetes documentation.

.. _glossDataVirtualization:

Data virtualization
`Data virtualization <https://en.wikipedia.org/wiki/Data_virtualization>`_
is a method of abstracting an interaction with multiple
:ref:`heterogeneous data sources <glossDataSource>`, without needing to know
the distributed nature of the data, its format, or any other technical
details involved in presenting the data.

.. _glossDataSource:

Data source
A system from which data is retrieved, for example, PostgreSQL or Iceberg on S3
data. In Trino, users query data sources with :ref:`catalogs <glossCatalog>`
that connect to each source. See :ref:`trino-concept-data-sources` for more
information.

.. _glossGzip:

gzip
`gzip <https://en.wikipedia.org/wiki/Gzip>`_ is a compression format and
software that compresses and decompresses files. This format is used in
several ways in Trino, including deployment and compressing files in
:ref:`object storage <glossObjectStorage>`. The most common extension for
gzip-compressed files is ``.gz``.
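
As a minimal sketch of the compress and decompress round trip described above,
the following uses Python's standard ``gzip`` module. The payload is a
hypothetical placeholder, and nothing here is specific to Trino.

```python
import gzip

# Hypothetical payload; any bytes work.
original = b"trino " * 1000

# Compress to the gzip format, then decompress again.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
has_gzip_magic = compressed[:2] == b"\x1f\x8b"

# Repetitive data such as this placeholder compresses well.
is_smaller = len(compressed) < len(original)
```

The magic bytes are what tools typically inspect to recognize a gzip stream,
independent of the ``.gz`` file extension.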

.. _glossHDFS:

HDFS
`Hadoop Distributed Filesystem (HDFS) <https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS>`_
is a scalable :ref:`open source <glossOpenSource>` filesystem that was one
of the earliest distributed big data systems created to store large amounts
of data for the
`Hadoop ecosystem <https://en.wikipedia.org/wiki/Apache_Hadoop>`_.

.. _glossJKS:

Java KeyStore (JKS)
The system of public key cryptography supported as one part of the Java
security APIs. The legacy JKS system recognizes keys and
:ref:`certificates <glossCert>` stored in *keystore* files, typically with
the ``.jks`` extension, and by default relies on a system-level list of
:ref:`CAs <glossCA>` in *truststore* files installed as part of the current
Java installation.

Key
A cryptographic key specified as a pair of public and private strings
generally used in the context of :ref:`TLS <glossTLS>` to secure public
network traffic.

.. _glossLB:

Load Balancer (LB)
Software or a hardware device that sits on a network edge and accepts
network connections on behalf of servers behind it, distributing
traffic across network and server infrastructure to balance the load on
networked services.

.. _glossObjectStorage:

Object Storage
`Object storage <https://en.wikipedia.org/wiki/Object_storage>`_ is a file
storage mechanism that stores data in a flat namespace, as opposed
to hierarchical filesystems. Files written in object storage are immutable,
meaning you cannot update part of a file and can only overwrite or replace the
entire file. In the context of Trino, object storage commonly refers to
`cloud storage <https://en.wikipedia.org/wiki/Object_storage#Cloud_storage>`_
Contributor

A link for a "cloud storage" definition feels like overkill given that we immediately name and link to the most common examples

Member Author

yeah, but I want to be as generic as possible before discussing the concrete examples. That wiki does a great job of that for any newbies that might get confused by Amazon, Google, and Azure marketing.

technologies such as `Amazon S3 <https://aws.amazon.com/s3>`_,
`Google Cloud Storage <https://cloud.google.com/storage>`_, and
`Azure Blob Storage <https://azure.microsoft.com/en-us/products/storage/blobs>`_.
In addition to cloud-hosted services, there are also local object storage
options such as `MinIO <https://min.io/>`_ and
`Ceph <https://docs.ceph.com>`_ that are compatible with S3. Object storage
became a popular replacement for :ref:`HDFS <glossHDFS>`.
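
The flat namespace and whole-object replacement described above can be
illustrated with a toy in-memory model. This is only a sketch: the class
``FlatObjectStore``, its methods, and the object keys are hypothetical and do
not correspond to any real object storage client or to Trino code.

```python
class FlatObjectStore:
    """Toy in-memory model of a flat object namespace (not a real client)."""

    def __init__(self) -> None:
        # One flat mapping from full key to object bytes; no directory tree.
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Whole-object write: any existing object under the key is replaced.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Directories" are only a naming convention: listing filters by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("warehouse/orders/part-0.orc", b"v1")
store.put("warehouse/orders/part-1.orc", b"v1")
store.put("warehouse/customers/part-0.orc", b"v1")

# There is no partial update: writing the same key replaces the whole object.
store.put("warehouse/orders/part-0.orc", b"v2")

orders = store.list_keys("warehouse/orders/")
```

The slashes in the keys look like paths, but the model treats each key as an
opaque string, which is the main contrast with a hierarchical filesystem.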

.. _glossOpenSource:

Open-source
Typically refers to
`open-source software <https://en.wikipedia.org/wiki/Open-source_software>`_,
which is software that has the source code made available for others to see,
use, and contribute to. Allowed usage varies depending on the software's
license. Trino is licensed under the
`Apache license <https://en.wikipedia.org/wiki/Apache_License>`_ and is
maintained by a community of contributors from all across the globe.

.. _glossPlugin:

Plugin
A bundle of code implementing the Trino
:doc:`Service Provider Interface (SPI) </develop/spi-overview>` that is used
to add new :ref:`connectors <glossConnector>`,
:doc:`data types </develop/types>`, :doc:`functions`,
:doc:`access control implementations </develop/system-access-control>`, and
other features of Trino.

.. _glossPEM:

PEM file format
A format for storing and sending cryptographic keys and certificates. PEM
format can contain both a key and its certificate, plus the chain of
certificates from authorities back to the root :ref:`CA <glossCA>`, or back
to a CA vendor's intermediate CA.
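
To make the layout of a PEM file concrete, the sketch below wraps some bytes in
PEM armor using Python's standard ``ssl`` module and reads them back. The bytes
in ``fake_der`` are a placeholder and not a real certificate, so the example
only shows the text structure: Base64 data between ``BEGIN`` and ``END`` marker
lines.

```python
import ssl

# Placeholder bytes standing in for a DER-encoded certificate.
# This is NOT a valid certificate; it only demonstrates the PEM text layout.
fake_der = bytes(range(48))

# Wrap the bytes in PEM armor: a header line, Base64 text, and a footer line.
pem_text = ssl.DER_cert_to_PEM_cert(fake_der)
lines = pem_text.strip().splitlines()

# Strip the armor and decode the Base64 body back to the original bytes.
round_trip = ssl.PEM_cert_to_DER_cert(pem_text)
```

A PEM file that carries a certificate chain repeats this block once per
certificate, and a private key uses the same layout with a different label in
the marker lines.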

.. _glossPKCS12:

PKCS #12
A binary archive used to store keys and certificates or certificate chains
that validate a key. `PKCS #12 <https://en.wikipedia.org/wiki/PKCS_12>`_
files have ``.p12`` or ``.pfx`` extensions. This format is a less popular
alternative to :ref:`PEM <glossPEM>`.

Presto and PrestoSQL
The old names for Trino. To learn more about the name change to Trino, read
`the history <https://en.wikipedia.org/wiki/Trino_(SQL_query_engine)#History>`_.

Query Federation
A type of :ref:`data virtualization <glossDataVirtualization>` that provides a
common access point and data model across two or more heterogeneous data
sources. A popular data model used by many query federation engines is
translating different data sources to :ref:`SQL <glossSQL>` tables.

.. _glossSSL:

Secure Sockets Layer (SSL)
A security protocol now superseded by :ref:`TLS <glossTLS>`. The term SSL is
still widely used to refer to what TLS does.

.. _glossSQL:

Structured Query Language (SQL)
The standard language used with relational databases. For more information,
see :doc:`SQL </language>`.

Tarball
A common abbreviation for
`TAR file <https://en.wikipedia.org/wiki/Tar_(computing)>`_, which is a
common software distribution mechanism. This file format is a collection of
multiple files distributed as a single file, commonly compressed using
:ref:`gzip <glossGzip>` compression.
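
As an illustration of several files bundled into one gzip-compressed archive,
the following sketch builds and lists a tarball in memory with Python's
standard ``tarfile`` module. The helper names and the file names inside the
archive are hypothetical and are not taken from the Trino distribution.

```python
import io
import tarfile


def build_tarball(files: dict[str, bytes]) -> bytes:
    """Pack the given name-to-content mapping into a gzip-compressed TAR."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # TAR headers record each member's size
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_tarball(blob: bytes) -> list[str]:
    """Return the member names stored in a gzip-compressed TAR."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return archive.getnames()


# Hypothetical archive contents.
blob = build_tarball(
    {
        "example/README.txt": b"hello\n",
        "example/etc/config.properties": b"key=value\n",
    }
)
names = list_tarball(blob)
```

The resulting bytes are an ordinary gzip stream whose payload is the TAR
archive, which is why such files commonly carry the ``.tar.gz`` extension.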

.. _glossTLS:

Transport Layer Security (TLS)
`TLS <https://en.wikipedia.org/wiki/Transport_Layer_Security>`_ is a
security protocol designed to provide secure communications over a network.
It is the successor to :ref:`SSL <glossSSL>`, and is used in many applications
like HTTPS, email, and Trino. These security topics use the term TLS to
refer to both TLS and SSL.
1 change: 1 addition & 0 deletions docs/src/main/sphinx/index.rst
@@ -16,6 +16,7 @@ Trino documentation
language
sql
develop
glossary
appendix

.. toctree::
36 changes: 33 additions & 3 deletions docs/src/main/sphinx/overview/concepts.rst
@@ -27,11 +27,35 @@ general to most specific.
provide further information about Trino and the concepts in use.


Server types
.. _trino-concept-architecture:

Architecture
------------

There are two types of Trino servers: coordinators and workers. The
following section explains the difference between the two.
Trino is a distributed query engine that processes data in parallel across
multiple servers. There are two types of Trino servers,
:ref:`coordinators <trino-concept-coordinator>` and
:ref:`workers <trino-concept-worker>`. The following sections describe these
servers and other components of Trino's architecture.

.. _trino-concept-cluster:

Cluster
^^^^^^^

A Trino cluster consists of a :ref:`coordinator <trino-concept-coordinator>` and
many :ref:`workers <trino-concept-worker>`. Users connect to the coordinator
with their :ref:`SQL <glossSQL>` query tool. The coordinator collaborates with the
workers. The coordinator and the workers access the connected
:ref:`data sources <trino-concept-data-sources>`. This access is configured in
:ref:`catalogs <trino-concept-catalog>`.

Processing each query is a stateful operation. The workload is orchestrated by
the coordinator and spread in parallel across all workers in the cluster. Each node
runs Trino in one JVM instance, and processing is parallelized further using
threads.
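
The orchestration pattern described above can be sketched as a conceptual
analogy. In the following Python example a coordinator function divides the
input into splits, hands them to a pool of workers, and merges the partial
results. All function names are hypothetical. Real Trino workers are separate
JVM processes that communicate over a REST API, so the threads here are only
stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows: list[int], split_size: int) -> list[list[int]]:
    """Divide the input into fixed-size chunks of work."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def worker_task(split: list[int]) -> int:
    """Work done on one split: a partial aggregation."""
    return sum(split)


def coordinator_sum(rows: list[int], workers: int = 4, split_size: int = 250) -> int:
    """Plan the splits, run them in parallel, and merge the partial results."""
    splits = make_splits(rows, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_task, splits))
    return sum(partials)


total = coordinator_sum(list(range(1000)))
```

The final merge step happens only after every partial result has arrived,
which loosely mirrors how the coordinator assembles results from the workers.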

.. _trino-concept-coordinator:

Coordinator
^^^^^^^^^^^
@@ -52,6 +76,8 @@ Trino workers.

Coordinators communicate with workers and clients using a REST API.

.. _trino-concept-worker:

Worker
^^^^^^

@@ -68,6 +94,8 @@ for task execution.
Workers communicate with other workers and Trino coordinators
using a REST API.

.. _trino-concept-data-sources:

Data sources
------------

@@ -103,6 +131,8 @@ two Hive clusters, you can configure two catalogs in a single Trino
cluster that both use the Hive connector, allowing you to query data
from both Hive clusters, even within the same SQL query.

.. _trino-concept-catalog:

Catalog
^^^^^^^

2 changes: 1 addition & 1 deletion docs/src/main/sphinx/security/tls.rst
@@ -17,7 +17,7 @@ the foundational layer.
This page discusses only how to prepare the Trino server for secure client
connections from outside of the Trino cluster to its coordinator.

See the :doc:`Glossary </appendix/glossary>` to clarify unfamiliar terms.
See the :doc:`Glossary </glossary>` to clarify unfamiliar terms.

.. _tls-version-and-ciphers:
