From b1f72792458cfb6cf500208260305e92c52f15f1 Mon Sep 17 00:00:00 2001 From: bitsondatadev Date: Fri, 30 Sep 2022 06:37:16 -0500 Subject: [PATCH] Add basic concept and glossary entries --- docs/src/main/sphinx/appendix.rst | 1 - docs/src/main/sphinx/appendix/glossary.rst | 86 --------- docs/src/main/sphinx/glossary.rst | 203 +++++++++++++++++++++ docs/src/main/sphinx/index.rst | 1 + docs/src/main/sphinx/overview/concepts.rst | 36 +++- docs/src/main/sphinx/security/tls.rst | 2 +- 6 files changed, 238 insertions(+), 91 deletions(-) delete mode 100644 docs/src/main/sphinx/appendix/glossary.rst create mode 100644 docs/src/main/sphinx/glossary.rst diff --git a/docs/src/main/sphinx/appendix.rst b/docs/src/main/sphinx/appendix.rst index 462acfa2c89..a8dd5fb476b 100644 --- a/docs/src/main/sphinx/appendix.rst +++ b/docs/src/main/sphinx/appendix.rst @@ -5,7 +5,6 @@ Appendix .. toctree:: :maxdepth: 1 - appendix/glossary appendix/from-hive appendix/legal-notices diff --git a/docs/src/main/sphinx/appendix/glossary.rst b/docs/src/main/sphinx/appendix/glossary.rst deleted file mode 100644 index 42a15a7dd7a..00000000000 --- a/docs/src/main/sphinx/appendix/glossary.rst +++ /dev/null @@ -1,86 +0,0 @@ -======== -Glossary -======== - -The glossary contains a list of key Trino terms and definitions. - -Terms A-E ---------- - -.. _glossCA: - -CA - Certificate Authority, a trusted organization that examines and validates - organizations and their proposed server URIs, and issues digital - certificates verified as valid for the requesting organization. - -.. _glossCert: - -Certificate - A public key `certificate - `_ issued by a CA, - sometimes abbreviated as *cert*, that verifies the ownership of a - server's keys. Certificate format is specified in the `X.509 - `_ standard. - -Terms F-J ---------- - -.. _glossJKS: - -JKS - Java KeyStore, the system of public key cryptography supported as one part - of the Java security APIs. 
The legacy JKS system recognizes keys and - certificates stored in *keystore* files, typically with the ``.jks`` - extension, and relies on a system-level list of CAs in *truststore* files - installed as part of the current Java installation. - -Terms K-O ---------- - -.. _glossKey: - -Key - A cryptographic key specified as a pair of public and private keys. - -.. _glossLB: - -Load Balancer (LB) - Software or a hardware device that sits on a network's outer edge or - firewall and accepts network connections on behalf of servers behind that - wall. Load balancers carefully manage network traffic, and can accept TLS - connections from incoming clients and pass those connections transparently - to servers behind the wall. - -Terms P-T ---------- - -.. _glossPEM: - -PEM - Privacy-Enhanced Mail, a syntax for private key information, and a content - type used to store and send cryptographic keys and certificates. PEM format - can contain both a key and its certificate, plus the chain of certificates - from authorities back to the root :ref:`CA `, or back to a CA - vendor's intermediate CA. - -.. _glossPKCS12: - -PKCS #12 - A binary archive used to store keys and certificates or certificate chains - that validate a key. `PKCS #12 `_ - files have ``.p12`` or ``.pfx`` extensions. - -SSL - Secure Sockets Layer, now superceded by TLS, but still recognized as the - term for what TLS does now. - -.. _glossTLS: - -TLS - `Transport Layer Security - `_ is the successor - to SSL. These security topics use the term TLS to refer to both TLS and SSL. - -Terms U-Z ---------- diff --git a/docs/src/main/sphinx/glossary.rst b/docs/src/main/sphinx/glossary.rst new file mode 100644 index 00000000000..452d0a0c7d8 --- /dev/null +++ b/docs/src/main/sphinx/glossary.rst @@ -0,0 +1,203 @@ +======== +Glossary +======== + +The glossary contains a list of key Trino terms and definitions. + +.. 
_glossCatalog: + +Catalog + Catalogs define and name a configuration for connecting to a data source, + allowing users to query the connected data. Each catalog's configuration + specifies a :ref:`connector ` to define which data source + the catalog connects to. For more information about catalogs, see + :ref:`trino-concept-catalog`. + +.. _glossCA: + +Certificate Authority (CA) + A trusted organization that signs and issues certificates. Its signatures + can be used to verify the validity of :ref:`certificates `. + +.. _glossCert: + +Certificate + A public key `certificate + `_ issued by a + :ref:`CA `, sometimes abbreviated as cert, that verifies the + ownership of a server's private keys. Certificate format is specified in the + `X.509 `_ standard. + +Cluster + A Trino cluster provides the resources to run queries against numerous data + sources. Clusters define the number of nodes, the configuration for the JVM + runtime, configured data sources, and other aspects. For more information, + see :ref:`trino-concept-cluster`. + +.. _glossConnector: + +Connector + Translates data from a data source into Trino schemas, tables, columns, + rows, and data types. A :doc:`connector ` is specific to a data + source, and is used in :ref:`catalog ` configurations to + define what data source the catalog connects to. A connector is one of many + types of :ref:`plugins `. + +Container + A lightweight virtual package of software that contains libraries, binaries, + code, configuration files, and other dependencies needed to deploy an + application. A running container does not include a full operating system, + and instead shares the kernel of the host machine. To learn more, read + about `containers `_ + in the Kubernetes documentation. + +.. 
_glossDataVirtualization: + +Data virtualization + `Data virtualization `_ + is a method of abstracting interactions with multiple + :ref:`heterogeneous data sources `, so that users do not need to + know the distributed nature of the data, its format, or any other technical + details involved in presenting the data. + +.. _glossDataSource: + +Data source + A system from which data is retrieved, for example, a PostgreSQL database or + Iceberg tables stored on S3. In Trino, users query data sources with + `catalogs `_ that connect to each source. See + :ref:`trino-concept-data-sources` for more information. + +.. _glossGzip: + +gzip + `gzip `_ is a compression format and + software that compresses and decompresses files. This format is used in + several ways in Trino, including for deployment and for compressing files in + :ref:`object storage `. The most common extension for + gzip-compressed files is ``.gz``. + +.. _glossHDFS: + +HDFS + `Hadoop Distributed Filesystem (HDFS) `_ + is a scalable :ref:`open source ` filesystem that was one + of the earliest distributed big data systems created to store large amounts + of data for the + `Hadoop ecosystem `_. + +.. _glossJKS: + +Java KeyStore (JKS) + The system of public key cryptography supported as one part of the Java + security APIs. The legacy JKS system recognizes keys and + :ref:`certificates ` stored in *keystore* files, typically with + the ``.jks`` extension, and by default relies on a system-level list of + :ref:`CAs ` in *truststore* files installed as part of the current + Java installation. + +Key + A cryptographic key pair, consisting of a public key and a private key, + generally used in the context of :ref:`TLS ` to secure public + network traffic. + +.. _glossLB: + +Load Balancer (LB) + Software or a hardware device that sits on a network edge and accepts + network connections on behalf of the servers behind it, distributing + traffic across network and server infrastructure to balance the load on + networked services. + +.. 
_glossObjectStorage: + +Object Storage + `Object storage `_ is a file + storage mechanism that stores data in a flat namespace, as opposed + to hierarchical filesystems. Files written in object storage are immutable, + meaning you cannot modify part of a file in place; you can only overwrite or + replace the entire file. In the context of Trino, object storage commonly refers to + `cloud storage `_ + technologies such as `Amazon S3 `_, + `Google Cloud Storage `_, and + `Azure Blob Storage `_. + In addition to cloud-hosted services, there are also self-hosted object + storage options such as `MinIO `_ and + `Ceph `_ that are compatible with S3. Object storage + became a popular replacement for :ref:`HDFS `. + +.. _glossOpenSource: + +Open-source + Typically refers to + `open-source software `_, + which is software that has the source code made available for others to see, + use, and contribute to. Allowed usage varies depending on the software's + license. Trino is licensed under the + `Apache license `_, and is + maintained by a community of contributors from around the + globe. + +.. _glossPlugin: + +Plugin + A bundle of code implementing the Trino + :doc:`Service Provider Interface (SPI) ` that is used + to add new :ref:`connectors `, + :doc:`data types `, :doc:`functions`, + :doc:`access control implementations `, and + other features of Trino. + +.. _glossPEM: + +PEM file format + A format for storing and sending cryptographic keys and certificates. PEM + format can contain both a key and its certificate, plus the chain of + certificates from authorities back to the root :ref:`CA `, or back + to a CA vendor's intermediate CA. + +.. _glossPKCS12: + +PKCS #12 + A binary archive used to store keys and certificates or certificate chains + that validate a key. `PKCS #12 `_ + files have ``.p12`` or ``.pfx`` extensions. This format is a less popular + alternative to :ref:`PEM `. + +Presto and PrestoSQL + The old name for Trino.
To learn more about the name change to Trino, read + `the history `_. + +Query Federation + A type of :ref:`data virtualization ` that provides a + common access point and data model across two or more heterogeneous data + sources. Many query federation engines use a data model that represents each + data source as :ref:`SQL ` tables. + +.. _glossSSL: + +Secure Sockets Layer (SSL) + A security protocol now superseded by :ref:`TLS `. The term SSL is + still commonly used to refer to what TLS does. + +.. _glossSQL: + +Structured Query Language (SQL) + The standard language used with relational databases. For more information, + see :doc:`SQL `. + +Tarball + An informal name for a + `TAR file `_, which is a + common software distribution mechanism. This file format bundles + multiple files into a single archive file, which is commonly compressed using + :ref:`gzip ` compression. + +.. _glossTLS: + +Transport Layer Security (TLS) + `TLS `_ is a + security protocol designed to provide secure communications over a network. + It is the successor to :ref:`SSL `, and is used in many applications + such as HTTPS, email, and Trino. These security topics use the term TLS to + refer to both TLS and SSL. diff --git a/docs/src/main/sphinx/index.rst b/docs/src/main/sphinx/index.rst index b4347a97351..3abf7a78c0a 100644 --- a/docs/src/main/sphinx/index.rst +++ b/docs/src/main/sphinx/index.rst @@ -16,6 +16,7 @@ Trino documentation language sql develop + glossary appendix .. toctree:: diff --git a/docs/src/main/sphinx/overview/concepts.rst b/docs/src/main/sphinx/overview/concepts.rst index 28f2a3f1532..55e7d9f0088 100644 --- a/docs/src/main/sphinx/overview/concepts.rst +++ b/docs/src/main/sphinx/overview/concepts.rst @@ -27,11 +27,35 @@ general to most specific. provide further information about Trino and the concepts in use. -Server types +.. _trino-concept-architecture: + +Architecture ------------ -There are two types of Trino servers: coordinators and workers.
The -following section explains the difference between the two. +Trino is a distributed query engine that processes data in parallel across +multiple servers. There are two types of Trino servers, +:ref:`coordinators ` and +:ref:`workers `. The following sections describe these +servers and other components of Trino's architecture. + +.. _trino-concept-cluster: + +Cluster +^^^^^^^ + +A Trino cluster consists of a :ref:`coordinator ` and +many :ref:`workers `. Users connect to the coordinator +with their :ref:`SQL ` query tool. The coordinator collaborates with the +workers. The coordinator and the workers access the connected +:ref:`data sources `. This access is configured in +:ref:`catalogs `. + +Processing each query is a stateful operation. The workload is orchestrated by +the coordinator and spread in parallel across all workers in the cluster. Each node +runs Trino in one JVM instance, and processing is parallelized further using +threads. + +.. _trino-concept-coordinator: + Coordinator ^^^^^^^^^^^ @@ -52,6 +76,8 @@ Trino workers. Coordinators communicate with workers and clients using a REST API. +.. _trino-concept-worker: + Worker ^^^^^^ @@ -68,6 +94,8 @@ for task execution. Workers communicate with other workers and Trino coordinators using a REST API. +.. _trino-concept-data-sources: + Data sources ------------ @@ -103,6 +131,8 @@ two Hive clusters, you can configure two catalogs in a single Trino cluster that both use the Hive connector, allowing you to query data from both Hive clusters, even within the same SQL query. +.. _trino-concept-catalog: + Catalog ^^^^^^^ diff --git a/docs/src/main/sphinx/security/tls.rst b/docs/src/main/sphinx/security/tls.rst index 5b8adb3b7f4..171174d7067 100644 --- a/docs/src/main/sphinx/security/tls.rst +++ b/docs/src/main/sphinx/security/tls.rst @@ -17,7 +17,7 @@ the foundational layer. This page discusses only how to prepare the Trino server for secure client connections from outside of the Trino cluster to its coordinator.
-See the :doc:`Glossary ` to clarify unfamiliar terms. +See the :doc:`Glossary ` to clarify unfamiliar terms. .. _tls-version-and-ciphers:
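The Java KeyStore (JKS) glossary entry states that Java by default relies on a system-level list of CAs in truststore files installed with the Java installation. The following minimal sketch shows that default in action using the standard JSSE `TrustManagerFactory` API. It is not Trino code. The class name `DefaultTrustStore` and its helper method are hypothetical, chosen only for illustration. Passing `null` to `init` asks the JDK to load its default truststore rather than a keystore you supply.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical example class, not part of Trino.
public class DefaultTrustStore {

    // Returns the X.509 trust manager backed by the JVM's default truststore.
    public static X509TrustManager defaultTrustManager() throws GeneralSecurityException {
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // A null KeyStore selects the default truststore: the file named by the
        // javax.net.ssl.trustStore system property if set, otherwise the
        // jssecacerts or cacerts file shipped with the Java installation.
        factory.init((KeyStore) null);
        for (TrustManager manager : factory.getTrustManagers()) {
            if (manager instanceof X509TrustManager) {
                return (X509TrustManager) manager;
            }
        }
        throw new IllegalStateException("No X509TrustManager available");
    }

    public static void main(String[] args) throws GeneralSecurityException {
        X509Certificate[] cas = defaultTrustManager().getAcceptedIssuers();
        System.out.println("Trusted CA certificates: " + cas.length);
        // Print a few CA subject names as a sample of the system-level CA list.
        for (int i = 0; i < Math.min(3, cas.length); i++) {
            System.out.println(cas[i].getSubjectX500Principal().getName());
        }
    }
}
```

The number of CAs printed depends on the JDK distribution and any `javax.net.ssl.trustStore` override. A server that presents a certificate signed by a CA outside this list is rejected unless you configure a custom truststore.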
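The gzip glossary entry describes a compression format with the usual ``.gz`` extension. The following hedged sketch shows a round trip with the JDK's `java.util.zip` classes. It is an illustration only, not how Trino itself handles gzip files. The class `GzipRoundTrip` and its methods are hypothetical names. The sketch compresses a byte array, checks the two-byte gzip magic number (`0x1f 0x8b`) that identifies gzip data regardless of file extension, and decompresses the data back to the original bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class, not part of Trino.
public class GzipRoundTrip {

    // Compresses a byte array into the gzip format (RFC 1952).
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream finishes it, writing the gzip trailer (CRC-32 and size).
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        }
        return buffer.toByteArray();
    }

    // Decompresses gzip data back into the original bytes.
    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // InputStream.readAllBytes requires Java 9+
        }
    }

    // Every gzip member starts with the magic bytes 0x1f 0x8b.
    public static boolean hasGzipMagic(byte[] data) {
        return data.length >= 2 && data[0] == (byte) 0x1f && data[1] == (byte) 0x8b;
    }

    public static void main(String[] args) throws IOException {
        // Repetitive, CSV-like sample text; String.repeat requires Java 11+.
        byte[] original = "id,name\n1,trino\n".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(original);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("gzip magic:       " + hasGzipMagic(compressed));
        System.out.println("round trip ok:    " + Arrays.equals(original, decompress(compressed)));
    }
}
```

Checking the magic bytes rather than trusting the ``.gz`` extension is a common defensive choice when reading files of unknown origin, such as objects in object storage.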