15 changes: 15 additions & 0 deletions doc/data_science.rst
@@ -0,0 +1,15 @@
Data Science with Exasol
-------------------------

Exasol supports data science workflows ranging from classic machine learning to Gen AI and language model solutions.

The best way to get started is with `Exasol's AI Lab <https://github.com/exasol/ai-lab>`_.

This video walks through `getting started with AI Lab <https://www.youtube.com/watch?v=LkqdLlRF2Go>`_.

AI Lab includes various workbooks that you can run to load data into Exasol.
This video walks through `loading data <https://www.youtube.com/watch?v=-t1q6CeswJs&t=1s>`_ in more detail.

If you want to leverage Exasol to build Gen AI and LM-based solutions, we recommend starting with the Exasol `Transformers Extension <https://github.com/exasol/transformers-extension>`_.

This video showcases the potential `applications of the Exasol Transformers Extension <https://www.youtube.com/watch?v=sHSnCR71kyc>`_.
14 changes: 13 additions & 1 deletion doc/distributed_python/advanced.rst
@@ -1,2 +1,14 @@
Advanced
--------

From a performance perspective, the best programming language for a UDF script depends on the purpose and context of the script, because individual operations perform differently in each language. For example, string processing can be faster in one language while XML parsing can be faster in another, so no single language performs best in all circumstances. However, if overall performance is the most important criterion, we recommend using Lua. Lua is integrated into Exasol in the most native way and therefore has the smallest process overhead.

During the processing of a SELECT statement, multiple virtual machines are started for each script and node. These virtual machines process the data independently. For scalar functions, the input rows are distributed across those virtual machines to achieve maximum parallelism. For SET input tuples, the virtual machines are used per group if you specify a GROUP BY clause. Otherwise, there will be only one group, which means only one node and virtual machine can process the data.
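The distribution rules above can be pictured with a small, purely conceptual Python sketch. It does not call Exasol and does not reproduce Exasol's actual scheduler: the round-robin strategy and the helper names ``partition_scalar`` and ``partition_set`` are assumptions made only for illustration. It shows that scalar input can be spread freely across workers, while SET input yields one unit of work per group and collapses to a single unit when no grouping key is given.

```python
from collections import defaultdict


def partition_scalar(rows, n_vms):
    """Spread rows over n_vms workers round-robin (illustrative assumption)."""
    buckets = [[] for _ in range(n_vms)]
    for i, row in enumerate(rows):
        buckets[i % n_vms].append(row)
    return buckets


def partition_set(rows, group_key=None):
    """Return one list of rows per group; a single group if no key is given."""
    groups = defaultdict(list)
    for row in rows:
        key = row[group_key] if group_key is not None else None
        groups[key].append(row)
    return dict(groups)


rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 3},
]

scalar_buckets = partition_scalar(rows, n_vms=2)   # rows can go to any worker
grouped = partition_set(rows, group_key="region")  # one work unit per group
ungrouped = partition_set(rows)                    # no GROUP BY: one work unit
```

With a grouping key, each group can be handled by a different worker. Without one, all rows land in the same group, which is why a SET script called without GROUP BY cannot be parallelized.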

The following pages contain information about more advanced UDF functionality:

* `UDF Instance Limiting <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/udf_instance_limit.htm>`_

* `Hiding Access tokens and secrets <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/hide_access_keys_passwords.htm>`_

* `Managing Script Language Containers <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/adding_new_packages_script_languages.htm>`_
4 changes: 3 additions & 1 deletion doc/distributed_python/debugging.rst
@@ -1,2 +1,4 @@
Debugging
---------

For Python versions 3.x, we recommend using `pyexasol <https://exasol.github.io/pyexasol/master/index.html>`_ and the `script output functionality <https://exasol.github.io/pyexasol/master/user_guide/udf_script_output.html>`_ to debug your UDFs.
10 changes: 9 additions & 1 deletion doc/distributed_python/intro.rst
@@ -1,2 +1,10 @@
Intro to UDFs
-------------

UDF scripts allow you to program your own analysis, processing, and generation functions, and to execute these functions in parallel inside an Exasol cluster.
By using UDF scripts, you can solve problems that cannot be solved with SQL statements alone.

Exasol supports the programming languages Java, Lua, R, and Python in UDF scripts. These languages provide different functionalities (for example, statistical functions in R) and different libraries.

UDFs are the key to unlocking much of Exasol's AI, ML and Data Science potential, as well as customizing Exasol to suit your unique use cases.
UDFs are executed by Exasol's massively parallel query engine and scale across available hardware in the same way SQL queries do, which gives them significant performance potential.
30 changes: 29 additions & 1 deletion doc/distributed_python/usage.rst
@@ -1,2 +1,30 @@
Creating and running UDFs
-------------------------

In the CREATE SCRIPT command, you must define the type of input and output values.
There are two types of UDF inputs (set and scalar) and two types of UDF outputs (returns and emits).
These can be combined as needed to suit your use case.

- Input values

  - **SCALAR** Specifies that the script processes single input rows. The code is therefore called once per input row.

  - **SET** Specifies that the processing refers to a set of input rows. Within the code, you can iterate through those rows.

- Output values

  - **RETURNS** Specifies that the script returns a single value.

  - **EMITS** Specifies that the script can create (emit) multiple result rows (tuples).

Each UDF script must contain the main function run(). This function is called with a parameter that provides access to the input data from Exasol. If your script processes multiple input tuples (using SET), you can iterate through the individual tuples using this parameter.
You can specify an ORDER BY clause either when creating a script or when calling it. This clause determines the order in which the rows of each SET input group are processed. If the algorithm depends on that order, specify the clause when creating the script so that a caller cannot get wrong results by omitting it.
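As a minimal sketch of how run() interacts with its parameter, the following Python example runs outside the database. ``FakeContext`` is a hypothetical stand-in written for this illustration and mimics only column access by name, ``next()`` and ``emit()``. The column names ``item`` and ``price`` are also assumptions. Inside Exasol the function must be named ``run``; the two functions have distinct names here only so that both fit in one file.

```python
class FakeContext:
    """Hypothetical local stand-in for the object Exasol passes to run()."""

    def __init__(self, rows):
        self._rows = rows      # list of dicts, one dict per input row
        self._pos = 0          # index of the current row
        self.emitted = []      # rows produced through emit()

    def __getattr__(self, name):
        # Reached only for names not found normally, i.e. input columns.
        # Lookup is case sensitive, like input parameters in real scripts.
        try:
            return self._rows[self._pos][name]
        except KeyError:
            raise AttributeError(name) from None

    def next(self):
        # Advance to the next row; return False once input is exhausted.
        if self._pos + 1 < len(self._rows):
            self._pos += 1
            return True
        return False

    def emit(self, *values):
        self.emitted.append(values)


def run_scalar_returns(ctx):
    # SCALAR input, RETURNS output: called once per row, returns one value.
    return ctx.price * 2


def run_set_emits(ctx):
    # SET input, EMITS output: called once per group, walks all rows of the
    # group and emits one result row per input row (a running total).
    total = 0
    while True:
        total += ctx.price
        ctx.emit(ctx.item, total)
        if not ctx.next():
            break


rows = [{"item": "a", "price": 1.5}, {"item": "b", "price": 2.0}]

# SCALAR: one call per input row.
scalar_results = [run_scalar_returns(FakeContext([row])) for row in rows]

# SET: one call for the whole group.
set_ctx = FakeContext(rows)
run_set_emits(set_ctx)
```

The running total in ``run_set_emits`` depends on the order in which rows arrive, which is the kind of algorithm for which an ORDER BY clause in the script definition matters.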

Input parameters in scripts are always case sensitive, like the script code itself. This differs from SQL identifiers, which are only case sensitive if they are delimited.

You can use this `UDF Generator <https://htmlpreview.github.io/?https://github.com/EXASOL/script-languages/blob/master/udf-script-signature-generator/udf-script-signature-generator.html>`_ to help you get started building your own UDFs.

Examples
^^^^^^^^^

You can view `examples of UDFs <https://docs.exasol.com/db/latest/database_concepts/udf_scripts/udf_examples.htm>`_ in the Exasol documentation.
1 change: 1 addition & 0 deletions doc/index.rst
@@ -10,6 +10,7 @@ Documentation and resources for data scientists and programmatic users to perfor
getting_started
data_ingestion
distributed_python/index.rst
data_science
examples
environments
integrations
2 changes: 1 addition & 1 deletion doc/integrations.rst
@@ -60,4 +60,4 @@ Ibis

Please refer to the `Ibis documentation <https://ibis-project.org/backends/exasol>`_.


You can also watch `this video <https://www.youtube.com/watch?v=0YaQo3o5ePI&t=2s>`_ for a step-by-step walkthrough of using Ibis with Exasol via AI Lab.