feat(E6data): initial implementation for E6data SQL Analytics platform #9517

hackintoshrao · 2024-07-04T14:20:11Z

Description of changes

This pull request integrates E6data, a distributed SQL analytics engine, with Ibis. E6data is designed for high-performance analytics on large-scale data, and this integration lets Ibis users work with E6data through the standard Ibis API.

Key changes and considerations:

E6data Backend Implementation:
- A new E6data backend class was added, inheriting from SQLBackend.
- Implemented connection handling, query execution, and result fetching specific to E6data.
- Adapted the backend to handle E6data's unique features, such as catalog support and cluster management.
SQL Dialect Customization:
- Created an E6data dialect class extending from MySQL, as E6data shares similarities with MySQL syntax.
- Customized the Tokenizer to use double quotes for identifiers.
- Modified the Generator to map certain data types (VARCHAR, CHAR, TEXT) to STRING, aligning with E6data's type system.
- Custom TRANSFORMS for specific SQL functions like concat and length were added.
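
To illustrate the effect of the dialect rules listed above, here is a minimal plain-Python sketch. It deliberately does not use sqlglot; the names `TYPE_ALIASES`, `map_type`, and `quote_identifier` are hypothetical and do not appear in this PR, which instead describes subclassing the MySQL dialect.

```python
# Hypothetical stand-in for the dialect rules; this is not the sqlglot API.

# Assumption taken from the PR text: these character types collapse to STRING.
TYPE_ALIASES = {"VARCHAR": "STRING", "CHAR": "STRING", "TEXT": "STRING"}


def map_type(sql_type: str) -> str:
    """Return the E6data type name for a MySQL-style type name."""
    # Drop any length suffix, e.g. VARCHAR(255) -> VARCHAR, before the lookup.
    base = sql_type.split("(", 1)[0].strip().upper()
    # Types without an alias are passed through unchanged.
    return TYPE_ALIASES.get(base, sql_type)


def quote_identifier(name: str) -> str:
    """Quote an identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'
```

In the real dialect, the same choices would presumably live in the Tokenizer and Generator subclasses rather than in free functions.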
Compiler Modifications:
- Updated E6DataCompiler to use the new E6data dialect and E6DataType for type mapping.
- Retained existing rewrites, including a custom limit rewrite, to ensure compatibility with E6data's query execution model.
Connection String and Authentication:
- Implemented support for E6data's connection string format, including catalog name, secure connection, auto-resume, and cluster UUID parameters.
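
A connection string carrying these options could be unpacked roughly as follows. This is a sketch under stated assumptions: the `e6data://` scheme, the query parameter names (`catalog`, `secure`, `auto_resume`, `cluster_uuid`), the defaults, and the helper `parse_e6data_url` are all hypothetical and may not match the format the backend actually accepts.

```python
from urllib.parse import parse_qs, urlparse


def parse_e6data_url(url: str) -> dict:
    """Split a connection URL into keyword arguments for a connect call."""
    parsed = urlparse(url)
    # parse_qs returns a list per key; keep the last value if a key repeats.
    query = {key: values[-1] for key, values in parse_qs(parsed.query).items()}

    def as_bool(value: str) -> bool:
        return value.strip().lower() in {"1", "true", "yes"}

    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "username": parsed.username,
        "password": parsed.password,
        "database": parsed.path.lstrip("/") or None,
        # Hypothetical parameter names and defaults for the options in the PR.
        "catalog_name": query.get("catalog"),
        "secure": as_bool(query.get("secure", "false")),
        "auto_resume": as_bool(query.get("auto_resume", "true")),
        "cluster_uuid": query.get("cluster_uuid"),
    }
```

Keeping the parsing in one function makes it straightforward to unit-test the URL handling without a live cluster.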
Schema and Metadata Handling:
- Adapted schema retrieval and table listing functions to work with E6data's multi-level hierarchy (catalog, database, table).
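
The three-level hierarchy can be pictured as building a fully qualified table name. The helper below is a hypothetical sketch, not code from this PR; it assumes double-quoted identifiers, as in the dialect notes above, and assumes that a catalog is only meaningful together with a database.

```python
from typing import Optional


def qualified_name(
    table: str, database: Optional[str] = None, catalog: Optional[str] = None
) -> str:
    """Join catalog, database, and table into one quoted, dotted name."""
    if catalog is not None and database is None:
        # Assumption: a catalog without a database is ambiguous, so reject it.
        raise ValueError("a catalog requires a database")
    parts = [part for part in (catalog, database, table) if part is not None]
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)
```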
Testing:
I need some guidance on how to go about adding tests for the integration.
Documentation:
Could you give me some pointers on how to add relevant documentation?
Dependencies:
- Currently, the platform is not open for public access; existing users require authentication keys to use the analytics engine. I would like guidance on enabling the maintainers to test and provide credentials for automated tests once they are supported.
  We're also working on a Minikube-based single-node testing infrastructure, which might make adding testing automation for the CI easier.

I'm new to the Ibis community, and this PR could be much better. I appreciate the time and guidance from maintainers in improving it further. Any comments are welcome; again, I appreciate your time and patience.


- Add support for catalog, secure connection, auto-resume, and cluster UUID
- Implement custom table() method to handle catalog and database hierarchy
- Modify get_schema() to use E6Data-specific column information
- Adjust execute() method for E6Data compatibility
- Update _fetch_from_cursor() to handle E6Data result format

- Customize Tokenizer to use double quotes for identifiers
- Modify Generator to map VARCHAR, CHAR, and TEXT to STRING
- Add custom TRANSFORMS for concat and length functions

- Add E6data dialect
- Use E6DataType for type mapping
- Retain existing rewrites including custom limit rewrite
- Keep other compiler configurations unchanged

cpcloud

Thanks for the PR! Did a first pass, and will add a separate comment addressing some of your questions.

cpcloud · 2024-07-04T15:24:16Z

pyproject.toml

@@ -87,6 +87,7 @@ shapely = { version = ">=2,<3", optional = true }
 # issues with versions <3.0.2
 snowflake-connector-python = { version = ">=3.0.2,<4,!=3.3.0b1", optional = true }
 trino = { version = ">=0.321,<1", optional = true }
+e6data-python-connector = { version = "2.2.0" }


Suggested change

-e6data-python-connector = { version = "2.2.0" }
+e6data-python-connector = { version = ">=2.2.0,<3", optional = true }

Unless there's a reason to pin to a single version, this should include a version range, and optional = true is necessary to ensure this is shipped as an extra, and not a required dependency to use any other Ibis backend(s).
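
To make the difference between the exact pin and the suggested range concrete, here is a simplified sketch of what `>=2.2.0,<3` admits. It only handles plain dotted releases and ignores pre-release and other PEP 440 details that real resolvers implement; `parse_version` and `in_suggested_range` are hypothetical helper names.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted release such as '2.2.0' into a tuple of ints."""
    parts = tuple(int(part) for part in text.split("."))
    # Pad short versions so that '2.2' compares equal to '2.2.0'.
    return parts + (0,) * (3 - len(parts))


def in_suggested_range(version: str) -> bool:
    """Check a release against the suggested range >=2.2.0,<3."""
    return (2, 2, 0) <= parse_version(version) < (3, 0, 0)
```

With the exact pin, only 2.2.0 would be installable; with the range, later 2.x releases are accepted while a future 3.0 is still excluded.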

cpcloud · 2024-07-04T15:26:00Z

ibis/backends/sql/dialects.py

@@ -403,6 +403,7 @@ class Generator(Postgres.Generator):
        JSON_TYPE_REQUIRED_FOR_EXTRACTION = True
        SUPPORTS_UNLOGGED_TABLES = True

+


Suggested change

Please revert this change.

cpcloud · 2024-07-04T15:28:05Z

ibis/backends/e6data/__init__.py

+from urllib.parse import parse_qs, urlparse
+
+import numpy as np
+import pymysql


Are you using this somewhere? If not, please remove it as an e6data backend dependency and also remove this import.

cpcloud · 2024-07-04T15:29:04Z

ibis/backends/e6data/tests/conftest.py

This doesn't seem like the correct implementation for an e6data conftest.py given that it's not running in a container.

Removed the tests until implemented correctly.

cpcloud · 2024-07-04T15:29:59Z

pyproject.toml

@@ -152,6 +153,7 @@ dask = ["dask", "regex", "packaging"]
 datafusion = ["datafusion"]
 druid = ["pydruid"]
 duckdb = ["duckdb"]
+e6data = ["e6data-python-connector","pymysql"]


Suggested change

-e6data = ["e6data-python-connector","pymysql"]
+e6data = ["e6data-python-connector"]

Is pymysql actually required? It doesn't seem to be used anywhere.

cpcloud · 2024-07-04T15:30:57Z

ibis/backends/e6data/compiler.py

+
+
+@public
+class E6DataCompiler(SQLGlotCompiler):


Why not inherit from MySQLCompiler?

cpcloud · 2024-07-04T15:31:41Z

ibis/backends/e6data/__init__.py

+    import polars as pl
+    import pyarrow as pa
+
+class Backend(SQLBackend, CanCreateDatabase):


With all the copypasting it might be worth considering inheriting from MySQLBackend.

cpcloud · 2024-07-04T15:32:34Z

ibis/backends/e6data/__init__.py

+        return ".".join(matched.groups())
+    def _from_url(self, url: str, **kwargs):


Please make sure to run your code through just fmt. There are likely other places where there are formatting inconsistencies.

cpcloud · 2024-07-04T16:06:21Z

Testing:
I need some guidance on how to go about adding tests for the integration.

The best place to start is to try and run pytest -m e6data, assuming you have a way to access it from your development host.

Documentation:
Could you give me some pointers on how to add relevant documentation?

Take a look at the individual backend docs pages in docs/backends/. Those are a good place to start with backend-specific docs.

Dependencies:

Currently, the platform is not open for public access; existing users require authentication keys to use the analytics engine. I would like guidance on enabling the maintainers to test and provide credentials for automated tests once they are supported.
We're also working on a Minikube-based single-node testing infrastructure, which might make adding testing automation for the CI easier.

We'll need to get whatever credentials/auth information is needed to login into a GitHub Actions secret. Let's chat in a DM on Zulip about this.
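
On the test side, secrets stored in GitHub Actions are typically exposed to the job as environment variables. A minimal sketch of reading them, assuming hypothetical variable names (`E6DATA_USERNAME`, `E6DATA_TOKEN`, `E6DATA_HOST`) that are not defined anywhere in this PR:

```python
import os

# Hypothetical variable names; CI would populate them from GitHub Actions secrets.
REQUIRED_VARS = ("E6DATA_USERNAME", "E6DATA_TOKEN", "E6DATA_HOST")


def load_credentials(environ=None):
    """Return a credentials dict, or None when any variable is missing or empty."""
    env = os.environ if environ is None else environ
    values = {name: env.get(name) for name in REQUIRED_VARS}
    if not all(values.values()):
        return None  # callers can skip the e6data tests in this case
    return values
```

Returning `None` instead of raising lets contributors without credentials run the rest of the test suite, with the e6data tests skipped.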

- Change Backend class to inherit from MySQLBackend instead of SQLBackend
- Remove unnecessary imports and methods duplicated in MySQLBackend
- Update connection handling to use E6data_python_connector
- Modify schema and table retrieval methods for E6data compatibility
- Replace MySQLPandasData with E6DataPandasData for data conversion
- Clean up and streamline code, removing print statements and unused methods

hackintoshrao · 2024-07-11T11:29:18Z

Hey @cpcloud ,

I've addressed most of your comments apart from the tests and documentation. Please let me know if these changes look good. I'll be working on writing the tests now.

We'll need to get whatever credentials/auth information is needed to login into a GitHub Actions secret. Let's chat in a DM on Zulip about this.

I've DMed you on Zulip and would appreciate your help with this.

hackintoshrao added 12 commits June 10, 2024 19:24

Adding the compiler, and Backend implementation for E6data.

7b98b77

Add e6data_python_connector import and update do_connect method

13a196c

Add e6data-python-connector dependency

2f142b0

Update Backend name and add _from_url method

876820b

Remove unnecessary transaction rollback in Backend class for E6data s…

ccec1e0

…ince it doesn't support transactions.

Refactor database listing and query execution

6b762f4

Add E6data dialect to SQL generator

942277e

- Customize Tokenizer to use double quotes for identifiers
- Modify Generator to map VARCHAR, CHAR, and TEXT to STRING
- Add custom TRANSFORMS for concat and length functions

Adding E6Datatype class.

2332c5b

Update E6DataCompiler to use E6data dialect and E6DataType

6436b16

- Add E6data dialect
- Use E6DataType for type mapping
- Retain existing rewrites including custom limit rewrite
- Keep other compiler configurations unchanged

remove debug print statements.

76aa687

Merge conflict fix.

729e1e9

cpcloud requested changes Jul 4, 2024


hackintoshrao added 7 commits July 8, 2024 09:05

Update e6data-python-connector version in pyproject.toml

c4c2644

Remove empty line in class definition

1ba682e

Removing the tests till they are implemented correctly.

daef522

Inheriting compiler operations from MySQL.

7ce06cf

Creating a placeholder E6data data converter.

20f875f

Remove pymysql from e6data dependencies

c8520f8
