Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Failed to load Postgres table containing ENUM #9295

Closed
1 task done
hazirliver opened this issue Jun 3, 2024 · 5 comments · Fixed by #9769
Closed
1 task done

bug: Failed to load Postgres table containing ENUM #9295

hazirliver opened this issue Jun 3, 2024 · 5 comments · Fixed by #9769
Labels
bug Incorrect behavior inside of ibis datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) postgres The PostgreSQL backend
Milestone

Comments

@hazirliver
Copy link

What happened?

I encountered an issue when attempting to load a PostgreSQL table dataset_source that contains a custom ENUM data type dataset_source_enum. The error occurs both when using con.table('dataset_source') and con.sql("SELECT * FROM dataset_source;").
Steps to reproduce:

  1. Create a custom ENUM type in PostgreSQL:
CREATE TYPE "dataset_source_enum" AS ENUM (
    'GEO',
    'SRA',
);
  1. Create a table using the ENUM type:
CREATE TABLE "dataset_source"
(
    "dataset_source_id" uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    "dataset_source_name" dataset_source_enum UNIQUE,
    "dataset_source_desc" text
);
  1. Insert some data into the table:
INSERT INTO "dataset_source" ("dataset_source_name", "dataset_source_desc")
VALUES 
('GEO', 'Gene Expression Omnibus'),
('SRA', 'Sequence Read Archive'),
  1. Attempt to load the table in ibis:
import ibis
ibis.options.interactive = True

db_config = {
    'user': '***',
    'password': "***",
    'host': '***',
    'port': '***',
    'database': '***'
}

con = ibis.postgres.connect(**db_config)
con.table('dataset_source') # gives same error: con.sql("SELECT * FROM dataset_source;")

Error message is provided in Relevant log output section

What version of ibis are you using?

9.0.0

What backend(s) are you using, if any?

PostgeSQL

Relevant log output

---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/sqlglot/parser.py:1205, in Parser.parse_into(self, expression_types, raw_tokens, sql)
   1204 try:
-> 1205     return self._parse(parser, raw_tokens, sql)
   1206 except ParseError as e:

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/sqlglot/parser.py:1244, in Parser._parse(self, parse_method, raw_tokens, sql)
   1243 if self._index < len(self._tokens):
-> 1244     self.raise_error("Invalid expression / Unexpected token")
   1246 self.check_errors()

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/sqlglot/parser.py:1285, in Parser.raise_error(self, message, token)
   1284 if self.error_level == ErrorLevel.IMMEDIATE:
-> 1285     raise error
   1287 self.errors.append(error)

ParseError: Invalid expression / Unexpected token. Line 1, Col: 19.
  dataset_source_enum

The above exception was the direct cause of the following exception:

ParseError                                Traceback (most recent call last)
Cell In[83], line 1
----> 1 con.table('dataset_source')

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/ibis/backends/sql/__init__.py:137, in SQLBackend.table(self, name, schema, database)
    134     catalog = table_loc.catalog or None
    135     database = table_loc.db or None
--> 137 table_schema = self.get_schema(name, catalog=catalog, database=database)
    138 return ops.DatabaseTable(
    139     name,
    140     schema=table_schema,
    141     source=self,
    142     namespace=ops.Namespace(catalog=catalog, database=database),
    143 ).to_expr()

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/ibis/backends/postgres/__init__.py:589, in Backend.get_schema(self, name, catalog, database)
    584 if not rows:
    585     raise com.IbisError(f"Table not found: {name!r}")
    587 return sch.Schema(
    588     {
--> 589         col: type_mapper.from_string(typestr, nullable=nullable)
    590         for col, typestr, nullable in rows
    591     }
    592 )

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/ibis/backends/sql/datatypes.py:426, in PostgresType.from_string(cls, text, nullable)
    423 if dtype := cls.unknown_type_strings.get(text.lower()):
    424     return dtype
--> 426 sgtype = sg.parse_one(text, into=sge.DataType, read=cls.dialect)
    427 return cls.to_ibis(sgtype, nullable=nullable)

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/sqlglot/__init__.py:137, in parse_one(sql, read, dialect, into, **opts)
    134 dialect = Dialect.get_or_raise(read or dialect)
    136 if into:
--> 137     result = dialect.parse_into(into, sql, **opts)
    138 else:
    139     result = dialect.parse(sql, **opts)

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/sqlglot/dialects/dialect.py:511, in Dialect.parse_into(self, expression_type, sql, **opts)
    508 def parse_into(
    509     self, expression_type: exp.IntoType, sql: str, **opts
    510 ) -> t.List[t.Optional[exp.Expression]]:
--> 511     return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)

File ~/Documents/BG/deconv_deploy/airflow_venv/lib/python3.12/site-packages/sqlglot/parser.py:1210, in Parser.parse_into(self, expression_types, raw_tokens, sql)
   1207         e.errors[0]["into_expression"] = expression_type
   1208         errors.append(e)
-> 1210 raise ParseError(
   1211     f"Failed to parse '{sql or raw_tokens}' into {expression_types}",
   1212     errors=merge_errors(errors),
   1213 ) from errors[-1]

ParseError: Failed to parse 'dataset_source_enum' into <class 'sqlglot.expressions.DataType'>

Code of Conduct

  • I agree to follow this project's Code of Conduct
@hazirliver hazirliver added the bug Incorrect behavior inside of ibis label Jun 3, 2024
@jcrist jcrist added postgres The PostgreSQL backend datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) labels Jun 5, 2024
@cpcloud
Copy link
Member

cpcloud commented Jun 10, 2024

@hazirliver Thanks for the report, looks like a bug!

@gforsyth
Copy link
Member

How do we want to approach this? All we have at parse time is (I think) the string of the datatype name, which in this case is "dataset_source_enum"

Do we want to have a naming convention around custom types? Or should we just fallback to treating things as strings if we (and sqlglot) can't recognize it?

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2024

Naturally, this is possible with a bit of elbow grease, and it's a bit annoying:

  1. Get the known types from the database (run psql -E and then run \dT+ to see the SQL used to list custom types)
  2. There are effectively three kinds of custom types: fixed-size (enums are one of these), variable length (e.g., text or binary), and tuple (named struct types essentially), so then use that to determine what the custom type corresponds to
  3. Map that into Ibis's sqlglot type converter, perhaps with a new context argument that is perhaps a mapping from custom typename -> underlying type that sqlglot knows about.

@gforsyth
Copy link
Member

This was probably resolved by #9576 if you'd like to test it out @hazirliver

@cpcloud
Copy link
Member

cpcloud commented Aug 5, 2024

This doesn't work yet, because our custom PostgresType subclass of SqlglotType doesn't handle the parse error from sqlglot, that's only handled in the parent class, and PostgresType is calling sg.parse_one directly, in its from_string implementation.

PR inbound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) postgres The PostgreSQL backend
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants