Map PostgreSQL JSON, JSONB to Presto JSON #81

Merged
electrum merged 1 commit into trinodb:master from guyco33:add_support_for_postgres_json on Feb 23, 2019
@guyco33
Member

@guyco33 guyco33 commented Jan 27, 2019

No description provided.

@findepi findepi requested a review from electrum January 27, 2019 15:28
@guyco33 guyco33 force-pushed the add_support_for_postgres_json branch from d80ab2c to 2e0f193 Compare February 1, 2019 09:55
@ebyhr
Member

ebyhr commented Feb 4, 2019

With this commit I could SELECT JSON from PostgreSQL, but CTAS failed with the message below.
→ Resolved by appending ?stringtype=unspecified to connection-url

2019-02-04T20:42:16.812+0545	ERROR	remote-task-callback-19	io.prestosql.execution.StageStateMachine	Stage 20190204_145716_00013_4tx9w.1 failed
io.prestosql.spi.PrestoException: Batch entry 0 INSERT INTO "test"."public"."tmp_presto_c00e7b22b456488693c2b5c1a50793d2" VALUES ('{"customer":"John Doe","items":{"product":"Beer","qty":6}}') was aborted: ERROR: column "c1" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 83  Call getNextException to see other errors in the batch.
	at io.prestosql.plugin.jdbc.JdbcPageSink.finish(JdbcPageSink.java:186)
	at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:193)
	at io.prestosql.operator.Driver.processInternal(Driver.java:397)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
	at io.prestosql.$gen.Presto_null__testversion____20190204_145200_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "test"."public"."tmp_presto_c00e7b22b456488693c2b5c1a50793d2" VALUES ('{"customer":"John Doe","items":{"product":"Beer","qty":6}}') was aborted: ERROR: column "c1" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 83  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
	at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:50)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2184)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:481)
	at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:840)
	at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1538)
	at io.prestosql.plugin.jdbc.JdbcPageSink.finish(JdbcPageSink.java:178)
	... 12 more
Caused by: org.postgresql.util.PSQLException: ERROR: column "c1" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 83
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
	... 16 more

@findepi
Member

findepi commented Feb 4, 2019

@cla-bot cla-bot bot added the cla-signed label Feb 4, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 4, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 4, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 4, 2019
@guyco33
Member Author

guyco33 commented Feb 5, 2019

@ebyhr I already encountered this issue when I first used it for CTAS in the postgres catalog, and solved it by appending ?stringtype=unspecified to all postgres JDBC URLs in the catalog config.
With CTAS into the hive catalog I got a different error: Unsupported Hive type: json. That makes sense, since there is no JSON type in Hive.
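
A minimal sketch of the workaround described above, assuming the PostgreSQL JDBC driver's `stringtype` connection parameter: with `stringtype=unspecified`, values bound via `setString()` are sent untyped, so the server can infer `jsonb` for the target column instead of rejecting a `character varying` value. The class and method names here are hypothetical and are not part of this PR.

```java
// Hypothetical helper (not from the PR): adds stringtype=unspecified to a
// PostgreSQL JDBC URL unless the URL already sets stringtype explicitly.
final class JdbcUrls
{
    private JdbcUrls() {}

    static String withUnspecifiedStringType(String url)
    {
        if (url.contains("stringtype=")) {
            // Respect an explicit setting that is already present.
            return url;
        }
        // Use '?' for the first parameter and '&' for any later one.
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "stringtype=unspecified";
    }

    public static void main(String[] args)
    {
        // Host, port, and database name are placeholders.
        System.out.println(withUnspecifiedStringType("jdbc:postgresql://example.net:5432/test"));
    }
}
```

In a catalog properties file the same effect comes from writing the parameter directly into connection-url, as both commenters did.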

@ebyhr
Member

ebyhr commented Feb 5, 2019

Sorry, the notification for my update might not have been sent to you. I tried the same thing @guyco33 described, and it works in my environment too.

@guyco33 guyco33 force-pushed the add_support_for_postgres_json branch 3 times, most recently from 5085008 to a9bad83 Compare February 12, 2019 14:49
@guyco33
Member Author

guyco33 commented Feb 13, 2019

@findepi Now that #109 is merged, the predicate pushdown in QueryBuilder (https://github.com/prestosql/presto/pull/109/files#diff-88e8a0a25b51f372e2a50ab027085b33) also picks up JSON types (previously this was prevented by QueryBuilder#isAcceptedType), and it fails because JSON is not orderable. I think we should either make JSON orderable or skip QueryBuilder#toPredicate for types that are not orderable.

Member

@findepi findepi left a comment

Some initial comments.
We will need @electrum's review for in/out conversions around JsonType, but please apply my feedback first. Please do all new changes as fixups, so I can re-review (I'll let you know when you can squash them).

Member

I recall @electrum's comment saying the JsonType should not be moved to SPI

prestodb/presto#11913 (comment)

(I didn't review this class; I think there were some changes that would require review)

Member Author

@findepi Should I create a new JsonType in SPI and use the signature to correlate them ?

Member

I assume we don't want to push down JSON predicates. Otherwise this requires extensive testing that pushdown does not change query semantics, and it is likely not a much-needed feature.

To disable pushdown, use ColumnMapping.sliceMapping overload which takes UnaryOperator<Domain> pushdownConverter and pass domain -> Domain.all(domain.getType()) as the converter. Or better, rebase on #225 and use DISABLE_PUSHDOWN constant.
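
A minimal sketch of what the suggested converter does, using a simplified stand-in for `Domain` rather than the real Presto SPI class: whatever constraint arrives for the column, the converter hands back the unconstrained "all values" domain of the same type, so nothing is pushed to PostgreSQL and the engine applies the filter itself. The `restricted` factory and the `PushdownSketch` class are hypothetical.

```java
import java.util.function.UnaryOperator;

final class PushdownSketch
{
    // Simplified stand-in for Presto's Domain (NOT the real SPI class):
    // a type name plus a flag saying whether every value is allowed.
    static final class Domain
    {
        private final String type;
        private final boolean all;

        private Domain(String type, boolean all)
        {
            this.type = type;
            this.all = all;
        }

        static Domain all(String type)
        {
            return new Domain(type, true);
        }

        // Hypothetical factory for a domain that restricts the column's values.
        static Domain restricted(String type)
        {
            return new Domain(type, false);
        }

        String getType()
        {
            return type;
        }

        boolean isAll()
        {
            return all;
        }
    }

    // Same shape as the converter from the review comment:
    // drop the constraint, keep the type.
    static final UnaryOperator<Domain> DISABLE_PUSHDOWN = domain -> Domain.all(domain.getType());

    public static void main(String[] args)
    {
        Domain pushed = DISABLE_PUSHDOWN.apply(Domain.restricted("json"));
        System.out.println(pushed.getType() + " all=" + pushed.isAll());
    }
}
```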

Member Author

I disabled pushdown, and SELECT * FROM test_json where json_column = json'{"x":123}' still fails with "Domain type must be orderable".

Member Author

@findepi It works while skipping builder.add(toPredicate(column.getColumnName(), domain, column, accumulator)) in io.prestosql.plugin.jdbc.QueryBuilder when domain type is not orderable:

private List<String> toConjuncts(JdbcClient client, ConnectorSession session, List<JdbcColumnHandle> columns, TupleDomain<ColumnHandle> tupleDomain, List<TypeAndValue> accumulator)
{
    ImmutableList.Builder<String> builder = ImmutableList.builder();
    for (JdbcColumnHandle column : columns) {
        Domain domain = tupleDomain.getDomains().get().get(column);
        if (domain != null) {
            domain = pushDownDomain(client, session, column, domain);
            if (domain.getType().isOrderable()) {
                builder.add(toPredicate(column.getColumnName(), domain, column, accumulator));
            }
        }
    }
    return builder.build();
}

Member

I filed a ticket -- #238

Member

Suggested change
- value -> format("JSON'%s'", value),
+ value -> format("JSON '%s'", value),

Member Author

done

Member

This actually applies in a few places in your test queries. Also, by convention, we always uppercase the type name in type constructors: JSON '....' rather than json '...' (or json'...').

Member

@findepi findepi left a comment

Ping me when you address all my comments (or when you feel blocked).

Member

replace jsonPushdown() method with io.prestosql.plugin.jdbc.ColumnMapping#DISABLE_PUSHDOWN (static-import it)

here you're changing domain's type. Why so?

Member

I see the reason now. Can you check if #238 is sufficient? I didn't test that change with your code.

Member Author

Your fix works for me. Thanks!

Member

nit: format with each arg on separate line, for readability

ColumnMapping.sliceMapping(
        JSON,
        (resultSet, columnIndex) -> jsonParse(utf8Slice(resultSet.getString(columnIndex))),
        jsonWriteFunction(),
        jsonPushdown());

Member Author

done

@findepi
Member

findepi commented Feb 14, 2019

Thanks for the fixups, that was helpful. You can squash what you have so far.
Then we should move JsonType back to where it was. I will defer to @electrum on that one.

Member

@findepi findepi left a comment

Please squash the fixups you have so far.

Member

Suggested change
- domain -> Domain.all(domain.getType()));
+ DISABLE_PUSHDOWN);

Member Author

done

@guyco33 guyco33 force-pushed the add_support_for_postgres_json branch from 00c2319 to 6748f0b Compare February 15, 2019 09:37
@electrum
Member

electrum commented Feb 15, 2019

Sorry for the confusion and delay on this PR. We'd like to get this in, but there are two main things to resolve:

  1. We shouldn't need to add the JSON type to the SPI. The intended way for connectors to access types is by looking them up via TypeManager. The name is available by StandardTypes.JSON. Instance checks can be converted to type.getTypeSignature().getBase().equals(StandardTypes.JSON).

  2. For constructing the object value for the JSON type, we should use the JSON type constructor. We need to expose this to the SPI. I will update #245 ([WIP] Add explicit support for type constructors) to do that after the massive function change in #196 (Redesign function management abstraction) has landed, since the two will conflict.
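
A minimal sketch of the check described in point 1, using simplified stand-ins for `Type`, `TypeSignature`, and `StandardTypes.JSON` rather than the real SPI classes. It assumes the real constant's value is the string "json". The idea is that a connector compares the base name of the type signature instead of doing an `instanceof` test against an engine-internal class.

```java
final class JsonTypeCheckSketch
{
    // Stand-in for StandardTypes.JSON; the value "json" is an assumption here.
    static final String JSON = "json";

    // Simplified stand-in for the SPI TypeSignature: only the base name.
    static final class TypeSignature
    {
        private final String base;

        TypeSignature(String base)
        {
            this.base = base;
        }

        String getBase()
        {
            return base;
        }
    }

    // Simplified stand-in for the SPI Type interface.
    interface Type
    {
        TypeSignature getTypeSignature();
    }

    // Name-based check that replaces an instanceof test.
    static boolean isJson(Type type)
    {
        return type.getTypeSignature().getBase().equals(JSON);
    }

    public static void main(String[] args)
    {
        Type json = () -> new TypeSignature("json");
        Type varchar = () -> new TypeSignature("varchar");
        System.out.println(isJson(json) + " " + isJson(varchar));
    }
}
```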

Please update the commit title to "Map PostgreSQL JSON to Presto JSON"

@guyco33 guyco33 changed the title map postgres json to presto json Map PostgreSQL JSON to Presto JSON Feb 15, 2019
@electrum
Member

electrum commented Feb 15, 2019

For the JSON type constructor, it will likely be several weeks before the function and constructor PRs will land, so we can move forward on this PR using the duplicated code (and clean it up later).

We do still need to revert the part about moving JsonType into the SPI.

To get access to TypeManager:

  • Add it as a constructor parameter to PostgreSqlClient.
  • Change JdbcConnectorFactory to bind it in Guice:
Bootstrap app = new Bootstrap(
        binder -> binder.bind(TypeManager.class).toInstance(context.getTypeManager()),
        new JdbcModule(catalogName),
        module);

Then in the PostgreSqlClient constructor:

this.jsonType = typeManager.getType(new TypeSignature(StandardTypes.JSON));

@electrum
Member

@guyco33 Thanks for your work on this so far. Please let me know when you've updated the PR so that we can do a final review and merge. Feel free to ping me on Slack if you have any questions.

Member

The comparison below needs to be against JsonType.JSON

Member

@electrum electrum left a comment

A few minor comments, otherwise this looks good. Please address the comments and squash into a single commit.

Member

Make this protected final (no static) since it is initialized in the constructor

Member Author

I need it to be static since it is used in private static ColumnMapping jsonColumnMapping()

Member

Make that method non-static. Initializing a static field from a constructor is problematic.
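
To illustrate the concern, here is a small self-contained sketch (class names are hypothetical, not from the PR): a static field written from a constructor is shared by every instance, so constructing a second client silently overwrites what the first one stored. An instance field read by a non-static method avoids that.

```java
final class StaticFieldPitfall
{
    // Problematic pattern: the constructor writes to a static field.
    static final class BadClient
    {
        static String jsonType; // one slot shared by all instances

        BadClient(String jsonType)
        {
            BadClient.jsonType = jsonType;
        }

        static String describe()
        {
            return jsonType;
        }
    }

    // Preferred pattern: a final instance field and a non-static method.
    static final class GoodClient
    {
        private final String jsonType;

        GoodClient(String jsonType)
        {
            this.jsonType = jsonType;
        }

        String describe()
        {
            return jsonType;
        }
    }

    public static void main(String[] args)
    {
        new BadClient("first");
        new BadClient("second");
        // The value stored by the first instance has been overwritten.
        System.out.println(BadClient.describe());

        GoodClient first = new GoodClient("first");
        new GoodClient("second");
        // Each instance keeps its own value.
        System.out.println(first.describe());
    }
}
```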

Member Author

Right, got it, thanks!

Member

Why the 2 at the end? I don't see any other usages of this name

Member Author

tpch.postgresql_test_json used to be there before :) Fixed

Member

Remove //

Member

Nit: add space after , and lowercase column and table names (but keep SQL keywords in uppercase)

Member

Nit: static import dataType and identity

Member

This doesn't work if the value has single quotes, but that's probably fine for this test

Member

value -> {
    checkArgument(!value.contains("'"));
    return format("JSON '%s'", value);
}
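
The checkArgument guard above simply rejects values containing a single quote, which is what the review asked for. As an alternative sketch only (not what this PR does), a literal formatter could escape the quote by doubling it, which is the standard SQL rule for string literals. The class and method names are hypothetical.

```java
final class JsonLiterals
{
    private JsonLiterals() {}

    // Hypothetical helper: builds a JSON '...' literal and doubles any
    // embedded single quote so the SQL string literal stays well-formed.
    static String jsonLiteral(String value)
    {
        return "JSON '" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args)
    {
        System.out.println(jsonLiteral("{\"note\":\"it's fine\"}"));
    }
}
```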

Member

Make this private since it's not used elsewhere

Member

You can use JsonType.JSON here since it is available to test code

Member

Nit: static import jsonParse

Member

Same, this can use JsonType.JSON

@guyco33 guyco33 force-pushed the add_support_for_postgres_json branch from a58aed2 to 8b27ae4 Compare February 18, 2019 16:12
Member

@electrum electrum left a comment

This is ready to merge. GitHub is showing a merge conflict, please rebase and then I'll merge it.

Member

I don't think we need to check the JDBC type here. The switch on the type name should be sufficient.

@guyco33 guyco33 force-pushed the add_support_for_postgres_json branch from 8b27ae4 to 653ee44 Compare February 23, 2019 10:27
@electrum electrum merged commit 1f70a1b into trinodb:master Feb 23, 2019
@findepi
Member

findepi commented Feb 23, 2019

@guyco33 congrats!

@electrum electrum mentioned this pull request Feb 24, 2019
6 tasks
@findepi findepi changed the title Map PostgreSQL JSON to Presto JSON Map PostgreSQL JSON, JSONB to Presto JSON Jun 28, 2019
4 participants