-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Venice support #78
Conversation
this.sinkOptions = addKeysAsOption(options, rowType); | ||
} | ||
|
||
private Map<String, String> addKeysAsOption(Map<String, String> options, RelDataType rowType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this approach, open to suggestions.
I looked into hints to solve this and did get them working to an extent (will open a separate PR) but this would require users to pass in key information into their SQL statement. I have not figured out a way to inject hints at runtime from VeniceDriver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm surprised we need to fully specify the keys in the options. The Kafka connector has similar properties (key.prefix
, key.fields
), but you don't need both. Is the Venice connector doing something different here? I'd expect key.prefix=key_
to be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how would the Venice connector behave if we grouped the keys in a Row(...)
object? Can we just have key.fields=KEY
and then KEY ROW(F1 VARCHAR, F2 INT)
etc?
(Not suggesting we do that, just asking if possible?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm surprised we need to fully specify the keys in the options. The Kafka connector has similar properties (
key.prefix
,key.fields
), but you don't need both. Is the Venice connector doing something different here? I'd expectkey.prefix=key_
to be sufficient.
Yea I confirmed it is an issue with Venice due do some additional avro schema validation they do. They pull the keySchema and validate it against key.fields
(separate from the prefix). The key.prefix
allow these names to be different like "id" vs "key_id"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into the ROW syntax and it doesn't seem that is possible in Flink, there is no way to get Flink to destruct that ROW
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does retaining partitioning mean for the Kafka -> Venice use case? It seems like it is more of a problem on the producer side. Users that expect the same partitioning behavior would have to key their Kafka topic using the same combination of keys as Venice. We did this in Brooklin by constructing the producer key as a simple string with key values separated by _
from the source keys. Of course this isn't the same as identity partitioning but it does ensure that downstream consumer tasks read the same combo of keys.
Even for the Kafka -> Kafka use case, we aren't the ones consuming, the partitioning behavior comes from Flink right? I haven't looked into it to be fair, not sure how the behavior changes if you define key.fields
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Kafka -> Kafka use case, insert into kafka.foo select * from kafka.bar
intuitively keeps the key the same, since nothing in the SQL explicitly specifies otherwise. The effect should be that foo is a mirror of bar, with the same key and partition semantics. This is indeed the case today (internally). To the SQL engine, the input table always has an implicit KEY
column, which gets selected by select *
.
A typical use case for Kafka->Kafka pipelines is actually dropping the key and re-partitioning as round-robin. This is done today via SELECT *, NULL AS KEY...
. Hoptimator explicitly supports NULL AS KEY
, since it is not really possible to express "select everything except the key" in SQL.
I'm less familiar with the Venice use case, but I want to make sure that both Kafka->Kafka and Kafka->Venice are intuitive and somewhat consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I get what you are saying. The Venice connector as written today doesn't look like it supports the same implicit keys. It relies on explicit keys via the key.fields
table option.
Kafka->Kafka does support this today, you could have something like:
Kafka topic bar
has a payload containing two fields id
& val
.
Specifying a flink table option of "key.fields": "id"
before applying insert into kafka.foo select * from kafka.bar
will use the column id
as the key for topic foo
when doing Kafka->Kafka. If there is an actual key present in the topic bar
I don't believe it'll actually get used anywhere, it'll effectively be dropped.
A simple statement like insert into venice.foo select * from kafka.bar
will still work as long as kafka.bar
contains all the columns Venice expects, both key & value columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently select * from kafka.bar
will always include a KEY VARCHAR
, since internally all our Kafka topics are keyed with a simple string (or null). This new Venice integration expects a KEY
field, which contains all the keys. Individually, both make sense. I'm just wondering if KEY VARCHAR
and KEY Row(A ..., B ...)
need some additional magic to work interchangeably. In particular, would Kafka->Venice end up with key.fields = KEY_KEY
? Or can we detect that there is only one key and have key.fields = KEY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think what you have here with KEY_
as a magic prefix is going to be more robust than what we currently do for Kafka. Rather than try to shoehorn our existing simplistic Kafka convention here, let's adopt your approach for Kafka->Kafka as well. I think the only thing we'd need to do is change Kafka's magic column from KEY
to KEY_STRING
or something, and your code will just work. Kafka keys will just pop out as KEY Row(STRING VARCHAR)
and the connector will see key.fields = KEY_STRING
.
6128407
to
4c2ffb9
Compare
hoptimator-venice/src/main/java/com/linkedin/hoptimator/venice/VeniceStore.java
Outdated
Show resolved
Hide resolved
Makefile
Outdated
@@ -9,8 +9,8 @@ build: | |||
|
|||
bounce: build undeploy deploy deploy-samples deploy-config deploy-demo | |||
|
|||
# Integration tests expect K8s and Kafka to be running | |||
integration-tests: deploy-dev-environment deploy-samples | |||
# Integration tests expect K8s, Kafka, and Venice to be running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 🔥 🔥
this.sinkOptions = addKeysAsOption(options, rowType); | ||
} | ||
|
||
private Map<String, String> addKeysAsOption(Map<String, String> options, RelDataType rowType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm surprised we need to fully specify the keys in the options. The Kafka connector has similar properties (key.prefix
, key.fields
), but you don't need both. Is the Venice connector doing something different here? I'd expect key.prefix=key_
to be sufficient.
this.sinkOptions = addKeysAsOption(options, rowType); | ||
} | ||
|
||
private Map<String, String> addKeysAsOption(Map<String, String> options, RelDataType rowType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how would the Venice connector behave if we grouped the keys in a Row(...)
object? Can we just have key.fields=KEY
and then KEY ROW(F1 VARCHAR, F2 INT)
etc?
(Not suggesting we do that, just asking if possible?)
@@ -0,0 +1,11 @@ | |||
{ | |||
"type": "record", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to get create table venice.foo
working :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I will spend some time looking into this when I can. Should be a simple API call just as I'm doing to fetch schemas, just different than the current paradigm since it isn't managed via K8s.
// Without forced projection this will get optimized to: | ||
// INSERT INTO `my-store` (`KEYFIELD`, `VARCHARFIELD`) SELECT * FROM `KAFKA`.`existing-topic-1`; | ||
// With forced project this will resolve as: | ||
// INSERT INTO `my-store` (`KEY_id`, `stringField`) SELECT `KEYFIELD` AS `KEY_id`, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat!
hoptimator-util/src/main/java/com/linkedin/hoptimator/util/planner/ScriptImplementor.java
Outdated
Show resolved
Hide resolved
hoptimator-venice/src/main/java/com/linkedin/hoptimator/venice/LocalControllerClient.java
Outdated
Show resolved
Hide resolved
hoptimator-venice/src/main/java/com/linkedin/hoptimator/venice/VeniceDriver.java
Outdated
Show resolved
Hide resolved
hoptimator-venice/src/main/java/com/linkedin/hoptimator/venice/VeniceStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if (schema.startsWith("VENICE")...)
logic needs to be fixed, but I think we can accept the TODO
and fix later.
8c6b927
to
152c9cf
Compare
152c9cf
to
1973fe0
Compare
Adds Venice support to Hoptimator.
KEY$
to prevent collisionsKEY
fields through to Sink options (intended to be used by flink under key.fields connector property)insert into "VENICE-CLUSTER0"."test-store-1" ("KEY$id", "stringField") SELECT ...
Other changes in this PR:
Implemented the Venice driver/schema classes with separate overridable functions to be able to handle company-internal connection components via a simple override
See included tests for more samples