Conversation
|
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
|
Thank you for the work, @haibow |
highker
left a comment
There was a problem hiding this comment.
overall looks good. Did a very fast skim through only on coding style. Will get into logic shortly.
There was a problem hiding this comment.
Added it here to help testing locally. Is it ok to keep it here?
presto-pinot/pom.xml
Outdated
There was a problem hiding this comment.
- We usually put the dependency declaration + their version numbers in the root pom.xml.
- There are too many dependencies. I will need to dig deeper into that later on. One quick comment on this is netty has bee included by presto already. Can we reuse it?
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotClusterInfoFetcher.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotClusterInfoFetcher.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotColumnUtils.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotColumnUtils.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotConfig.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotConnection.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotPageSource.java
Outdated
Show resolved
Hide resolved
bd5555a to
db95b89
Compare
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
db95b89 to
25b39bc
Compare
highker
left a comment
There was a problem hiding this comment.
First thorough pass. > 95% are minor comments. There are a few essential comments buried among all these nits. Make sure those are address.
presto-pinot/src/main/java/com/facebook/presto/pinot/Types.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/Types.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotSplitManager.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotSplit.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/main/java/com/facebook/presto/pinot/PinotSplit.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/test/java/com/facebook/presto/pinot/PinotQueryGeneratorTest.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/test/java/com/facebook/presto/pinot/PinotSplitManagerTest.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/test/java/com/facebook/presto/pinot/PinotSplitManagerTest.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/test/java/com/facebook/presto/pinot/TestPinotColumnHandle.java
Outdated
Show resolved
Hide resolved
presto-pinot/src/test/java/com/facebook/presto/pinot/TestPinotTableHandle.java
Outdated
Show resolved
Hide resolved
cba1708 to
991e636
Compare
highker
left a comment
There was a problem hiding this comment.
There are too many unnecessary dependencies. Let's strip them out slowly. Let's focus on http client and json for this iteration. We can come back to netty in the next round.
| */ | ||
| package com.facebook.presto.pinot; | ||
|
|
||
| import com.alibaba.fastjson.JSONArray; |
There was a problem hiding this comment.
Let's use jackson (e.g., in JsonFunctions) or smile (e.g., in io.airlift.jaxrs.SmileMapper) library instead of introducing new dependency. (Also, you have already used jackson in this class).
|
|
||
| String responseBody = sendHttpGet(url); | ||
| JSONObject resp = JSONObject.parseObject(responseBody); | ||
| JSONArray routingTableSnapshots = resp.getJSONArray("routingTableSnapshot"); |
| log.debug("Ignoring routingTable for %s", tableNameWithType); | ||
| continue; | ||
| } | ||
| JSONArray routingTableEntriesArray = snapshot.getJSONArray("routingTableEntries"); |
| String url = String.format(ROUTING_TABLE_API_TEMPLATE, getBrokerHost(tableName), tableName); | ||
|
|
||
| String responseBody = sendHttpGet(url); | ||
| JSONObject resp = JSONObject.parseObject(responseBody); |
| { | ||
| final String url = String.format(GET_ALL_TABLES_API_TEMPLATE, getControllerUrl()); | ||
| String responseBody = sendHttpGet(url); | ||
| Map<String, List<String>> responseMap = objectMapper.readValue(responseBody, Map.class); |
There was a problem hiding this comment.
nit new TypeReference<Map<String, List<String>>>() {}
| PINOT_FAILURE_GETTING_TABLE(1, EXTERNAL), | ||
| PINOT_FAILURE_GETTING_SCHEMA(2, EXTERNAL), | ||
| PINOT_FAILURE_QUERYING_DATA(3, EXTERNAL), | ||
| PINOT_FAILURE_INITIATING_RESOURCES(4, INTERNAL_ERROR); |
| requestIdGenerator = new AtomicLong(0); | ||
| prestoHostId = getDefaultPrestoId(); | ||
|
|
||
| final MetricsRegistry registry = new MetricsRegistry(); |
There was a problem hiding this comment.
remove final; please exam all other local variables that have final
| <dep.httpclient.version>4.5.3</dep.httpclient.version> | ||
| <dep.yammer.version>2.2.0</dep.yammer.version> | ||
| <dep.fastjson.version>1.1.24</dep.fastjson.version> | ||
| <dep.netty.version>3.10.6.Final</dep.netty.version> |
There was a problem hiding this comment.
There are too many dependencies. @wenleix @hellium01, could you help taking a look if there is anything we could remove? For example, is it possible to use com.facebook.drift.transport.netty lib? We have explicitly excluded netty-all dep in this file.
There was a problem hiding this comment.
Instead of trying to force the specific version module / project wise - the pinot client has to be shaded (see https://github.com/prestodb/presto-cassandra-server, https://github.com/prestodb/presto-cassandra-driver)
| return zookeeperUrl; | ||
| } | ||
|
|
||
| @Config("zookeeper-uri") |
There was a problem hiding this comment.
put pinot. at the beginning of all configs. Examples can be found in CassandraClientConfig
| <dependency> | ||
| <groupId>javax.validation</groupId> | ||
| <artifactId>validation-api</artifactId> | ||
| <version>${dep.validation-api.version}</version> |
There was a problem hiding this comment.
Thanks for the contribution!
I agree with @highker there are many dependencies added. Instead of adding many dependencies, (e.g. httpclient, yammer, netty) that might conflicts with other modules.
I am wondering if it's possible to create a project like presto-pinot-client and includes such library in a shaded way? For example, take a look at https://github.com/prestodb/presto-cassandra-server, https://github.com/prestodb/presto-cassandra-driver.
@arhimondr What do you think?
|
Like what @wenleix has suggested, there are many shaded libraries in prestodb account. We will need to create another repo to put the shaded pinot. Then in prestodb/presto, we depend on the shaded version. |
arhimondr
left a comment
There was a problem hiding this comment.
Did a quick pass. Please address and let me have another look
| <groupId>org.apache.pinot</groupId> | ||
| <artifactId>pinot-api</artifactId> | ||
| <version>${dep.pinot.version}</version> | ||
| <exclusions> |
There was a problem hiding this comment.
As @wenleix mentioned, instead of having all the exclusions here - it would be nice to have a separate projects that does the exclusions / shadings for the conflicting library (similarly to https://github.com/prestodb/presto-cassandra-server, https://github.com/prestodb/presto-cassandra-driver)
| <dep.httpclient.version>4.5.3</dep.httpclient.version> | ||
| <dep.yammer.version>2.2.0</dep.yammer.version> | ||
| <dep.fastjson.version>1.1.24</dep.fastjson.version> | ||
| <dep.netty.version>3.10.6.Final</dep.netty.version> |
There was a problem hiding this comment.
Instead of trying to force the specific version module / project wise - the pinot client has to be shaded (see https://github.com/prestodb/presto-cassandra-server, https://github.com/prestodb/presto-cassandra-driver)
| <exclusions> | ||
| <exclusion> | ||
| <groupId>org.slf4j</groupId> | ||
| <artifactId>jcl-over-slf4j</artifactId> |
There was a problem hiding this comment.
We don't exclude this library. The other library that conflicts with this must be excluded.
| <artifactId>javax.servlet-api</artifactId> | ||
| <scope>test</scope> | ||
| </dependency> | ||
| <dependency> |
There was a problem hiding this comment.
These dependencies are no longer test. Should these be moved to the top?
| private static final String TABLE_SCHEMA_API_TEMPLATE = "http://%s/tables/%s/schema"; | ||
| private static final String ROUTING_TABLE_API_TEMPLATE = "http://%s/debug/routingTable/%s"; | ||
| private static final String TIME_BOUNDARY_API_TEMPLATE = "http://%s/debug/timeBoundary/%s"; | ||
| private static final CloseableHttpClient HTTP_CLIENT = HttpClients.createDefault(); |
There was a problem hiding this comment.
Don't initialize it as a static thing. Please use injection
|
|
||
| public class TestPinotQueryBuilder | ||
| { | ||
| private static List<PinotColumnHandle> columnHandles = Arrays.asList( |
| public void testGetPinotQuerySelectAll() | ||
| { | ||
| String expectedQuery = "SELECT varchar, int, secondsSinceEpoch, boolean, double FROM table LIMIT 10"; | ||
| Assert.assertEquals(expectedQuery, getPinotQuery(new PinotConfig(), columnHandles, "", "", "table", 10)); |
|
|
||
| public class TestPinotSplitManager | ||
| { | ||
| private static final PinotSplitManager pinotSplitManager = new PinotSplitManager(new PinotConnection(new PinotClusterInfoFetcher(new PinotConfig()))); |
There was a problem hiding this comment.
constants should be spelled like PINOT_SPLIT_MANAGER, etc ..
|
|
||
| public class TestPinotSplitManager | ||
| { | ||
| private static final PinotSplitManager pinotSplitManager = new PinotSplitManager(new PinotConnection(new PinotClusterInfoFetcher(new PinotConfig()))); |
There was a problem hiding this comment.
However please avoid having static fields. This is a memory leak. Pleas use instance filed + BeforeClass + AfterClass
| TupleDomain<ColumnHandle> constraintSummary = TupleDomain.withColumnDomains(domainMap); | ||
|
|
||
| String expectedFilter = "((city_id < 10)) AND (country_name IN (\"cn\",\"us\"))"; | ||
| Assert.assertEquals(expectedFilter, pinotSplitManager.getPinotPredicate(constraintSummary)); |
|
Thanks for the comments @highker @wenleix @arhimondr . I will get back on this a bit later. |
|
superseded by #13504 |
This PR introduces the presto-pinot connector. It subsumes the earlier effort in PR prestodb#13413. This presto-pinot connector works as follows: * It builds upon the new plan connector optimizer framework to push the maximal query to the pinot system * By default, aggregation and limit queries are send to the pinot broker, while just filter queries are send to the servers parallelly. This is useful when joining against another (say hive) table wherein the join needs to be parallelized and segments from pinot (fetched directly from the pinot servers) need to be joined with hive splits. If you wish to not send any queries to the broker, please set the catalog session property prefer_broker_queries to false. * You can disable parallel queries (useful for low latency deployment) we do at Uber, by setting the catalog property forbid_segmnt_queries=false. * The connector needs a way to talk to the controller (to fetch the brokers and the table schema). You can either provide the controller URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at Uber). It is also possible to use the pinot-rest-proxy instead of going to the pinot-broker (again something we use for our test pinot clusters at Uber). The connector is organized into two submodules: presto-pinot-lib and presto-pinot-plugin. We chose this multi module organization because at Uber we have another internal connector that federates over presto-pinot-lib and presto-aresdb-lib (another low latency database like pinot). This is needed because each plugin is in its own classpath otherwise. This implementation detail is hidden from users of the pinot plugin. In addition the pinot connector would split the filter expressions such that expressions that can be pushed down to pinot are done so, and those that aren't are handled in presto. It handles the full range of pinot PQL and respects pinot's weirdnesses to guarantee that all results are still in the expected SQL semantics. Usage: Place the catalog file like so in the etc/catalog directory. The schema name is immaterial and it can be anything. (since pinot does not have a notion of schema). Example catalog files can be found below: ``` deveshagrawal-C02VP027J9J5:presto deveshagrawal$ cat local_run/etc/catalog/pinotnotparallel.properties connector.name=pinot pinot.controller-url=controller_host:9000 pinot.scan-parallelism-enabled=false deveshagrawal-C02VP027J9J5:presto deveshagrawal$ cat local_run/etc/catalog/pinotparallel.properties connector.name=pinot pinot.controller-url=controller_host:9000 pinot.scan-parallelism-enabled=true ``` Now given a file like this, you can query like so: ``` select date_trunc('hour', from_unixtime(time_column)), count(1) from pinotnotparallel.whatever.pinot_table_name where col = 5 group by 1 order by 2 desc ``` or ``` select * from pinotnotparallel.whatever.pinot_table_name where col = 5 limit 10 ``` or a join with a hive dimension table: ```select hive_table.city_name, count(1) from hive.warehouse.dim_table hive_table join pinotparallel.whatever.pinot_table_name pinot.city_id = hive_table.city_id and pinot.ts between cast(to_unixtime(now() - interval '1' hour) as bigint) and cast(to_unixtime(now()) as bigint) pinot.col = 5 group by 1 order by 2 desc limit 10 ``` To keep the presto side dependancies clean, the pinot dependancies were shaded and packaged into a separate driver artifact released from https://github.com/prestodb/presto-pinot-driver.
Adding Pinot connector that allows using Presto to query Apache Pinot™ (Incubating), a realtime distributed OLAP database.
This is a snapshot version running at Uber till 04/2019, minus internal features that touches core Presto engine, e.g. limit pushdown, simple aggregation pushdown (COUNT/SUM/MAX/MIN), plus a few touchups for open source. It was serving ~40k queries per day.
Note there has been a lot more features and improvements on top of this version including advanced aggregation pushdown (GROUPBY, etc). There are plans to open source the next version in the foreseeable future, with possibility of incremental contribution after de-coupling some components.
Getting Started (will prepare a .rst file for the connector documentation)
Credits to
Prerequisite
Setting up Pinot
Download and build Pinot
Start Pinot
Prepare Pinot Data
Create some Pinot segments:
Create Table:
Upload Segments:
Verify Pinot is running
Open the following url in the browser, and verify Pinot is running and serving data
Setting up Presto
Build Presto
Running Presto in IDE:
Use the following options to create a run configuration:
The working directory should be the
presto-mainsubdirectory. In IntelliJ, using$MODULE_DIR$accomplishes this automatically.Running Presto CLI
Query Pinot table through Presto