Skip to content

[Part 1] Add geo support#5654

Merged
Jackie-Jiang merged 10 commits intoapache:masterfrom
yupeng9:geo
Jul 23, 2020
Merged

[Part 1] Add geo support#5654
Jackie-Jiang merged 10 commits intoapache:masterfrom
yupeng9:geo

Conversation

@yupeng9
Copy link
Contributor

@yupeng9 yupeng9 commented Jul 2, 2020

Description

First part of #5280. Design doc

This PR added the following

add geo-spatial data model

The data model includes both geometry and geography, which is differentiated by a spatial reference identifier (SRID). Notably, uses SRID=4326 as the coordinate system of lat/lng per https://epsg.io/4326.

add serde

Added the serialization/deserialization from geo-spatial value to bytes with kryo library. Also added a benchmark for performance evaluation

Benchmark result: https://gist.github.com/yupeng9/8e2b081ffb372593492ebb6a41da97fd

add geospatial functions

geo constructors

  • ST_GEOG_FROM_TEXT
  • ST_GEOG_FROM_WKB
  • ST_GEOM_FROM_TEXT
  • ST_GEOM_FROM_WKB
  • ST_POINT
  • ST_POLYGON

geo measurements

  • ST_AREA (the geography area implementation is similar to Presto's)
  • ST_DISTANCE (the great circle distance implementation is similar to Presto's)
  • ST_GEOMETRY_TYPE

geo outputs

  • ST_AS_BINARY
  • ST_AS_TEXT

geo relationship

  • ST_CONTAINS
  • ST_EQUALS

Updates to MeetupRsvp quickstart example

Added a new location field from the longitude and latitude of the event, using an inbuilt stPoint transform function

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • No

Does this PR fix a zero-downtime upgrade introduced earlier?

  • No

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Yes, added a new experimental feature

Documentation

If you have introduced a new feature or configuration, please add it to the documentation as well.
See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document

@yupeng9 yupeng9 changed the title add geo support [Part 1] Add geo support Jul 2, 2020
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz ignore this, seems some search/replace error

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Looks much simpler than I thought. Will the BYTES column for StPoint be dictionary encoded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this function do? please add java docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. This is an abstract class for implementing the geo constructor functions like StGeomFromText and StGeogFromText

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function -> functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, then I cannot define scalar function in this package for inbuilt transformation functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kishoreg
Copy link
Member

kishoreg commented Jul 4, 2020

Add sample queries to the description and also update the java docs.

@siddharthteotia
Copy link
Contributor

@yupeng9 , I would like to review this PR. Please give a day or two to go over the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Eclipse license ok to add? So far we have taken Apache/MIT/Gnu.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is okay to include.
Per https://www.apache.org/legal/resolved.html#category-a, Eclipse Distribution License 1.0 can be included. And JTS is dual-licensed under Eclipse Public License 2.0 and Eclipse Distribution License 1.0 (https://github.com/locationtech/jts#license)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tab space indicates not following Pinot code-styling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize list with size if known.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Pinot code style (name of member variables starts with _ to avoid qualifying with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a Static Map, if this list has a chance to grow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is unlikely to grow, given the OGC geo is a well-defined standard per https://www.ogc.org/standards/sfa

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utils.rethrow will preserve the original exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see this util

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add the benchmark results in the PR description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if LOGGER should be used here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed. it was for debugging purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated to this PR? Would be good to call it out in the description, along with the motivation for the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Reverted this since it was for local debugging purpose.

@yupeng9
Copy link
Contributor Author

yupeng9 commented Jul 5, 2020

@kishoreg @mayankshriv thanks for the review. Will address the comments.
@siddharthteotia No hurry. Please take your time.

Copy link
Contributor Author

@yupeng9 yupeng9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is okay to include.
Per https://www.apache.org/legal/resolved.html#category-a, Eclipse Distribution License 1.0 can be included. And JTS is dual-licensed under Eclipse Public License 2.0 and Eclipse Distribution License 1.0 (https://github.com/locationtech/jts#license)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is unlikely to grow, given the OGC geo is a well-defined standard per https://www.ogc.org/standards/sfa

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. This is an abstract class for implementing the geo constructor functions like StGeomFromText and StGeogFromText

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see this util

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed. it was for debugging purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Reverted this since it was for local debugging purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the Pinot coding convention of using underscore as the prefix for member variables. Same for other classes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example of Pinot coding convention

Suggested change
this.multitype = multitype;
_multitype = multitype;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a pass, updated the style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jackie-Jiang any reason we set checkstyle severity to warning but not error. I saw we have such a rule for member variable, but the maven checkstyle does not fail the build

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. We can switch it to error once we fixed all the existing code with wrong code styles. Opened issue #5675 to track this

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question, why are we using JTS library to handle both geometry as well as geography? Shouldn't we use ESRI for geography?

@yupeng9
Copy link
Contributor Author

yupeng9 commented Jul 9, 2020

High level question, why are we using JTS library to handle both geometry as well as geography? Shouldn't we use ESRI for geography?

Good call.
There are two reasons, as I found out during prototyping:

  • Performance. I found the serialization performance JTS is better than ESRI. This is also confirmed by Presto project, as they are migrating from ESRI to JTS: Use JTS instead of ESRI for geometries, Parts 1 and 2 prestodb/presto#13604
  • Simplicity on the dependency. With one less dependency, it's easier to manage the implementation. As in my current way, the geometry to geography conversion is just a simple change of coordinate system (SRID).

The tradeoff of taking this approach is that JTS is a library for Euclidean planar linear geometry, so all the geography-related operations have to be implemented using JTS's primitives. That's why there is some lengthy logic on geography measurement functions. Those implementations are similar to what Presto is doing.

Will update the design doc to reflect this change

@Jackie-Jiang
Copy link
Contributor

High level question, why are we using JTS library to handle both geometry as well as geography? Shouldn't we use ESRI for geography?

Good call.
There are two reasons, as I found out during prototyping:

  • Performance. I found the serialization performance JTS is better than ESRI. This is also confirmed by Presto project, as they are migrating from ESRI to JTS: prestodb/presto#13604
  • Simplicity on the dependency. With one less dependency, it's easier to manage the implementation. As in my current way, the geometry to geography conversion is just a simple change of coordinate system (SRID).

The tradeoff of taking this approach is that JTS is a library for Euclidean planar linear geometry, so all the geography-related operations have to be implemented using JTS's primitives. That's why there is some lengthy logic on geography measurement functions. Those implementations are similar to what Presto is doing.

Will update the design doc to reflect this change

For ser-de, we are using the customized serializer, so I don't think there will be performance difference between these 2 libraries?
For geography operations, we should leverage ESRI for the calculation (or are you planing to implement all geography operations yourself?).
We need to support both geometry (SRID 0) and geography (SRID 4326) operations:

  • If ESRI supports both geometry and geography operations, we can use only ESRI to do the calculation
  • If not, we should use JTS for geometry and ESRI for geography calculation

@yupeng9
Copy link
Contributor Author

yupeng9 commented Jul 10, 2020

High level question, why are we using JTS library to handle both geometry as well as geography? Shouldn't we use ESRI for geography?

Good call.
There are two reasons, as I found out during prototyping:

  • Performance. I found the serialization performance JTS is better than ESRI. This is also confirmed by Presto project, as they are migrating from ESRI to JTS: prestodb/presto#13604
  • Simplicity on the dependency. With one less dependency, it's easier to manage the implementation. As in my current way, the geometry to geography conversion is just a simple change of coordinate system (SRID).

The tradeoff of taking this approach is that JTS is a library for Euclidean planar linear geometry, so all the geography-related operations have to be implemented using JTS's primitives. That's why there is some lengthy logic on geography measurement functions. Those implementations are similar to what Presto is doing.
Will update the design doc to reflect this change

For ser-de, we are using the customized serializer, so I don't think there will be performance difference between these 2 libraries?

Though we use customized serializer, there could be some difference due to the internal representation of the fields, their accessor implementations. The PR I linked above shows about the 20% difference.

Another notable reason is that JTS conforms to the ISO standards better. I believe this is the primary reason that Presto community decided to move from ESRI to JTS. I suggest we take the lessons learned from them.

Lastly, many users query Pinot via the Presto connector, so it's also a desirable property that Pinot geo functions return same or similar results as Presto's for better unification.

For geography operations, we should leverage ESRI for the calculation (or are you planing to implement all geography operations yourself?).

There are not too many geographical functions, so I believe its implementation is still manageable.

We need to support both geometry (SRID 0) and geography (SRID 4326) operations:

  • If ESRI supports both geometry and geography operations, we can use only ESRI to do the calculation
  • If not, we should use JTS for geometry and ESRI for geography calculation

@yupeng9
Copy link
Contributor Author

yupeng9 commented Jul 17, 2020

@Jackie-Jiang I changed the relationship functions to work only for the Geometry objects, to align with Presto's behavior. PTAL

// Special type for annotation based scalar functions
SCALAR("scalar");
SCALAR("scalar"),
// geo constructors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Add an empty line in front, and capitalize the comment

// geo measurements
ST_AREA("ST_Area"),
ST_DISTANCE("ST_Distance"),
ST_GEOMETRY_TYPE("ST_GEOMETRY_TYPE"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ST_GEOMETRY_TYPE("ST_GEOMETRY_TYPE"),
ST_GEOMETRY_TYPE("ST_GeometryType"),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a standard function to return the SRID of the geometry? (Identify whether it is geometry or geography)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it returns the type of the geometry as a string. EG: 'ST_Linestring', 'ST_Polygon','ST_MultiPolygon' etc

* <ul>
* <li>For dimension, time, date time fields, support {@link DataType}: INT, LONG, FLOAT, DOUBLE, STRING</li>
* <li>For dimension, time, date time fields, support {@link DataType}: INT, LONG, FLOAT, DOUBLE, STRING, BYTES</li>
* <li>For non-derived metric fields, support {@link DataType}: INT, LONG, FLOAT, DOUBLE</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the javadoc. Add BYTES here as well

<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.21</version>
<version>${jmh.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these to the root pom and specify the version there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already there

return null;
}

validateGeographyType("ST_Distance", leftGeometry, EnumSet.of(GeometryType.POINT));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified to geometry instanceof Point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to reuse the same error msg template of several functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be quite big performance difference, especially for per-value check

return _results;
}

public static void checkLatitude(double latitude) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make all helper methods private

for (int i = 0; i < projectionBlock.getNumDocs(); i++) {
Geometry firstGeometry = GeometrySerializer.deserialize(firstValues[i]);
Geometry secondGeometry = GeometrySerializer.deserialize(secondValues[i]);
if (GeometryUtils.isGeography(firstGeometry) || GeometryUtils.isGeography(secondGeometry)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equals should work on geography as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it should

/**
* Constructor function for polygon object from text.
*/
public class StPolygonFunction extends ConstructFromTextFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right that St_Polygon is the same as ST_GeomFromText

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, added the constraint of checking polygon type

protected static final TransformResultMetadata BYTES_SV_NO_DICTIONARY_METADATA =
new TransformResultMetadata(DataType.BYTES, true, false);

private boolean[] _booleanValuesSV;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

Copy link
Contributor Author

@yupeng9 yupeng9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jackie-Jiang Thanks for the detailed review. Comments addressed

// geo measurements
ST_AREA("ST_Area"),
ST_DISTANCE("ST_Distance"),
ST_GEOMETRY_TYPE("ST_GEOMETRY_TYPE"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it returns the type of the geometry as a string. EG: 'ST_Linestring', 'ST_Polygon','ST_MultiPolygon' etc

<artifactId>lucene-analyzers-common</artifactId>
<version>${lucene.version}</version>
</dependency>
<dependency>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to pinot-perf project.


POINT(false, "ST_Point"),
MULTI_POINT(true, "ST_MultiPoint"),
LINE_STRING(false, "ST_LineString"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LINEAR_RING is a subtype of LINEAR_STRING

_name = name;
}

public boolean isMultitype() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in this PR. It's useful in function like https://postgis.net/docs/ST_GeometryN.html

/**
* Provides methods to efficiently serialize and deserialize geometry types.
*/
public class GeometrySerde extends Serializer {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not exactly same, in particular, the differences are:

  • Presto uses schema to indicate geometry vs geography info, while we encode this in the type byte.
  • Presto serializes additional information such as envelope to be compatible with ESRI serialization, but the serde here does not, which is simpler and faster

Added this to the comments

return null;
}

validateGeographyType("ST_Distance", leftGeometry, EnumSet.of(GeometryType.POINT));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to reuse the same error msg template of several functions

* This assumes a spherical Earth, and uses the Vincenty formula. (https://en.wikipedia
* .org/wiki/Great-circle_distance)
*/
public static double greatCircleDistance(double latitude1, double longitude1, double latitude2, double longitude2) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (int i = 0; i < projectionBlock.getNumDocs(); i++) {
Geometry firstGeometry = GeometrySerializer.deserialize(firstValues[i]);
Geometry secondGeometry = GeometrySerializer.deserialize(secondValues[i]);
if (GeometryUtils.isGeography(firstGeometry) || GeometryUtils.isGeography(secondGeometry)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it should

/**
* Constructor function for polygon object from text.
*/
public class StPolygonFunction extends ConstructFromTextFunction {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, added the constraint of checking polygon type

<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.21</version>
<version>${jmh.version}</version>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already there

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some comments.
Let me know when you address the comments and ready to merge

*/
public enum GeometryType {

POINT(false, 0,"ST_Point"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) reformat

MULTI_POLYGON(true, 5,"ST_MultiPolygon"),
GEOMETRY_COLLECTION(true, 6,"ST_GeomCollection");

private final boolean _multitype;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) _multiType? (IDE identify multitype as typo)

/**
* The geometry type used in serialization
*/
public enum GeometrySerializationType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this class

* @return the serialization type
*/
public static GeometryType fromID(int id) {
switch (id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep an static GeometryType array
private static final GeometryType[] ID_TO_TYPE_MAP = new GeometryType[] {POINT, MULTI_POINT, ...};
Then you can avoid the switch branching for better performance
return ID_TO_TYPE_MAP[id];

* - The envelope info is not serialized
*/
public class GeometrySerde {
private static final Logger LOGGER = LoggerFactory.getLogger(GeometrySerde.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Remove the unused LOGGER (we don't want to log within serde as it is per-value based and can easily flood the log)

@ScalarFunction
public static byte[] stPoint(double longitude, double latitude) {
return GeometrySerializer
.serialize(GeometryUtils.GEOMETRY_FACTORY.createPoint(new Coordinate(longitude, latitude)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Major) Should this be GEOGRAPHY_FACTORY for longitude and latitude?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. Changed the argument to x,y

Comment on lines 80 to 82
Geometry geometry;
for (int i = 0; i < projectionBlock.getNumDocs(); i++) {
geometry = GeometrySerializer.deserialize(values[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary. I did some benchmark on this and there is performance difference

return null;
}

validateGeographyType("ST_Distance", leftGeometry, EnumSet.of(GeometryType.POINT));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be quite big performance difference, especially for per-value check

_results[i] = sphericalDistance(firstGeometry, secondGeometry);
} else {
_results[i] =
firstGeometry.isEmpty() || secondGeometry.isEmpty() ? null : firstGeometry.distance(secondGeometry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can return Double.NaN here to indicate empty geometry

Comment on lines 63 to 64
Utils.rethrowException(
new RuntimeException(String.format("Failed to parse geometry from string: %s", argumentValues[i])));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Utils.rethrowException(
new RuntimeException(String.format("Failed to parse geometry from string: %s", argumentValues[i])));
throw new RuntimeException(String.format("Failed to parse geometry from string: %s", argumentValues[i]));

@yupeng9
Copy link
Contributor Author

yupeng9 commented Jul 23, 2020

@Jackie-Jiang thanks for taking another pass. Comments addressed, and feel free to merge

@Jackie-Jiang Jackie-Jiang merged commit 1124897 into apache:master Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants