Skip to content
This repository was archived by the owner on Jun 26, 2024. It is now read-only.
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions gateway-service-impl/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,8 @@ dependencies {
implementation("com.google.guava:guava:32.1.2-jre")
implementation("com.google.inject:guice:5.0.1")

implementation("com.fasterxml.jackson.core:jackson-annotations:2.15.2")
implementation("com.fasterxml.jackson.core:jackson-databind:2.15.2")
implementation("com.fasterxml.jackson.core:jackson-annotations:2.16.0")
implementation("com.fasterxml.jackson.core:jackson-databind:2.16.0")

testImplementation("org.junit.jupiter:junit-jupiter:5.8.2")
testImplementation("org.mockito:mockito-junit-jupiter:5.4.0")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.hypertrace.gateway.service.common.util.ExpressionReader;
import org.hypertrace.gateway.service.v1.common.Expression;
import org.hypertrace.gateway.service.v1.common.Filter;
Expand All @@ -29,6 +30,7 @@
*/
public class TheRestGroupRequestHandler {
private static final String OTHER_COLUMN_VALUE = "__Other";
private static final List<String> LIST_WITH_NULL_STRING = List.of("null");
private final RequestHandlerWithSorting requestHandler;

TheRestGroupRequestHandler(RequestHandlerWithSorting requestHandler) {
Expand Down Expand Up @@ -224,10 +226,28 @@ private Set<String> getExcludedValues(
String resultName, ExploreResponse.Builder originalResponse) {
return originalResponse.getRowBuilderList().stream()
.map(rowBuilder -> rowBuilder.getColumnsMap().get(resultName))
.map(Value::getString)
.flatMap(this::getStringValues)
.collect(ImmutableSet.toImmutableSet());
}

private Stream<String> getStringValues(Value value) {
switch (value.getValueType()) {
case STRING:
return Stream.of(value.getString());
case STRING_ARRAY:
// Handle special case when "null" string is returned as string array value(default value
// scenario). This will be converted to empty list in value converter. In such a case use
// list with "null" value
if (value.getStringArrayList().isEmpty()) {
return LIST_WITH_NULL_STRING.stream();
} else {
return value.getStringArrayList().stream();
}
default:
return Stream.of();
}
}

private Filter.Builder createExcludedChildFilter(
Expression groupBySelectionExpression, Set<String> excludedValues) {
return Filter.newBuilder()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ public class TimeAggregationsWithGroupByRequestHandler implements IRequestHandle
private static final Logger LOG =
LoggerFactory.getLogger(TimeAggregationsWithGroupByRequestHandler.class);

private static final List<String> LIST_WITH_NULL_STRING = List.of("null");

private final AttributeMetadataProvider attributeMetadataProvider;
private final RequestHandler normalRequestHandler;
private final TimeAggregationsRequestHandler timeAggregationsRequestHandler;
Expand Down Expand Up @@ -228,7 +230,14 @@ private void buildInClauseFilterValue(Value value, Value.Builder valueBuilder) {
valueBuilder.addStringArray(value.getString());
break;
case STRING_ARRAY:
valueBuilder.addAllStringArray(value.getStringArrayList());
// Handle special case when "null" string is returned as string array value(default value
// scenario). This will be converted to empty list in value converter. In such a case use
// list with "null" value
if (value.getStringArrayList().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so one behavior discrepancy this would lead to to - let's say we have 10 rows, and 9 of them have no values for a multivalued col X and 1 of them does. Both before and after this change, if we grouped by X we'd get a single returned grouped with a count of 1. Now, let's say we filter for where X = null. Before this change, we'd error out (the in [] case we're trying to fix), after we'd return a new group of [] -> 9. So this is problematic for two different reasons

  1. Filtering is actually introducing rather than removing results
  2. There's a group that's only discoverable when no other values are present.

WDYT about just always adding null here (if not already present)?

Copy link
Contributor

@kotharironak kotharironak Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so one behavior discrepancy this would lead to to - let's say we have 10 rows, and 9 of them have no values for a multivalued col X and 1 of them does. Both before and after this change, if we grouped by X we'd get a single returned grouped with a count of 1.

From the Query Service (QS) point of view, it will return two groups. Let's assume that 1 row which is not empty has value - ["hello"]. The output will be as below.
Returned groups:

{
  {"null", 9},
  {"hello", 1}
}

And if the none empty row of X has more than one value, say - ["hello", "world"], QS will return 3 groups:

{
  {"null", 9},
  {"hello", 1},
  {"world", 1}
}

Moving on to Pinot, if the column (or multi-value column) has no value, it will return the default value. The default value for a String column is null. For a multi-value column, if not preset, it uses the default of a single-value column of the same type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The described behavior makes sense, the first issue to resolve however is how GW should behave because it's not consistent. If we do a group by with time series, it won't return that null (or empty list); if we do it without, it returns an empty list as one value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that both should return empty list

valueBuilder.addAllStringArray(LIST_WITH_NULL_STRING);
} else {
valueBuilder.addAllStringArray(value.getStringArrayList());
}
break;
case LONG:
valueBuilder.addLongArray(value.getLong());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -945,6 +945,17 @@
"QS"
]
},
{
"fqn": "Service.userRoles",
"valueKind": "TYPE_STRING_ARRAY",
"key": "userRoles",
"displayName": "Service User Roles",
"scopeString": "SERVICE",
"type": "ATTRIBUTE",
"sources": [
"QS"
]
},
{
"fqn": "Service.start_time_millis",
"valueKind": "TYPE_INT64",
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
{
"row": [
{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615593600000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1616"
}
}
},
{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615593600000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY",
"string_array": ["nginx-traceshop"]
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1989"
}
}
},
{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615597200000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "2525"
}
}
},
{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615597200000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY",
"string_array": ["nginx-traceshop"]
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1977"
}
}
},
{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615593600000"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "27485"
},
"SERVICE.userRoles": {
"valueType": "STRING",
"string": "__Other"
}
}
},
{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615597200000"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "27192"
},
"SERVICE.userRoles": {
"valueType": "STRING",
"string": "__Other"
}
}
}
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
{
"row": [{
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615593600000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1616"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615593600000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY",
"string_array": ["nginx-traceshop"]
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1989"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615597200000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "2525"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615597200000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY",
"string_array": ["nginx-traceshop"]
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1977"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615600800000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY",
"string_array": ["nginx-traceshop"]
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1965"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615600800000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "2323"
}
}
}]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
{
"row": [{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the query look like for this for gateway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! okey, it will be an internal query. Or would be IN query from GraphQL to gateway too. That is currently not expressible, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if a gateway client needs to explicitly ask for a null group, it should create a filter with the right-hand side (RHS) containing only the valueType. However, it can't query the null group along with other groups explicitly in the IN clause, right? This is same refer in the comment by @aaron-steinfeld here - #186 (comment)

WDYT about just always adding null here (if not already present)?

This will be opposite to the current case - where we don't want the null group; then we will always return it. I think both seem the same to me. So, I would say we can stick to the current behavior here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have all queries in the test

  1. GraphQL query to GW
  2. GW query to QS and its response
  3. GW response to GraphQL

"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615593600000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "1616"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615597200000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "2525"
}
}
}, {
"columns": {
"INTERVAL_START_TIME": {
"valueType": "LONG",
"long": "1615600800000"
},
"SERVICE.userRoles": {
"valueType": "STRING_ARRAY"
},
"SUM_SERVICE.numCalls_[]": {
"valueType": "LONG",
"long": "2323"
}
}
}]
}
Loading