For non retryable API call failures on a lookup join, allow the error to be in the output in the stream #154

@davidradl

Description

Requirement
The original requirement was to expose error information, including error messages and HTTP status codes, in the flow. We then extended the scope to include HTTP status codes and headers, which could also be useful in non-error cases. When we prototyped that, we found that we needed another metadata column, completion state, to indicate whether the HTTP call was successful, because the HTTP status code can have a value in both success and failure scenarios.
A side effect of this feature is that it will be useful in debugging.

Proposed design

A new config option, CONTINUE_ON_ERROR, is a boolean that defaults to false. When it is true, a failed request does not end the job; the job continues, with extra information in any metadata columns that have been defined. The lookup join values from the lookup table (the table defined with the HTTP connector) will be null for nullable fields, or the default for non-nullable fields.
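The behaviour of the flag can be sketched in a few lines of plain Java. This is only an illustration under assumptions: `ContinueOnErrorSketch`, `LookupOutcome` and `lookup` are hypothetical names, the HTTP call is stood in for by a `Supplier`, and only a single nullable value is modelled rather than a full row.

```java
import java.util.function.Supplier;

// Hypothetical, dependency-free sketch of the CONTINUE_ON_ERROR semantics.
final class ContinueOnErrorSketch {

    private ContinueOnErrorSketch() {}

    /** Outcome of one lookup: the looked-up value plus an error message. */
    public static final class LookupOutcome {
        public final String value;        // null when the call failed
        public final String errorMessage; // null when the call succeeded

        LookupOutcome(String value, String errorMessage) {
            this.value = value;
            this.errorMessage = errorMessage;
        }
    }

    /**
     * Runs the (stand-in) HTTP call. With continueOnError=false a failure
     * propagates to the caller; with continueOnError=true the failure is
     * captured and returned alongside a null value.
     */
    public static LookupOutcome lookup(Supplier<String> httpCall, boolean continueOnError) {
        try {
            return new LookupOutcome(httpCall.get(), null);
        } catch (RuntimeException e) {
            if (!continueOnError) {
                throw e; // default: the error is not swallowed
            }
            return new LookupOutcome(null, e.getMessage());
        }
    }
}
```

With the flag set, a caller receives an outcome whose `value` is null and whose `errorMessage` carries the exception message; without it, the exception reaches the caller unchanged.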

Metadata keys (read-only, so they cannot be inserted into):
ERROR-MESSAGE - string - message from the client-side exception, or the response body for failed HTTP requests. This only has a value in failure situations.
HTTP-HEADERS - map<STRING, LIST<STRING>> - HTTP headers
HTTP-STATUS-CODE - integer - HTTP status code
HTTP-COMPLETION-STATE - string - information about how the last HTTP call completed. The string will have values from an enum:
SUCCESS - HTTP call succeeded
HTTP_ERROR_STATUS - HTTP request failed with an HTTP status code
EXCEPTION - there was an Exception and no HTTP status code.
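As a rough illustration of how the three completion states relate to the status code, here is a minimal sketch. The enum values come from the list above; `CompletionStateSketch.classify` is a hypothetical helper, and treating only 2xx codes as success is an assumption (the connector may decide success differently).

```java
/** The three completion states listed above. */
enum HttpCompletionState {
    SUCCESS,
    HTTP_ERROR_STATUS,
    EXCEPTION
}

// Hypothetical helper, not part of the connector.
final class CompletionStateSketch {

    private CompletionStateSketch() {}

    /**
     * Derives a completion state from the HTTP status code.
     * A null status code means no HTTP response was received,
     * which corresponds to the EXCEPTION state.
     * Assumption: only 2xx status codes count as success.
     */
    static HttpCompletionState classify(Integer httpStatusCode) {
        if (httpStatusCode == null) {
            return HttpCompletionState.EXCEPTION;
        }
        if (httpStatusCode >= 200 && httpStatusCode < 300) {
            return HttpCompletionState.SUCCESS;
        }
        return HttpCompletionState.HTTP_ERROR_STATUS;
    }
}
```

This shows why the status code alone is not enough as a success indicator: it is absent in the EXCEPTION case and present in both of the other cases.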

Notes:
If you are using the Table API TableResult and call await with a timeout, the resulting timeout exception will cause the job to terminate, even if there are metadata columns defined.
The CONTINUE_ON_ERROR config flag, not the presence of metadata columns, determines whether the job continues on error.
The metadata columns are populated only when there is something meaningful to put in them.

A new Java bean is added:

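public class HttpRowDataWrapper {
    // Rows produced by the lookup
    private final Collection<RowData> data;
    // Value for the ERROR-MESSAGE metadata key
    private final String errorMessage;
    // Value for the HTTP-HEADERS metadata key
    private final Map<String, List<String>> httpHeadersMap;
    // Value for the HTTP-STATUS-CODE metadata key
    private final Integer httpStatusCode;
    // Value for the HTTP-COMPLETION-STATE metadata key
    private final HttpCompletionState httpCompletionState;
}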

This allows the lower HTTP layers to pass the HTTP response content up to the lookup code.
Here is a picture of its lifecycle.
[Image: HttpRowDataWrapper lifecycle diagram]

Other designs considered and rejected:

  1. Have a configuration boolean, fail_on_error, that defaults to true. When true, HTTP errors (potentially after retry processing) result in exceptions that end the job. When false, new metadata columns will be populated:
    ERROR_MESSAGE string
    HEADERS
    ERROR_STATUS_CODE
  2. We then thought that if this only happens in error cases, we do not need the config flag, as we can populate the metadata columns whenever they are defined.
  3. We then extended the scope to include non error cases, so the metadata fields are:
    ERROR-MESSAGE string
    HTTP-HEADERS
    HTTP-STATUS-CODE
    We then realised that we needed a configuration option again.
    This also involved amending the lower-level pull method to return not just the Collection but also information from the HTTP response.
  4. Added the metadata key COMPLETION_STATE for more information.
    Metadata keys (read-only, so they cannot be inserted into):
    ERROR-MESSAGE - string - message from the client-side exception, or the response body for failed HTTP requests. This only has a value in failure situations.
    HTTP-HEADERS - map<STRING, LIST<STRING>> - HTTP headers
    HTTP-STATUS-CODE - integer - HTTP status code
    HTTP-COMPLETION-STATE - information about how the HTTP call ended. The string will have values from an enum:
    HTTP_ERROR_STATUS - the HTTP call failed after retrying
    EXCEPTION - an error occurred but there was no HTTP status code
    CLIENT_SIDE_EXCEPTION - no HTTP call completed, and there was an Exception

The problem with this is that we cannot determine HTTP_FAILED_AFTER_RETRY, as the retrying happens inside the HTTP client with a Retry utility.
Also, CONTINUE_ON_ERROR_INFO is dependent on the CONTINUE_ON_ERROR config being set. It would seem more intuitive to just have the HTTP completion state and include SUCCESS.
Also, if we fail while deserialising a bad HTTP response, we get an IOException. I am not sure whether we would think of this as a client-side exception; I think EXCEPTION is clearer for this field.

Background: this was the original text with which this issue was raised:

We are thinking of a solution like this :

CREATE TABLE api_lookup (
  `customer_id` STRING NOT NULL,
  `customer_records_from_backend` STRING,
  `errorMessage` STRING METADATA FROM 'error_string',
  `responseCode` INT METADATA FROM 'error_code'
) WITH (
  'connector' = 'rest_lookup',
  ...
)

The sample above defines two METADATA fields, 'error_string' and 'error_code', which provide a way to surface this data from the connector back to SQL. This allows the flow to act on the error content and drive the appropriate error logic without having to leave the stream to look at logs, OpenTelemetry, etc.

The request values would be in the output, but the response columns (from the lookup table) would be nulls.

The docs describe only how errors can cause retries up to a limit before failing the job. A new gid.connector.http.source.lookup. option might be required in order to enable the 'fail and produce an error message' behaviour, as opposed to the currently documented 'fail and retry or crash'. When there is an error of this type, the response columns would be null.

Investigation to see if this is feasible

Would the lookup join code path call SupportsReadingMetadata (which is what Kafka uses)?
Flink has:

public interface SupportsReadingMetadata {
    Map<String, DataType> listReadableMetadata();

    void applyReadableMetadata(List<String> metadataKeys, DataType producedDataType);

    default boolean supportsMetadataProjection() {
        return true;
    }
}

If this were driven at runtime in the lookup join case, then the metadata should come through.
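To make the list/apply handshake concrete, here is a dependency-free sketch that mimics its shape without using Flink classes. Everything in it is an assumption for illustration: `ReadableMetadataSketch` and `project` are hypothetical names, plain type-name strings stand in for Flink's `DataType`, and the keys are the ones from the proposed design. It does not answer whether the lookup join path actually drives these calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the shape of SupportsReadingMetadata.
final class ReadableMetadataSketch {

    // Keys requested by the table definition, in declaration order.
    private List<String> requestedKeys = new ArrayList<>();

    /** Advertises the metadata keys the source can produce (type names as strings). */
    Map<String, String> listReadableMetadata() {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("ERROR-MESSAGE", "STRING");
        keys.put("HTTP-HEADERS", "MAP<STRING, ARRAY<STRING>>");
        keys.put("HTTP-STATUS-CODE", "INT");
        keys.put("HTTP-COMPLETION-STATE", "STRING");
        return keys;
    }

    /** Records which of the advertised keys were actually requested. */
    void applyReadableMetadata(List<String> metadataKeys) {
        Map<String, String> readable = listReadableMetadata();
        for (String key : metadataKeys) {
            if (!readable.containsKey(key)) {
                throw new IllegalArgumentException("Unknown metadata key: " + key);
            }
        }
        requestedKeys = new ArrayList<>(metadataKeys);
    }

    /** Emits values for the requested keys, in order; missing values become null. */
    List<Object> project(Map<String, Object> available) {
        List<Object> values = new ArrayList<>();
        for (String key : requestedKeys) {
            values.add(available.get(key));
        }
        return values;
    }
}
```

The point of the sketch is the ordering: the source first advertises what it can produce, is then told which keys the table declared, and at runtime emits values only for those keys.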
