For non retryable API call failures on a lookup join, allow the error to be in the output in the stream


**Requirement**
The original requirement was to expose error information, including error messages and error HTTP status codes, in the flow. We then extended the scope to include HTTP status codes and headers, which could also be useful in non-error cases. When we prototyped that, we found that we needed another metadata column, the completion state, to indicate whether the HTTP call was successful or not, because the HTTP status code can have values in both success and failure scenarios.
A side effect of this feature is that it will be useful in debugging.

**Proposed design**

A new config option, CONTINUE_ON_ERROR, is a boolean that defaults to false. When it is true, a failed request does not end the job; instead the job continues, with extra information in any metadata columns that have been defined. The lookup join values from the lookup table (the table defined with the HTTP connector) will be null for nullable fields, or the default value for non-nullable fields.
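
The behaviour of the flag can be illustrated with a minimal, stdlib-only sketch. The class `ContinueOnErrorSketch`, its `LookupResult` holder and the `lookup` method are hypothetical names used only for illustration; they are not the connector's real classes, and the HTTP call is simulated by a plain `Function`.

```java
import java.util.function.Function;

/** Hypothetical stand-in for the lookup layer; not the connector's real classes. */
final class ContinueOnErrorSketch {

    /** Result of one lookup: the looked-up value plus an optional error message. */
    static final class LookupResult {
        final String value;        // lookup-table column; null when the call failed
        final String errorMessage; // candidate for an error metadata column; null on success

        LookupResult(String value, String errorMessage) {
            this.value = value;
            this.errorMessage = errorMessage;
        }
    }

    /** Runs the simulated HTTP call and applies the continue-on-error rule. */
    static LookupResult lookup(String key,
                               Function<String, String> httpCall,
                               boolean continueOnError) {
        try {
            return new LookupResult(httpCall.apply(key), null);
        } catch (RuntimeException e) {
            if (!continueOnError) {
                throw e; // default (false): the failure propagates and ends the job
            }
            // true: keep going, with a null lookup value and the error recorded
            return new LookupResult(null, e.getMessage());
        }
    }
}
```

With the flag set to true, a failing call yields a result whose `value` is null and whose `errorMessage` carries the exception message; with the default of false, the same call rethrows.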

Metadata keys (read-only, so they cannot be inserted into):
ERROR-MESSAGE - string - the message from the client-side exception, or the response body for failed HTTP requests. This only has a value in failure situations.
HTTP-HEADERS - map<STRING, LIST<STRING>> - HTTP headers
HTTP-STATUS-CODE - integer - HTTP status code
HTTP-COMPLETION-STATE - string - information about how the last HTTP call completed. The string will have values from an enum:
SUCCESS - the HTTP call succeeded
HTTP_ERROR_STATUS - the HTTP request failed with an HTTP status code
EXCEPTION - there was an Exception and no HTTP status code
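
As a rough sketch of how the three states could be derived, the enum below uses the values listed above. The name `HttpCompletionStateSketch` and the `classify` helper are hypothetical, and treating only 2xx codes as success is an assumption made for the example; the connector may decide success differently.

```java
/** Sketch only: the enum values come from the list above; classify() is a hypothetical helper. */
enum HttpCompletionStateSketch {
    SUCCESS,
    HTTP_ERROR_STATUS,
    EXCEPTION;

    /**
     * Derives the completion state from the status code of the last HTTP call.
     * A null status code means no response arrived (an exception occurred).
     * Assumption: only 2xx codes count as success.
     */
    static HttpCompletionStateSketch classify(Integer statusCode) {
        if (statusCode == null) {
            return EXCEPTION;
        }
        boolean is2xx = statusCode >= 200 && statusCode < 300;
        return is2xx ? SUCCESS : HTTP_ERROR_STATUS;
    }
}
```

This shows why the status code alone is not enough as an indicator: it is present for both `SUCCESS` and `HTTP_ERROR_STATUS`, and absent for `EXCEPTION`.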

**Notes:**
If you are using the Table API `TableResult` and have an `await` with a timeout, the resulting timeout exception will cause the job to terminate, even if there are metadata columns defined.
The CONTINUE_ON_ERROR config flag, not the presence of metadata columns, determines the behaviour.
The metadata columns are populated only when there is something meaningful to put in them.

A new Java bean is added, allowing the lower HTTP layers to pass the HTTP response content up to the lookup code:
```
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.flink.table.data.RowData;

public class HttpRowDataWrapper {
    private final Collection<RowData> data; // rows returned by the lookup
    private final String errorMessage;
    private final Map<String, List<String>> httpHeadersMap;
    private final Integer httpStatusCode;
    private final HttpCompletionState httpCompletionState;
    // constructor and getters omitted
}
```

Here is a picture of its lifecycle. 
<img width="3600" height="2502" alt="Image" src="https://github.com/user-attachments/assets/35a3856a-6147-4016-bffd-da5a1e362229" />
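
One way the lookup code could turn the wrapper's fields into metadata values, populating a key only when there is something meaningful to put in it, is sketched below. This is a stdlib-only stand-in: `WrapperMetadataSketch` and `toMetadata` are hypothetical names, the row data is left out, and the completion state is held as a `String` rather than the enum for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stdlib-only stand-in for the wrapper; the row data is omitted in this sketch. */
final class WrapperMetadataSketch {
    private final String errorMessage;
    private final Map<String, List<String>> httpHeadersMap;
    private final Integer httpStatusCode;
    private final String httpCompletionState; // an enum in the proposal

    WrapperMetadataSketch(String errorMessage,
                          Map<String, List<String>> httpHeadersMap,
                          Integer httpStatusCode,
                          String httpCompletionState) {
        this.errorMessage = errorMessage;
        this.httpHeadersMap = httpHeadersMap;
        this.httpStatusCode = httpStatusCode;
        this.httpCompletionState = httpCompletionState;
    }

    /** Returns only the metadata values that are present, keyed by metadata key name. */
    Map<String, Object> toMetadata() {
        Map<String, Object> metadata = new LinkedHashMap<>();
        putIfPresent(metadata, "ERROR-MESSAGE", errorMessage);
        putIfPresent(metadata, "HTTP-HEADERS", httpHeadersMap);
        putIfPresent(metadata, "HTTP-STATUS-CODE", httpStatusCode);
        putIfPresent(metadata, "HTTP-COMPLETION-STATE", httpCompletionState);
        return metadata;
    }

    private static void putIfPresent(Map<String, Object> target, String key, Object value) {
        if (value != null) {
            target.put(key, value);
        }
    }
}
```

For a client-side exception, where no response arrived, only the error message and the completion state would be present; the headers and status code keys stay unset.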

**Rejected other designs:**

1) Have a configuration boolean, fail_on_error, that defaults to true. When true, HTTP errors (potentially after retry processing) result in exceptions that end the job. When false, new metadata columns will be populated:
ERROR_MESSAGE string
HEADERS
ERROR_STATUS_CODE
2) We then thought that if this only happens in error cases, then we do not need the config flag, as we can populate the metadata columns if they are defined.
3) We then extended the scope to include non-error cases, so the metadata fields are:
ERROR-MESSAGE string
HTTP-HEADERS
HTTP-STATUS-CODE
We then realised that we needed a configuration option again.
This also involved amending the lower-level pull method to return not just the Collection<RowData> but also information from the HTTP response.
4) Added the metadata key COMPLETION_STATE for more information.
Metadata keys (read-only, so they cannot be inserted into):
ERROR-MESSAGE - string - the message from the client-side exception, or the response body for failed HTTP requests. This only has a value in failure situations.
HTTP-HEADERS - map<STRING, LIST<STRING>> - HTTP headers
HTTP-STATUS-CODE - integer - HTTP status code
HTTP-COMPLETION-STATE - information about how the HTTP call ended. The string will have values from an enum:
HTTP_ERROR_STATUS - the HTTP call failed after retrying
EXCEPTION - an error occurred but there was no HTTP status code
CLIENT_SIDE_EXCEPTION - no HTTP call completed, and there was an Exception

The problem with this is that we cannot determine HTTP_FAILED_AFTER_RETRY, as the retrying happens in the HTTP client with a retry utility.
Also, CONTINUE_ON_ERROR_INFO is dependent on the CONTINUE_ON_ERROR config being set. It would seem more intuitive to just have the HTTP completion state and include SUCCESS.
Also, if we fail while deserialising the bad HTTP response, we get an IOException. I am not sure whether we would think of this as a client-side Exception; I think Exception is clearer for this field.





**Background (the original text with which this issue was raised):**

We are thinking of a solution like this:

```
create table api_lookup (
  `customer_id` STRING NOT NULL,
  `customer_records_from_backend` STRING,
  -- metadata columns that surface the error details from the connector
  `errorMessage` STRING METADATA FROM 'error_string',
  `responseCode` INT METADATA FROM 'error_code'
) WITH (
  'connector' = 'rest-lookup',
  ...
)
```

The sample above shows us defining two METADATA fields, 'error_string' and 'error_code', which provide a way to surface this data from the connector back to SQL. This allows the flow to act on the error content and drive the appropriate error logic without having to come out of the stream to look at logs, OpenTelemetry, etc.

The request values would be in the output, but the response columns (from the lookup table) would be nulls. 

The [docs](https://github.com/getindata/flink-http-connector?tab=readme-ov-file#retries-lookup-source) describe only how errors can cause retries up to a limit before failing the job. A new gid.connector.http.source.lookup.<property> option might be required in order to enable the 'fail and produce an error message' behaviour, as opposed to the currently documented 'fail and retry or crash'. When there is an error of this type, the response columns would be null.


**Investigation to see if this is feasible**

Would the lookup join code path call **SupportsReadingMetadata** (which is what Kafka uses)?
Flink has:

```
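import java.util.List;
import java.util.Map;
import org.apache.flink.table.types.DataType;

public interface SupportsReadingMetadata {
    // Metadata keys the source can expose, with their data types.
    Map<String, DataType> listReadableMetadata();

    // Called by the planner with the metadata keys the query actually uses.
    void applyReadableMetadata(List<String> metadataKeys, DataType producedDataType);

    default boolean supportsMetadataProjection() {
        return true;
    }
}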
```
 
If this was driven at runtime in the lookup join case then the metadata should come through.
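
To make the idea concrete, here is a stdlib-only sketch of the mechanism, assuming the requested metadata values are appended after the physical columns in the order the keys were requested. `ReadableMetadataSketch` and `buildRow` are hypothetical names; Flink's `RowData` and `DataType` are replaced by plain Java objects, so this is not code that plugs into Flink as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Stdlib-only sketch; Flink's RowData and DataType are replaced by plain Java objects. */
final class ReadableMetadataSketch {
    private List<String> requestedKeys = new ArrayList<>();

    /** Mirrors the role of applyReadableMetadata: remember the requested keys, in order. */
    void applyReadableMetadata(List<String> metadataKeys) {
        this.requestedKeys = new ArrayList<>(metadataKeys);
    }

    /** Appends one value per requested key to the physical columns; missing values become null. */
    List<Object> buildRow(List<Object> physicalColumns, Map<String, Object> metadataValues) {
        List<Object> row = new ArrayList<>(physicalColumns);
        for (String key : requestedKeys) {
            row.add(metadataValues.get(key));
        }
        return row;
    }
}
```

In the failure case described earlier, the physical response column would be null while the appended metadata values carry the error code and message.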



