Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated SQL: Timed out responses are no longer recorded as "SUCCESSFUL" #93

Merged
merged 1 commit into from
May 20, 2023

Conversation

TheOtherBrian1
Copy link
Contributor

@TheOtherBrian1 TheOtherBrian1 commented May 18, 2023

What kind of change does this PR introduce?

Bug fix

What is the current behavior?

Timed out responses are labeled as successful.

Please link any relevant issues here: issue 74

What is the new behavior?

Time out responses are now recognized as "ERRORS"

Additional context

The unexpected test failures in my recent pull requests (1, 2), which only modified documentation, prompted me to investigate the error. The root cause appears to be timeouts during test endpoint calls to https://httpbin.org. If the response exceeds the default 2-second timeout, net._http_collect_response returns a misleading success status:

| status  | message | response |
| ------- | ------- | -------- |
| SUCCESS | ok      | (,,)     |

When the tests encounter a SUCCESS status, they anticipate the response field to be populated with the tuple (<status_code>, <headers>, <body>). However, because timed-out responses are mislabeled as successful, this field remains unpopulated, leading to discrepancies. One fix would be to adjust the worker.c to yield an 'ERROR' code upon timeouts, but ongoing discussions around a potential breaking update in issue 74 and issue 62 suggest caution.

While looking for an alternative solution, I ran two GET requests using the Postman Echo API:

-- REQUEST_ID 81: TIMED OUT
SELECT net.http_get (
            'https://postman-echo.com/get?foo1=bar1&foo2=bar2',
            timeout_milliseconds := 1
) AS request_id;

-- REQUEST_ID 82: SUCCESSFUL
SELECT net.http_get (
            'https://postman-echo.com/get?foo1=bar1&foo2=bar2'
) AS request_id;

Here are the requests' records in the net._http_response table:

id status_code content_type                     headers         content                                                                                                                                                                                                                                                                                                                                                                                                   timed_out error_msg           created                      
81                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Timeout was reached 2023-05-18 04:01:07.054579+00
82 200         application/json; charset=utf-8 [object Object] { "args": { "foo1": "bar1", "foo2": "bar2" }, "headers": { "x-forwarded-proto": "https", "x-forwarded-port": "443", "host": "postman-echo.com", "x-amzn-trace-id": "Root=1-6465a4be-32a2128c278ce40d3879ff2f", "accept": "*/*", "api-key-header": "<API KEY>", "user-agent": "pg_net/0.7.1" }, "url": "https://postman-echo.com/get?foo1=bar1&foo2=bar2" } false                         2023-05-18 04:08:30.839851+00

I discovered that net._http_collect_response, while determining success a response's status, overlooks error messages and assumes any recorded response denotes a successful execution. My proposed solution inspects the error message field for each request, and if a message is found or the request is absent, the function will return an 'ERROR' status. I modified only two lines within the function (denoted by comments) which should resolve this issue.

-- Collect responses of an http request
-- API: Private
create or replace function net._http_collect_response(
    -- request_id reference
    request_id bigint,
    -- when `true`, return immediately. when `false` wait for the request to complete before returning
    async bool default true
)
    -- http response composite wrapped in a result type
    returns net.http_response_result
    strict
    volatile
    parallel safe
    language plpgsql
    security definer
as $$
declare
    rec net._http_response;
    req_exists boolean;
begin

    if not async then
        perform net._await_response(request_id);
    end if;

    select *
    into rec
    from net._http_response
    where id = request_id;
    
    --  If no response or an error exists, return ERROR
    -- BELOW LINE MODIFIED: ADDED OR CONDITION
    if rec is null OR rec.error_msg IS NOT NULL then 
        -- The request is either still processing or the request_id provided does not exist

        -- TODO: request in progress is indistinguishable from request that doesn't exist

        -- No request matching request_id found
        return (
            'ERROR',
            -- BELOW LINE MODIFIED: ADDED COALESCE
            COALESCE(rec.error_msg, 'request matching request_id not found'),
            null
        )::net.http_response_result;

    end if;


    -- Return a valid, populated http_response_result
    return (
        'SUCCESS',
        'ok',
        (
            rec.status_code,
            rec.headers,
            rec.content
        )::net.http_response
    )::net.http_response_result;
end;
$$;

Also, I would like to ask: is there any reason why this function is commented as "Private"? It seems like it shouldn't be.

…times out

Currently, when a request times out, the function returns a "SUCCESS" response. My changes fix this bug.
@TheOtherBrian1 TheOtherBrian1 changed the title _http_collect_response now returns an 'ERROR' response when requests … Updated SQL: Timed out responses are no longer recorded as "SUCCESSFUL" May 18, 2023
@steve-chavez
Copy link
Member

Also, I would like to ask: is there any reason why this function is commented as "Private"? It seems like it shouldn't be.

net._http_collect_response uses an infinte loop(_await_response) to wait for a response.

I recall an user was using net._http_collect_response on triggers, which of course caused timeouts on slow http responses.

So yeah, that function is only meant for development purposes. If we really want to expose sync requests, we need to come up with a C implementation.

Copy link
Member

@steve-chavez steve-chavez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR, well explained 💯.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants