load_stac of unsigned job results fails with 401 Unauthorized #792

Closed
bossie opened this issue Jun 6, 2024 · 6 comments · Fixed by #794, #796 or #798

@bossie
Collaborator

bossie commented Jun 6, 2024

Reported by @VictorVerhaert.

Main job j-240605ef616e4f65bf465d15c84030bc on cdse-prod by user faba0f95-04b7-4560-aeb4-c0172dd9de09 is attempting a load_stac of job results by means of the unsigned URL:

[image: main_job]

Dependency job j-240605a639644099b2d8d8cabe62b971 is a job by the same user, also on cdse-prod:

[image: dependency_job]

Considering both jobs are by the same user and on the same back-end, using the unsigned URL should Just Work without actually fetching the URL, but instead it fails with a 401 Unauthorized:

Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/pystac/stac_io.py", line 300, in read_text_from_href
    with urlopen(req) as f:
  File "/usr/lib64/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib64/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1375, in <module>
    main(sys.argv)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1040, in main
    run_driver()
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1011, in run_driver
    run_job(
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1104, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 377, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1581, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1581, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 416, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1613, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2234, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py", line 760, in load_stac
    return load_stac.load_stac(url, load_params, env, layer_properties={}, batch_jobs=self.batch_jobs)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/load_stac.py", line 162, in load_stac
    stac_object = pystac.read_file(href=url)
  File "/opt/openeo/lib/python3.8/site-packages/pystac/__init__.py", line 161, in read_file
    return stac_io.read_stac_object(href)
  File "/opt/openeo/lib/python3.8/site-packages/pystac/stac_io.py", line 234, in read_stac_object
    d = self.read_json(source, *args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/pystac/stac_io.py", line 205, in read_json
    txt = self.read_text(source, *args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/pystac/stac_io.py", line 282, in read_text
    return self.read_text_from_href(href)
  File "/opt/openeo/lib/python3.8/site-packages/pystac/stac_io.py", line 303, in read_text_from_href
    raise Exception("Could not read uri {}".format(href)) from e
Exception: Could not read uri https://openeo.dataspace.copernicus.eu/openeo/1.2/jobs/j-240605a639644099b2d8d8cabe62b971/results

From the logs I gather that it doesn't consider both jobs to be from the same user and therefore proceeds with actually fetching the STAC URL, which (rightfully) doesn't work because the URL is unsigned:

[image: main_job_logs]
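
The expected short-circuit can be sketched roughly as follows. This is only an illustration of the idea, not the actual openeogeotrellis code: the names `own_job_id_from_url` and `InMemoryJobRegistry` and the URL path pattern are assumptions made up for this example.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlparse

# Assumed shape of an unsigned job results URL path: .../jobs/<job_id>/results
_UNSIGNED_RESULTS_PATH = re.compile(r"^.*/jobs/(?P<job_id>[^/]+)/results/?$")


class InMemoryJobRegistry:
    """Hypothetical stand-in for a real job registry: maps job id -> owner."""

    def __init__(self, owners: Dict[str, str]):
        self._owners = owners

    def get_owner(self, job_id: str) -> Optional[str]:
        return self._owners.get(job_id)


def own_job_id_from_url(
    url: str, user_id: str, registry: Optional[InMemoryJobRegistry]
) -> Optional[str]:
    """Return the job id if `url` is an unsigned results URL of a job owned by `user_id`."""
    match = _UNSIGNED_RESULTS_PATH.match(urlparse(url).path)
    if match is None or registry is None:
        # Not a job results URL, or no way to look up ownership.
        return None
    job_id = match.group("job_id")
    return job_id if registry.get_owner(job_id) == user_id else None


url = "https://openeo.dataspace.copernicus.eu/openeo/1.2/jobs/j-240605a639644099b2d8d8cabe62b971/results"
user = "faba0f95-04b7-4560-aeb4-c0172dd9de09"
registry = InMemoryJobRegistry({"j-240605a639644099b2d8d8cabe62b971": user})

# Own job: results could be read directly, without fetching the URL.
own = own_job_id_from_url(url, user, registry)
# Job of another user: the URL would have to be fetched (and be signed).
other = own_job_id_from_url(url, "someone-else", registry)
```

Only when such a lookup yields a job id can the HTTP request be skipped; in every other case the code has to fall back to fetching the URL, which is where the 401 comes from.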

@bossie bossie added the bug label Jun 6, 2024
@bossie bossie self-assigned this Jun 6, 2024
@bossie
Collaborator Author

bossie commented Jun 6, 2024

It does seem to work if the load_stac is in a sync call instead of a batch job, but it then fails further on (indeed in the "load_stac of own job results" branch):

Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/openeo/lib/python3.8/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/users/auth.py", line 95, in decorated
    return f(*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py", line 673, in result
    result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 301, in evaluate
    return evaluate(process_graph=process_graph, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 377, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1581, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1581, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 416, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1613, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2234, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py", line 760, in load_stac
    return load_stac.load_stac(url, load_params, env, layer_properties={}, batch_jobs=self.batch_jobs)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/load_stac.py", line 145, in load_stac
    pystac_item = pystac.Item(id=asset_id, geometry=asset["geometry"], bbox=asset["bbox"],
KeyError: 'geometry'

bossie added a commit that referenced this issue Jun 6, 2024
@bossie bossie linked a pull request Jun 6, 2024 that will close this issue
bossie added a commit that referenced this issue Jun 7, 2024
@bossie
Collaborator Author

bossie commented Jun 7, 2024

> It does seem to work if the load_stac is in a sync call instead of a batch job

I see no trace of a functional ElasticJobRegistry in a batch job driver (neither in env vars nor in logs); this might explain why it works in the web app but not in a batch job (TBC).

@bossie bossie reopened this Jun 7, 2024
@bossie
Collaborator Author

bossie commented Jun 7, 2024

Ok:

backend_implementation = GeoPySparkBackendImplementation(
    use_job_registry=False,
)

To support loading unsigned job results, load_stac in a batch job driver needs to be able to determine that yes, this is my own job, and for that it needs a functional ElasticJobRegistry.
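
A toy model of why a disabled registry leads to the 401, under stated assumptions: `load_results`, `fetch_unsigned` and `Unauthorized` are hypothetical names for this sketch, and passing `job_owners=None` merely stands in for a driver created with `use_job_registry=False`.

```python
from typing import Callable, Dict, Optional


class Unauthorized(Exception):
    """Stand-in for the HTTP 401 raised when fetching an unsigned URL."""


def fetch_unsigned(url: str) -> None:
    # An unsigned results URL requires authentication, so an anonymous fetch fails.
    raise Unauthorized(f"HTTP Error 401: Unauthorized ({url})")


def load_results(
    url: str,
    user_id: str,
    job_owners: Optional[Dict[str, str]],
    fetch: Callable[[str], None],
) -> str:
    # Assumed URL shape: .../jobs/<job_id>/results
    job_id = url.rstrip("/").split("/")[-2]
    if job_owners is not None and job_owners.get(job_id) == user_id:
        # Ownership confirmed: no HTTP request needed.
        return f"read results of own job {job_id} without fetching"
    # No registry (or not the owner): the only option left is fetching the URL.
    fetch(url)
    return f"fetched {url}"


url = "https://backend.example/openeo/1.2/jobs/j-123/results"

# With a working registry the own-job branch is taken.
with_registry = load_results(url, "user-a", {"j-123": "user-a"}, fetch_unsigned)

# Without a registry the same call falls through to the fetch and fails.
try:
    load_results(url, "user-a", None, fetch_unsigned)
    without_registry = "ok"
except Unauthorized:
    without_registry = "401"
```

In this model the web app corresponds to the first call and the batch job driver to the second, which matches the observed difference between sync requests and batch jobs.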

Note: this use case is covered by the integration tests on Terrascope but there it works because it still uses ZK for batch jobs.

bossie pushed a commit to Open-EO/openeo-geopyspark-integrationtests that referenced this issue Jun 7, 2024
@bossie
Collaborator Author

bossie commented Jun 7, 2024

Sync requests work now (tested on openeo-staging):
[image: load_stac_sync]

bossie added a commit that referenced this issue Jun 7, 2024
@bossie bossie linked a pull request Jun 7, 2024 that will close this issue
bossie added a commit that referenced this issue Jun 7, 2024
bossie added a commit that referenced this issue Jun 7, 2024
@bossie
Collaborator Author

bossie commented Jun 7, 2024

Fixed by supporting EJR in batch job driver (tested on openeo-staging).

@bossie
Collaborator Author

bossie commented Jun 7, 2024

Batch job integration tests on Terrascope are failing because they lack EJR configuration.
