ODRC: implement Document creation/upload #19
The Documents API integration fields do not hold any semantic meaning, so it's easier to understand them when grouped together with related fields and tracked separately from the 'normal' model fields. Eventually, the lock ID will need to be stored too; the code is currently commented out since PR #96 needs to be merged first, as it also has new migrations.
Rather than embedding this everywhere and repeating the credentials, we can simplify the test setup by using a factory with a trait for our Open Zaak docker compose configuration. If/when we alter the credentials/fixture, we only have a single place to update, while keeping the semantic meaning in tests clear enough through the parameter name.
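As a sketch of what that could look like with factory_boy (the factory class, model name, and docker compose credentials below are assumed names for illustration):

```python
import factory


class DocumentsAPIConfigFactory(factory.django.DjangoModelFactory):
    """Hypothetical factory for the configuration holding the Documents API connection details."""

    class Meta:
        model = "metadata.DocumentsAPIConfig"  # assumed app/model name

    class Params:
        # Trait bundling the Open Zaak docker compose credentials/fixture, so
        # tests only pass `for_docker_compose=True` instead of repeating them.
        for_docker_compose = factory.Trait(
            api_root="http://localhost:8001/documenten/api/v1/",
            client_id="test-client-id",
            secret="test-secret",
        )
```

If the credentials or fixture change, only the trait needs updating, while `for_docker_compose=True` keeps the intent readable in the tests themselves.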
Added the initial step to create a document in the Documents API, using the large file upload parts mechanism. This sets up the public API to use in future API endpoints, allowing us to keep reasoning (mostly) in our own Python domain without worrying about implementation details and API versions.
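A rough sketch of the kind of public API this enables (the function, dataclass, and field names are assumptions for illustration, not the actual interface):

```python
from dataclasses import dataclass


@dataclass
class DocumentReference:
    """What the rest of the codebase works with, instead of raw API responses."""
    uuid: str
    lock: str
    part_urls: list[str]  # one URL per bestandsdeel, in upload order


def create_document(*, title: str, filename: str, size: int) -> DocumentReference:
    """Register the document metadata in the Documents API and return what is
    needed to upload the file parts later."""
    ...  # perform the POST against the Documents API and map the response
```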
The file uploads that we will receive in our endpoint can be proxied to the Documents API via the API client, which has the API credentials and receives the lock value from our local state.
This is the final step needed to be able to 'use' the document - once all document parts are uploaded (via our proxy endpoint), we can ensure that the document is unlocked and then ready for use.
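From the publication component's side, the flow could look roughly like this (base URL, auth header, and form field name are assumptions for illustration):

```python
import requests

ODRC_API = "https://odrc.example.com/api/v1"  # assumed base URL
API_KEY = "insert-api-key-here"


def upload_parts(document_uuid: str, file_path: str, part_size: int) -> None:
    """PUT each file part to the ODRC proxy endpoint; the ODRC adds the lock and
    JWT and streams the part through to the Documents API."""
    with open(file_path, "rb") as infile:
        index = 1
        while chunk := infile.read(part_size):
            response = requests.put(
                f"{ODRC_API}/documenten/{document_uuid}/bestandsdelen/{index}",
                headers={"Authorization": f"Token {API_KEY}"},
                files={"inhoud": chunk},  # assumed form field name
            )
            response.raise_for_status()
            index += 1
    # after the final part, the ODRC unlocks the document in the Documents API
```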
Since multiple requests will be involved in the document creation and file uploads, we need some persistent storage for the document lock value, created by the Document create call.
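A minimal sketch of the model side (field names are assumptions; as noted above, the actual lock field is commented out until the PR #96 migrations land):

```python
from django.db import models


class Document(models.Model):
    # --- Documents API integration fields, grouped together ---
    document_uuid = models.UUIDField(null=True, blank=True)
    document_upload_complete = models.BooleanField(default=False)
    # lock = models.CharField(max_length=100, blank=True)  # pending PR #96
```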
* Test for invalid upload data, leading to validation error
* Test for generic IO-layer error (can't connect to host)
* Test for error response returned from Documenten API
* After registration of the metadata, we expect to be able to upload the binary content of all file parts
* The uploads should be forwarded to the underlying Documents API
* Once all parts are complete, the document must be unlocked
* When we download the file data from the Documents API, it must match what we uploaded.
Using the new paths + path converters makes it easier to deal with viewset actions for additional routes without needing to validate the path parameter manually, and keeps everything consistent. This requires disabling the 'format' endpoint variations, otherwise they show up in the API specification (looks like drf-spectacular can't filter those out properly yet), which requires setting the attribute on the router as there is no __init__ kwarg for it. For proper type casting and annotations, we need to specify the converter on the viewset too.
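A sketch of the router/viewset side (include_format_suffixes is an existing DefaultRouter class attribute; lookup_value_converter is the viewset hook for path converters, but double-check both against the DRF version in use):

```python
from rest_framework import viewsets
from rest_framework.routers import DefaultRouter


class NoFormatSuffixRouter(DefaultRouter):
    # There is no __init__ kwarg for this, so override the class attribute to
    # keep the .json/.api format suffix variants out of the generated schema.
    include_format_suffixes = False


class DocumentViewSet(viewsets.ViewSet):
    # Use the built-in uuid path converter so the identifier is validated and
    # cast by the URL layer instead of manually inside the view.
    lookup_value_converter = "uuid"
```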
Wire up the detail route to accept PUT requests with file part upload data. The parser must be set to multipart to properly handle the file uploads from the request data, and accordingly the serializer must mark the appropriate fields as read/write only. We opt to return a generic 204 status code, as the response data of the upstream request has little value and this saves us from having to relay the upstream 200 or 201 depending on whether the part data was created or updated.
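A sketch of the shape of that action (serializer field names, URL kwargs, and the converter syntax in url_path are assumptions, and the actual proxy call is left out):

```python
from rest_framework import serializers, status, viewsets
from rest_framework.decorators import action
from rest_framework.parsers import MultiPartParser
from rest_framework.response import Response


class FilePartSerializer(serializers.Serializer):
    # the binary part only comes in via the request, it is never echoed back
    inhoud = serializers.FileField(write_only=True)


class DocumentViewSet(viewsets.ViewSet):
    @action(
        detail=True,
        methods=["put"],
        url_path="bestandsdelen/<int:part_index>",  # extra detail route
        parser_classes=[MultiPartParser],           # handle multipart uploads
    )
    def bestandsdelen(self, request, pk=None, part_index=None):
        serializer = FilePartSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        # ...proxy the validated part to the Documents API here...
        # Upstream answers 200 or 201 depending on create/update; we collapse
        # both into a generic 204 since the upstream body has little value.
        return Response(status=status.HTTP_204_NO_CONTENT)
```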
To write tests with multiple chunks that still remain fast, we need to significantly lower the chunk size from 4GB to 100B (for example) so that we don't have to send multiple GB of data and record that in the VCR cassettes.
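For example (the setting name below is an assumption for whatever constant the upload code reads the part size from):

```python
from django.test import TestCase, override_settings


@override_settings(DOCUMENTS_API_PART_SIZE=100)  # assumed setting name
class MultiPartUploadTests(TestCase):
    def test_document_finalized_after_last_part(self):
        # with 100 byte parts, a ~250 byte file yields three parts while the
        # recorded VCR cassettes stay small
        ...
```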
Added a test for an upload with more than one part, asserting that the response data of the chunk upload properly communicates the status of the document as a whole.
@MarcoKlerks the only thing you can review is the API documentation; the technical testing will be for @felixcicatt and colleagues :)
@sergei-maertens Do we still need to create separate user stories for identifying and validating the file format? @felixcicatt Do you want to test now already, or only once you hook up the ODPC to this?
@MarcoKlerks if it is not a problem for @sergei-maertens that the feedback comes a bit later, my preference is to wait with testing until we pick up ODPC #42
@MarcoKlerks they are listed as a task/checkbox in #36, so I don't think so
In DiWoo it is indeed an optional field, so a lower priority than P0.
Acceptance criteria
Copied/moved from #2
Document creation + uploads
We make use of the file parts mechanism of the Documenten API, always.
Preparing a document upload
The client must pass the necessary metadata and then, based on that response, cut up the file upload into parts that can be submitted individually.
The request body schema of a Document POST would look something like:
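Purely as an illustration (these field names are assumptions, not the agreed schema):

```python
# Hypothetical body for POST /api/v1/documenten
document_create_body = {
    "officieleTitel": "Rapport 2024",
    "creatiedatum": "2024-01-01",
    "bestandsnaam": "rapport-2024.pdf",
    "bestandsformaat": "application/pdf",
    "bestandsomvang": 8_388_608,  # size in bytes, used to derive the parts
}
```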
This translates to a request of ODRC -> Documenten API with schema:
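Roughly, per the Documenten API document-create schema (trimmed; consult the Documenten API specification for the authoritative field list):

```python
# What the ODRC would send to register the metadata; leaving the content empty
# while providing bestandsomvang makes the Documenten API hand out bestandsdelen.
documenten_api_body = {
    "bronorganisatie": "123456782",
    "creatiedatum": "2024-01-01",
    "titel": "Rapport 2024",
    "auteur": "Gemeente Voorbeeld",
    "taal": "nld",
    "formaat": "application/pdf",
    "bestandsnaam": "rapport-2024.pdf",
    "bestandsomvang": 8_388_608,
    "informatieobjecttype": "https://catalogi.example.com/api/v1/informatieobjecttypen/...",
    "inhoud": None,
}
```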
The Documenten API will return a lock and a list of BestandsDelen for upload; each bestandsdeel will have the shape:
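Per the Documenten API, that shape is roughly the following (double-check against the API version in use):

```python
# Example bestandsdeel as returned after creating the document without inhoud
bestandsdeel = {
    "url": "https://documenten.example.com/api/v1/bestandsdelen/...",
    "volgnummer": 1,       # 1-based index of the part
    "omvang": 4_194_304,   # expected size of this part in bytes
    "voltooid": False,     # flips once the part has been uploaded
}
```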
The ODRC will then expose endpoints for these part uploads so the publication component
can upload the parts:
URL:
/api/v1/documenten/:uuid/bestandsdeel/:index
A bestandsdeel will simply be multipart/form-data, with the API key as auth header.
The request will be transformed by the ODRC, which adds the lock ID & JWT for the
Documenten API, and streams the file part down to the Documenten API.
Once all parts are received, we unlock the created document.
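The unlock is a separate call on the document resource in the Documenten API; roughly (client code and auth handling are illustrative):

```python
import requests


def unlock_document(document_url: str, lock: str, jwt: str) -> None:
    """Release the lock once every bestandsdeel reports voltooid."""
    response = requests.post(
        f"{document_url}/unlock",
        json={"lock": lock},
        headers={"Authorization": f"Bearer {jwt}"},
    )
    response.raise_for_status()  # the Documenten API answers 204 on success
```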
Tasks
* POST /api/v1/documenten
* PUT /api/v1/documenten/:uuid/bestandsdelen/:index (or possibly /api/v1/bestandsdelen/:uuid), with the response including documentFinalized: true|false so that the client can be informed that they can refresh the document resource if needed, as file uploads will likely happen in parallel
* The first part of the document contains the magic bytes that can be used to validate against the format in the metadata (postponed to ODRC: add model/admin interface for 'formaat' waardelijst #36). Check the upload validation in Open Forms for inspiration (and edge cases).