This repository was archived by the owner on Aug 4, 2025. It is now read-only.

[PF-308] Add GCS bucket and bigquery data references #212

Merged
zloery merged 27 commits into dev from pf308 on Feb 12, 2021

Conversation

@zloery (Contributor) commented Feb 1, 2021

pair programmed with @jaycarlton

This change adds support for GCS bucket and BigQuery dataset data references in addition to the existing Data Repo snapshot references. It also introduces a new pattern for reference endpoints broken out by reference type. Cleanup of the old endpoints after migration from the old API to the new API is tracked in PF-404.

To preserve backwards compatibility, the existing endpoints and interface objects are left unchanged. This means the existing GET and enumerate endpoints will not work with bucket and dataset references, as that would require changing the response object.

This includes basic reference validation for the new types, similar to what we do for TDR snapshots. In this case we make a GET call for the specified bucket or dataset on behalf of the user, which requires that the user have appropriate permission on the object. This does not do any further validation, such as examining bucket contents or BQ tables.

This change is not feature-gated behind MC_WORKSPACE; Rawls will have access to these new reference types.

This requires publishing a new client library version, which I'll do after approvals but before merging.

@jaycarlton (Contributor) left a comment

some notes

referenceRequest.workspaceId(),
referenceRequest.referenceType(),
referenceRequest.name());
throw new DuplicateDataReferenceException(
Contributor

It would be good to include the name, type, and workspace ID in the error message.

Contributor Author

My concern with that is that name is a user provided string, and the input from security has been to avoid directly returning user-provided strings where possible. A common pattern we use instead is returning an error message without the values and logging the actual values, which I've added here.
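The log-the-details, return-a-generic-message pattern described here could look roughly like the following sketch. The class and exception names are hypothetical stand-ins (the real code throws DuplicateDataReferenceException), and only java.util.logging is used so the example is self-contained:

```java
import java.util.UUID;
import java.util.logging.Logger;

public class DuplicateCheckSketch {
  private static final Logger logger =
      Logger.getLogger(DuplicateCheckSketch.class.getName());

  // Log the user-provided name server-side, but keep it out of the
  // message that goes back to the client.
  static void throwDuplicate(UUID workspaceId, String referenceType, String name) {
    logger.info(
        String.format(
            "Duplicate reference requested: workspace %s, type %s, name %s",
            workspaceId, referenceType, name));
    throw new IllegalStateException("A data reference with this name already exists.");
  }
}
```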

Contributor

Sounds good.

Contributor

We will be returning the name string in many cases. It is the user's name for the object and we will need to return it in get and enumerate requests. If we need to constrain the contents of the name, we should do that. That seems to be Google's approach; for example, BigQuery dataset naming

Contributor

And there is nothing dangerous about returning the reference type or workspaceId is there?
So if we fixed all of that, you could skip the logging and just throw and let the global exception handler do the logging.

Contributor Author

Makes sense, I'll add bucket and dataset name validation and then return the names as part of the error response.

new DataRepoSnapshot().snapshot("snapshot-name").instanceName("instance-name");
CreateDataReferenceRequestBody deprecatedRequest =
new CreateDataReferenceRequestBody()
.referenceType(ReferenceTypeEnum.DATA_REPO_SNAPSHOT)
Contributor

I think it's not too late to rename ReferenceTypeEnum to ReferenceType. Clients have access to that name but since the ordinals are the same it should just work, right?

@zloery zloery changed the title DRAFT [PF-308] Add GCS bucket and bigquery data references [PF-308] Add GCS bucket and bigquery data references Feb 2, 2021
@zloery zloery marked this pull request as ready for review February 2, 2021 18:06
@jaycarlton (Contributor) left a comment

Thanks for the detailed responses!

@davidangb (Contributor)

fyi @MatthewBemis for BigQuery references and @zarsky-broad for APIs related to managing references

@davidangb (Contributor) left a comment

👍 overall, some non-blocking comments inline.

switch (referenceType()) {
case DATA_REPO_SNAPSHOT:
// TODO: we populate both the deprecated reference field and the new referenceInfo fields
// until all clients have migrated to use the new field.
Contributor

this will require more than just adopting the new client library - for instance, the Terra UI is looking inside the reference key to find snapshot ids. We can migrate but it'll be more work.

Contributor Author

Hm, good to know. I think we would still want to migrate, but it sounds like it's going to be worth coordinating more on this. I've currently been tracking this as PF-404, but we may end up with AS tickets as well.

} catch (BigQueryException e) {
throw new InvalidDataReferenceException("Error while trying to access BigQuery dataset", e);
}
}
Contributor

I wonder if each ReferenceObject subclass should define their own validate() method inside their own class - similar to how toJson() is defined. Then we'd just delegate to their own implementations. This might require passing a CrlService or DataRepoService to it.

Contributor Author

Having to pass the services to the objects was the reason I didn't want to take that approach - it feels like we're drifting further away from these being value classes. (They already kind of weren't with things like toApiModel(), but adding more members makes serialization/deserialization trickier).

Contributor

What is the benefit of them being value classes if we have to write switch statements to dispatch methods on them?

I think it would be a good separation of concerns to have the reference objects encapsulate the knowledge of the cloud resource they reference.

Contributor Author

The value is that they can be serialized into the database and back out, which is how we store the actual references. If we require the reference objects to have a DataRepoService field, that gets more complicated - do we serialize the DataRepoService field with the reference? and if not, how do you deserialize into a valid reference object?

Contributor

I think there is a middle ground. I wrote a doc for further discussion: Workspace Manager Object Hierarchy
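One shape the "middle ground" could take — sketched here with hypothetical names, not the actual classes from this PR — is to keep the reference objects as plain value classes but let each subclass validate itself when handed the service it needs as a method argument:

```java
// Hypothetical sketch: no service fields are stored on the value class,
// so serialization stays simple, yet the dispatching switch statement
// disappears because each subclass carries its own validation logic.
interface AccessChecker {
  boolean canAccess(String resourceName);
}

abstract class RefObject {
  abstract void validate(AccessChecker checker);
}

class BucketRef extends RefObject {
  final String bucketName;

  BucketRef(String bucketName) {
    this.bucketName = bucketName;
  }

  @Override
  void validate(AccessChecker checker) {
    // The checker is passed per call and never stored, so BucketRef
    // remains a serializable value object.
    if (!checker.canAccess(bucketName)) {
      throw new IllegalArgumentException("Cannot access referenced bucket");
    }
  }
}
```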

# PLACEHOLDER_VALUE should be removed when we have an actual second value
# to use.
enum: ['DATA_REPO_SNAPSHOT', 'PLACEHOLDER_VALUE'] #eventually include GCS bucket, etc.
enum: ['DATA_REPO_SNAPSHOT', 'GOOGLE_BUCKET', 'BIG_QUERY_DATASET']
Contributor

glad to have PLACEHOLDER_VALUE out of here!

Contributor Author

Same, that truncation is still one of the strangest opinions I've seen Swagger take.

# type: string
# datasetName:
# description: The name of this dataset
# type: string
Contributor

delete this block?

Contributor Author

whoops, good catch

@ddietterich (Contributor) left a comment

I think there is an upward compatibility issue and maybe a validation issue.

description: The ID of the Data Repo snapshot
type: string

# GoogleBucket:
Contributor

Are these needed?

Contributor Author

Nope, we replaced those definitions with shared ones from the common library and I forgot to remove them

CreateDataReferenceRequestBody:
type: object
required: [name, cloningInstructions]
required: [name, referenceType, cloningInstructions]
Contributor

Does that cause an upward compatibility problem with Rawls?
If they can specify the type, then they can also change from using reference to using referenceInfo.

Otherwise, I think we need to leave it optional - maybe with a default value?

Contributor

I think that's a good point. Default it to data repo snapshot and leave it optional.

Contributor Author

Rawls is already specifying type, and this was previously enforced by rejecting requests with no referenceType field in ControllerValidationUtils.
We did that validation as part of the app logic instead of putting it in Swagger because the request shape was (kind of) also used for controlled resource requests, so only common fields between the two were made required. Now that we're removing all the unused controlled resource stuff here, we can move referenceType to required. I'd also like to move referenceInfo there once we stop using the reference field.

final DataReferenceInfo referenceInfo = body.getReferenceInfo();
final ReferenceObject referenceObject;
if (body.getReference() != null) {
referenceObject =
Contributor

It looks like the referenceType is required in the OpenAPI yaml, so should we verify that it is set to data repo snapshot in this case?

I see later that there is a request validator, but it will error if the referenceType is not specified which seems incompatible.

Contributor Author

We also previously errored in ControllerValidationUtils if referenceType was not specified, so that is actually compatible.

// TODO(PF-404): requests using the deprecated reference field will not set the referenceInfo
// field. referenceInfo can be made required and this check can be removed once clients migrate.
DataReferenceInfo info =
body.getReferenceInfo() == null ? new DataReferenceInfo() : body.getReferenceInfo();
Contributor

But that reference is not filled in with anything. Either this is broken or I am confused...

Contributor Author

This handles the existing case where a client specifies the old reference field but does not specify the new referenceInfo field. You can't have bucket or dataset references this way (as getGoogleBucket() and getBigQueryDataset() on this reference will be null), but you can specify a snapshot in the old reference field this way, and we want to support that until we move clients. That's why we check either the referenceInfo or the reference field when validating snapshots here.
(As above, existing clients do already use the DATA_REPO_SNAPSHOT referenceType, so that hasn't changed)


typeMap.put(DataReferenceType.GOOGLE_BUCKET, ReferenceTypeEnum.GOOGLE_BUCKET);
typeMap.put(DataReferenceType.BIG_QUERY_DATASET, ReferenceTypeEnum.BIG_QUERY_DATASET);

sqlMap.put(DATA_REPO_SNAPSHOT, "DATA_REPO_SNAPSHOT");
Contributor

This map is unnecessary. The Enum's built-in name() and valueOf() methods will do the same thing. And you can convert to them without changing your interface.

Contributor

The idea of the sql map is to have invariant names, so that even if they refactor the enum names in Java, we can serialize correctly. .name() is too smart for this purpose, and would return the new name.

Contributor Author

Cutting out the typeMap seems fine. I'm a little less sure about sqlMap for the reason Jay stated, since this value does get serialized directly.
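The invariant-name concern could also be met without a separate map by storing the database string on each enum constant, so a Java-level rename can never silently change what gets serialized. A hedged sketch (the constant set mirrors the diff; the method names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical sketch: the SQL string lives on the constant itself, so
// refactoring the Java name cannot change what is persisted -- the
// concern raised above about relying on name().
enum DataReferenceTypeSketch {
  DATA_REPO_SNAPSHOT("DATA_REPO_SNAPSHOT"),
  GOOGLE_BUCKET("GOOGLE_BUCKET"),
  BIG_QUERY_DATASET("BIG_QUERY_DATASET");

  private final String sql;

  DataReferenceTypeSketch(String sql) {
    this.sql = sql;
  }

  String toSql() {
    return sql;
  }

  static DataReferenceTypeSketch fromSql(String sql) {
    return Arrays.stream(values())
        .filter(t -> t.sql.equals(sql))
        .findFirst()
        .orElseThrow(() -> new IllegalArgumentException("Unknown type: " + sql));
  }
}
```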


static {
typeMap.put(DataReferenceType.DATA_REPO_SNAPSHOT, ReferenceTypeEnum.DATA_REPO_SNAPSHOT);
typeMap.put(DataReferenceType.GOOGLE_BUCKET, ReferenceTypeEnum.GOOGLE_BUCKET);
Contributor

And this map is overkill. Maybe I will write you an "enum" class that will use a list search - which is generally faster than a hash for lists smaller than 10 items...

I know, it probably isn't important and it is all static. If we were running on a PDP-11/70 you would thank me :)


@jaycarlton (Contributor)

#212 (comment)

Yeah, and we could have a parent class with just the type as a public member in case we need it.

@ddietterich (Contributor)

As posted in slack:

Small brainstorm especially of interest to @zloery and @jaycarlton. What about making the new references endpoints use the path /api/workspaces/v1/resources/references and keep all of the WM resources within one collection. We would leave the current datareferences as is for backward compatibility.

It might make the new version cleaner because we could require referenceType and remove the reference field. And we can make a general GET for /resources that returns everything we’ve got. I know this is a late-breaking change, but I just thought of it after Jay’s suggestion on the controlled resources spec.

@zloery (Contributor Author) commented Feb 3, 2021

re: moving to /api/workspaces/v1/resources/references, I'm definitely thinking about it.

I'm also thinking about our model of resources - the DB structure suggests that we have data references to all resources (both controlled and uncontrolled), so should the common GET be on .../datareferences? Some parts of that don't really make sense though, e.g. I don't know what the referenceId of a reference to a controlled resource really means. Probably worth more discussion.

@zloery zloery requested a review from ddietterich February 4, 2021 18:13
@zloery (Contributor Author) commented Feb 4, 2021

Per earlier discussion, we'll break out each type into a separate endpoint, while still supporting the existing data reference APIs for compatibility in the short term. Moving this to a draft while I work on that.

@zloery zloery marked this pull request as draft February 4, 2021 19:26
@zarsky-broad (Contributor)

@zloery (Contributor Author) commented Feb 4, 2021

...huh. I'm wary of blocking on something that's been in progress for almost a year, but I want to keep an eye on that. thanks for the pointer!

@gjuggler commented Feb 5, 2021

> ...huh. I'm wary of blocking on something that's been in progress for almost a year, but I want to keep an eye on that. thanks for the pointer!

Also, don't we use swagger-codegen and not openapi-generator? ::shakes fist at whoever forked the projects, however good their intentions may have been::

@zloery (Contributor Author) commented Feb 8, 2021

Update: I've broken out the create, get, get-by-name, and delete endpoints for each resource type separately, and the new interface no longer uses any polymorphic types. I'm not totally sure this structure is right (delete and get-by-id don't really depend on type and could be methods on /api/workspaces/v1/{id}/resources/references), but I think this is ready for more eyes.
I haven't replaced the enumerate endpoint yet, which we will need to do before fully migrating. I think that's out of scope for this PR.

@zloery zloery marked this pull request as ready for review February 8, 2021 20:22
@ddietterich (Contributor) left a comment

Lotta good stuff in here! I want to discuss the object model before we merge, and maybe remove some endpoints.


'500':
$ref: '#/components/responses/ServerError'

/api/workspaces/v1/{id}/resources/references/datarepo/snapshot/name/{name}:
Contributor

Is name URL safe? Or you just assume it is encoded?

Contributor Author

Names must be URL-safe, and they're checked for validity in the controller methods.

@davidangb (Contributor) left a comment

see comments inline!

.cloningInstructions(
CloningInstructions.fromApiModel(body.getMetadata().getCloningInstructions()))
.referenceObject(referenceObject)
.build();
Contributor

this needs a rebase now that #221 merged; this should include .referenceDescription(body.getReferenceDescription())

referenceId.toString(), id.toString(), userReq.getEmail()));
return new ResponseEntity<>(HttpStatus.NO_CONTENT);
}
}
Contributor

also from #221 this needs to include updateDataReference(...)

.toApiModel())
.metadata(ControllerTranslationUtils.metadataFromDataReference(ref));
return new ResponseEntity<>(response, HttpStatus.OK);
}
Contributor

similar to private void deleteDataReference and private DataReference createDataReference, could the get-by-id and get-by-name methods be factored into a common method?
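One way the suggested factoring could look — a hypothetical sketch, not the actual controller code — is to pass the differing lookup step in as a Supplier and keep the shared response-building in one place:

```java
import java.util.function.Supplier;

// Hypothetical sketch: get-by-id and get-by-name differ only in how the
// reference is looked up, so both delegate to one helper. Strings stand
// in for the real DataReference / response types.
class ReferenceControllerSketch {
  String getById(String id) {
    return getReference(() -> lookupById(id));
  }

  String getByName(String name) {
    return getReference(() -> lookupByName(name));
  }

  private String getReference(Supplier<String> lookup) {
    String ref = lookup.get();
    // ...shared translation-to-API-model and response building would go here...
    return "ref:" + ref;
  }

  private String lookupById(String id) {
    return "by-id-" + id;
  }

  private String lookupByName(String name) {
    return "by-name-" + name;
  }
}
```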

// TODO(PF-404): this is a workaround until clients migrate off this endpoint.
if (ref.referenceType() == DataReferenceType.DATA_REPO_SNAPSHOT) {
responseList.addResourcesItem(ref.toApiModel());
}
Contributor

If this omits non-snapshot references from its response payload, it could easily return incorrect pagination - user requested a limit of 5, but we only return 3.

I don't think this is worth fixing in this PR - we should encourage clients to move to the new endpoints instead. But let's remain aware of this behavior in case it causes strange problems elsewhere, potentially in automated tests.

Contributor Author

Agreed. I think the main way to mitigate this is not adopting the new endpoints (including creating non-snapshot references) until the enumerate is ready, but that's a risk.
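The hazard is easy to demonstrate: when the page limit is applied before the type filter, a page can come back short even though more matches exist. A minimal, self-contained illustration (not the actual service code):

```java
import java.util.List;
import java.util.stream.Collectors;

class EnumerateDemo {
  // Applies the page limit first, then filters by type -- mirroring the
  // workaround above, where non-snapshot references are dropped from an
  // already-limited page of results.
  static List<String> enumerate(List<String> all, int offset, int limit) {
    return all.stream()
        .skip(offset)
        .limit(limit)
        .filter(t -> t.equals("DATA_REPO_SNAPSHOT"))
        .collect(Collectors.toList());
  }
}
```

With six stored references of which four are snapshots, a request for a page of five returns only three items, so a caller that treats "fewer items than limit" as the end of the list would stop early.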

*/
public static final Pattern BQ_DATASET_NAME_VALIDATION_PATTERN =
Pattern.compile("^[_a-zA-Z0-9]{1,1024}$");

Contributor

do any of the Google client libs offer name validation that we could delegate to, instead of writing our own regexes? (honest question, I don't know if they do or not)

Contributor Author

Generally the client libs don't validate the names, you'll just get a 40X response from GCP. Agreed that rolling our own regexes for a "necessary but not sufficient" check is a little messy, but I think it's better than passing around totally unverified strings

Contributor

Devil's advocate: what are we ever going to use the string for except to pass to Google to make a resource? What's the harm in late validation?

Contributor

We want to send it back to the user in an error message. Security doesn't want us putting unvalidated user input into an output string.
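The "necessary but not sufficient" check being discussed could be sketched with the BQ_DATASET_NAME_VALIDATION_PATTERN from the diff above (letters, digits, and underscores, up to 1024 characters, matching Google's dataset naming rules); the class and method names here are guesses at the util's shape:

```java
import java.util.regex.Pattern;

class DatasetNameValidation {
  // Pattern copied from the diff above: BigQuery dataset names may
  // contain only letters, digits, and underscores, up to 1024 chars.
  static final Pattern BQ_DATASET_NAME_VALIDATION_PATTERN =
      Pattern.compile("^[_a-zA-Z0-9]{1,1024}$");

  // Reject clearly invalid names before any user-provided string is
  // echoed back in an error message or sent to GCP.
  static void validateDatasetName(String name) {
    if (name == null || !BQ_DATASET_NAME_VALIDATION_PATTERN.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid BigQuery dataset name");
    }
  }
}
```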

throw new InvalidDataReferenceException(
String.format("Error while trying to access GCS bucket %s", ref.bucketName()), e);
}
}
Contributor

I haven't looked at the innards of CRL - do you know what happens here if the bucket is requester-pays?

Contributor Author

I haven't seen that case specifically, but CRL is a really thin wrapper around the Google client library. If we're calling GET on a bucket (which I think is a class B operation), it looks like this will fail with a 400 UserProjectMissing and we'd end up throwing. Is that a use case we need to support?


private AuthenticatedUserRequest getAuthenticatedInfo() {
return authenticatedUserRequestFactory.from(request);
}
Contributor

DataRepoReferenceController and GoogleReferenceController share boilerplate - is there opportunity for a ReferenceController base class?

Contributor Author

Done. I'm not sure I took the cleanest approach given the way I'm passing beans around, but it does cut down on boilerplate. I might try to revisit this a bit to shrink the argument list


/api/workspaces/v1/{id}/resources/references/datarepo/snapshots:
parameters:
- $ref: '#/components/parameters/Id'
Contributor

not for this PR: I really wish this was named workspaceId instead of the totally-ambiguous id

Contributor Author

It really should be; that's tripped people up in client tests before when id gets mixed up between workspaces and references.

application/json:
schema:
$ref: '#/components/schemas/BigQueryDatasetReference'

Contributor

this will also need updates based on #221 - adding description to DataReferenceMetadata ( I assume that's where it would go ), plus patch APIs for each reference type.

@zloery (Contributor Author) commented Feb 10, 2021

Update: the current approach is per-type create, patch, and get (and get-by-name) endpoints. Delete and enumerate will be shared at a .../resources level, out of scope for this PR. Reference descriptions are also merged in, PTAL.

@zloery zloery requested a review from ddietterich February 11, 2021 16:55
@ddietterich (Contributor) left a comment

Looking good. A scattering of comments.

I want to understand moving the untar of the common swagger into the source tree. I'd like to avoid that if possible.

GoogleBucketReference response =
new GoogleBucketReference()
.bucket(
((bio.terra.workspace.service.datareference.model.GoogleBucketReference)
Contributor

I think we should have some prefix or suffix like API on the generated code so we don't run into these name collisions.
Duplicate names will make it easy to screw up and confusing to understand and debug.

Contributor

It's easy to set up a global prefix in Swagger.

@zloery (Contributor Author) Feb 12, 2021

Agreed, but I think making that change (especially if we add a global prefix) ends up out of scope for this PR.

new GoogleBucketReference()
.bucket(
((bio.terra.workspace.service.datareference.model.GoogleBucketReference)
ref.referenceObject())
Contributor

We can leave this for now, but this is where having the class hierarchy would make things cleaner.

}

logger.info(
String.format(
Contributor

You don't need the String.format almost all the time. You can use placeholders {} in the log message.

Contributor Author

Ah, I missed this one. thanks!
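For reference, parameterized logging looks like this. SLF4J (the usual choice in Spring services) uses {} placeholders; the self-contained sketch below uses java.util.logging instead, whose indexed {0}, {1} placeholders are substituted via MessageFormat in the same spirit:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogPlaceholderDemo {
  private static final Logger logger =
      Logger.getLogger(LogPlaceholderDemo.class.getName());

  // Hypothetical method name. No String.format needed: the logging
  // framework substitutes the arguments into the placeholders, and can
  // skip the formatting work entirely when the level is disabled.
  static void logDeletion(String referenceId, String workspaceId) {
    logger.log(Level.INFO,
        "Deleted reference {0} from workspace {1}",
        new Object[] {referenceId, workspaceId});
  }
}
```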

from zipTree(configurations.cloudResourceSchema.singleFile).matching {
include 'cloud_resources_uid.yaml'
}
into "${resourceDir}/api/common"
Contributor

Why do we need to unpack this into the source tree?
Is an alternative to copy our resources into the build tree?

Contributor

It would be nice to deploy our yaml file into a build dir so we could use relative (non-upward) paths.

Contributor Author

I unpacked it into the source tree so that our swagger-ui (also in the source dir) can parse references to it. Otherwise, those schemas won't show up in swagger-ui, which breaks example values.

Contributor

We can leave it for now. Another approach might be to move all of the parts into a build subdirectory and do the codegen from there. I'll make a ticket.



// Internal dependencies
// Note The Open API schema depends on an external library - cloud-resource-schema, so need to unzip it first.
swaggerSources.server.code.dependsOn tasks.unzipCloudResourceSchema
Contributor

what about the swagger client tasks?

Contributor Author

This took longer to investigate and fix than I want to admit, but you're right that the swagger client tasks didn't depend on this. Should be fixed now.



@Override
public ResponseEntity<Void> updateGoogleBucketReference(
UUID id, UUID referenceId, UpdateDataReferenceRequestBody body) {
Contributor

This method can just be updateBucketReference because it's on the Google controller.

Contributor Author

Done

public void testInvalidCharInBucketName() {
assertThrows(
InvalidDataReferenceException.class,
() -> DataReferenceValidationUtils.validateBucketName("INVALIDBUCKETNAME"));
Contributor

It would be nice to have this validation at a higher level so controlled resources can use it too.

Contributor Author

This is in a validation util class called by the controller; I think it's actually reasonable for the various controlled resource controllers to use this bean as well.


@zloery (Contributor Author) commented Feb 12, 2021

Published version 0.12.0-SNAPSHOT from this.

@zloery zloery merged commit d3fa9a8 into dev Feb 12, 2021
@zloery zloery deleted the pf308 branch February 12, 2021 19:05