Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/96 validators #100

Closed
wants to merge 1 commit into from
Closed

Conversation

mwitkow
Copy link

@mwitkow mwitkow commented Aug 30, 2015

* work in progress*

Implementation of the validator gadget. Currently supporting regex and integer greater than/lesser than.

TODO:

  • tests
  • comments

#96

@@ -67,6 +67,8 @@ extend google.protobuf.FileOptions {
optional bool goproto_extensions_map_all = 63025;
optional bool goproto_unrecognized_all = 63026;
optional bool gogoproto_import = 63027;

optional bool validator_all = 63028;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will you use file level validation for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally thought about doing a validator for every file. But that's not needed since the Validator interface was declared.

@mwitkow
Copy link
Author

mwitkow commented Aug 31, 2015

Done some major changes:

  • msg_exists in proto3 validates whether the protobuf contains a nested message... since there's no required
  • fixed handling of nullable=false for proto2 and proto3

Still need some hints as to how to test this? Manually written tests in tests/validator?

}
}

func (p *plugin) generateProto2Message(file *generator.FileDescriptor, message *generator.Descriptor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to fold proto2 and proto3 into one method generateMethod with some ifs for proto3, check some of the other plugins.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that initially, but the code turned out incredibly spaghetti due to the proto2 handling of nullable. I decided to split it up to make it easier to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should check again.

} else if nullable && !proto3 {

@awalterschulze
Copy link
Member

Manually written tests in test/validator is a start.

How about using the populate plugin.
Then some manual code that checks whether the randomly generated structure should validate or not vs the generated validate method.

I actually can't think of a way to do this more automatically :(

@awalterschulze
Copy link
Member

If you only support limited functions that have some duality, checking and random generation then you could extend the populate plugin to only generate valid messages.
Then you could easily write a test generator.

@mwitkow
Copy link
Author

mwitkow commented Aug 31, 2015

Can you provide pointers to examples of how to use the populate plugins? I think initially I'll stick to manual tests that validate the manually written protos.

@awalterschulze
Copy link
Member

The amount of specific functions could become quite large. For each function a new extension will be needed. Check for example the amount of functions that json schema has: http://json-schema.org/latest/json-schema-validation.html

@mwitkow
Copy link
Author

mwitkow commented Aug 31, 2015

Re json schema bloat:
Most of this huge and bloated spec are validators of message structure (optional vs required, one of, is a repeatable field etc), something that protobuf takes care by itself. There are only two sections for field content validation there:

  • strings (sec 5.2):
    • pattern - already implemented in regex
    • maxLength minLength - I'd argue unnecessary as it is doable through regex
    • default - present in proto2 and not doable in proto3
  • numerics (sec 5.1) :
    • multipleOf - who knows why this is useful to anyone
    • minimumValue, maximumValue - done but only for int types, and could be extended for float

On top of that the only thing I can think anyone would need is max_length for string and byte.

I genuinely think that this, coupled with the msg_exists for proto3, should cover most of the use cases for out of the box validators. For the rest, we can rely on users implementing their own Validate method in the auto-generated package.

@mwitkow
Copy link
Author

mwitkow commented Aug 31, 2015

Added manual tests. They're not well broken up into individual per-field functions, but they do test all the edge cases I could think of.

@awalterschulze
Copy link
Member

Cool tests.

I had another thought.
required could be done correctly (for any field) as part of unmarshaling, as it is currently done for proto2.
So maybe this could be a nice extensions only used in proto3 for gogoproto.

gogoproto.required=true

Then I started thinking more about unmarshaling.
If we build customtype for all types, not just byte silces as in the current implementation, then as part of the Unmarshal method of that customtype it could return a validation error.

Example:

message ValidatorMessage3 {
    message Embedded {
        string Identifier = 1 [(gogoproto.customtype) = "Id"];
        int64 SomeValue = 2 [(gogoproto.customtype) = "MaxHundred"];
    }
    string SomeString = 1 [(gogoproto.customtype) = "SmallString"];

    uint32 SomeInt = 6  [(gogoproto.customtype) = "Min10"];

    Embedded someEmbedded = 10;
    repeated Embedded someEmbeddedRep = 14;
}
type Id string

func (this *Id) Unmarshal(buf []byte) error {
  //cast bytes to string, because that is now a string is unmarshaled.
  //do regex on string
  //return an error if regex fails
}

What do you think?

@mwitkow
Copy link
Author

mwitkow commented Sep 5, 2015

Apologies for the delay, work got in the way. :(

It feels like custom types would be quite intrusive in the normal proto3 flow that people get used to when reading official docs. I think keeping it as a clear extension that deals with validation is clearer.

However, I do like the idea of extracting the required into marshalling outside of validation. However, required, AFAIK, is a keyword and can't be used for an extension field name. :(

Also, as we build more and more services, this becomes an ever more important feature for us. Can we treat the current implementation as an MVP (with a possible addition of floats), and later maybe add custom types?

For now I'm kinda puzzled why this change causes the maps test to fail:

--- FAIL: TestAllMapsProto (0.00s)
panic: runtime error: slice bounds out of range [recovered]
    panic: runtime error: slice bounds out of range

:(

@awalterschulze
Copy link
Member

No problem.

Regarding clearer docs, yes.
I guess its a lot like using a variable to store a regex and reusing it over and over.
It would be clearer to read if it was written out everywhere, but it is also less prone to mistakes to only write it once.
Having one comment above explaining what an Id is, etc. would do wonders.

We could just find a new name for required like for example, req, proto2required, etc.

MVP is great for proprietary code, when you can keep a handle on all your users, but for an open source tool and library I am quite scared to apply the same logic. I think at least the vision needs to be clear and it should be known how backwards compatibility will not be broken.
In my opinion what we have here is great proof of concept.
I would really be more comfortable with first gathering more data.
Typically I would do this by introducing the feature inside my team with their very demanding usecases and seeing where my logic is still failing and where more features or more general features are better.
You have a fork, why not use this inside your team and see where it leads?
I would also recommend adding some tests with the current implementation of customtype and customname and just some other features mingled into the validation.
Even if it is just to add a check that tells the user that this is not a valid use of validation.

Regarding the failed test, please try getting the newest master.
I have added some fuzz tests and those intermittently find some bugs of which I have fixed some quite recently.

@mwitkow mwitkow force-pushed the feature/96_validators branch from 0b4bdd7 to 3626925 Compare September 24, 2015 08:15
@mwitkow
Copy link
Author

mwitkow commented Sep 24, 2015

Thanks for the response Walter, apologies I haven't got back to you in a while.. vacation 😎

I've rebased it on current master and we'll start using this internally, and see what else is needed.

After a week I'll come back and report progress and possibly implement the customtype and customname tests.

As for proto3required, how about validator.present just for submessages?

@awalterschulze
Copy link
Member

No problem hope you enjoyed the holiday :)

I look forward to hearing about your experiences.
I don't know how quickly your teams adopt new technology, but maybe a week is a little short to truly get a full experience of their possible extra requirements.

customtype, customname and casttype tests sound good :)

In my opinion required only for submessages is a hack, sorry, but we have discussed a perfectly full feature where required checking happens during unmarshaling, just as it is being done for proto2 required.

@mwitkow mwitkow force-pushed the feature/96_validators branch 2 times, most recently from e619e86 to e4838ff Compare October 21, 2015 10:50
@mwitkow mwitkow force-pushed the feature/96_validators branch from e4838ff to 9aa83b2 Compare October 21, 2015 10:52
@mwitkow
Copy link
Author

mwitkow commented Oct 28, 2015

Sorry for the long time without response.
We've been playing around with validators for a while now. The idea proved incredibly useful when building our user-facing APIs: the automatic sanitization of gRPC inputs regardless of where the given protobuf is used helped remove code duplication and silly mistakes (e.g. one place allowing 32 characters, while another one 31).

The msg_exists proved especially useful, to avoid code that sanity checks whether a given value is not nil. I do agree with your assessment that it probably should be promoted out of Validate into an unmarshal message. However, I don't see how we could extend the required annotations to string or number fields that have default values.

If you're interested in me upstreaming this work, please provide pointers as to what improvements you'd like to see other than customtype/cuystomname tests. :)

@awalterschulze
Copy link
Member

Great to hear from you again :)
I think this has been a nice amount of time for a more proper evaluation.
I am glad you found it so useful :)

Did you find that you needed any other functions, did your users wish they could do more validation?

I think the best would be to start with promoting msg_exists to something resembling the proto2 required functionality. This should be a totally separate pull request. That way it will make my life a lot easier. Would this be possible for you?

Next you have the regex, min and max functions, which are pure validators and nothing more.
These extensions could be used by a separate code generator plugin, as you have done.

This is where some tests are needed:
Look at thetest.proto there are structs like CustomNameNinOptNative, CustomNameCustomType and NidOptCustom which should inspire some new creative tests by copying and adding some validators.

Please tell me if I have missed something?

@awalterschulze
Copy link
Member

Seems there is at least one more person interested in validating protobufs golang/protobuf#52
@mwmahlberg maybe you want to discuss your requirements here?

@mwmahlberg
Copy link

Thanks @awalterschulze for pointing me here. I would like to note some general thoughts here. @mwitkow Please do not see the them as critique towards your impressive work 👏 , but as suggestions for the general route.

I think we have the same idea: define protobuf messages, use them as our models for gRPC based services (and other "stuff") and have easy means validating messages. So my idea was to leverage existing validation solutions. Those are (almost) all based on struct tags and reflection. So I searched to find a way to add custom struct tags on generation, which is a feature request denied in golang/protobuf#52.

This approach would have quite some advantages, however:

  1. We could leverage one of the excellent existing validation solutions.
  2. It would reduce the code to be maintained and eliminate the need to "reinvent the wheel".
  3. Most importantly: It would be useful for a lot of other purposes since it is more generic.
    For example for xml serialization or MongoDB persistence via go-mgo/bson. SQL, too, for that matter. Basically everything which leverages struct tags.

The only problem I see is to prevent the need to have a protobuf extension which will only make sense with gogo/protobuf's generator. I am not too sure here, but my thought from a user's point of view is to have comments with a semantic meaning, much like //go:generate…

My vision (and top priority on my wishlist for Santa 😍 ) is something like this

syntax = "proto3";

message Foo {
    string Id = 1; //gogo:tags `bson:"_id" valid:"hexadecimal, required" foo:"bar"`
}

and have those tags added to the generated code:

type Foo struct {
    Id string  `protobuf:"bytes,1,opt,name=id" json:"id,omit empty" bson:"_id" valid:"hexadecimal, required" foo:"bar"`
}

which would allow easy validation (using govalidator, for instance):

// …
result, err := govalidator.ValidateStruct(inMsg)
if err != nil {
   return nil, grpc.Errorf(codes.InvalidArgument, "%s", err.Error())
}
// …

Since I am very tied into several projects business wise, I sadly can not really help with this feature for at least half a year. If my suggestion is accepted however, I will do my best to take my share.

@mwitkow
Copy link
Author

mwitkow commented Jan 24, 2016

@mwmahlberg so just a quick update on where the current Validators are.

We've been using them in production for all of our gRPC services and they saved hundreds of boilerplate validation code. The only change we did is we moved out the validation from the proto Codec into custom-built interceptors for gRPC handlers, so that, for example, we can log validation errors. We found that the narrow feature set of msg_exists, regex and less_than is enough for 99% of our use cases. The only problem we encountered was with a Time field (we didn't want to allow values before now()), and we fixed that by just implementing the Validate method on that massage manually :)

As such, I think that Validators, in whatever shape or form, would be a great addition to Gogo.

As per your proposal, we thought about doing it with struct tags but there were a couple of problems:

  1. Regex validation (which we need and rely on heavily) proved very, very slow. There was some cost to the reflection nature of struct tags (as opposed to generated code), but the biggest drawback was the compilation of the regex "on the fly", checking the field against the regex, and discarding the compiled regex... per message. This could be avoided with some clever caching, but that's prone to multithreading issues.
  2. We didn't want to use pre-existing Go validator libraries (such as https://github.com/asaskevich/govalidator) since we are about to standardize on the validator field annotations for other languages (Java) and generate checks there as well. As such, if we went with an existing Go validation library (and these are usually fairly rich), we'd spend a ton of time to rich feature parity with that library, without much benefit.

On a more upbeat note, you don't need to wait for Santa to try out your stuff. There's already a way to pass struct tags from .proto files by using the gogo.moretags annotations: http://gogo.github.io/doc/
That's what we use sometimes to declare Google Datastore entities. Please let me know how that goes.

@mwmahlberg
Copy link

@mwitkow I totally missed that feature. It works like a charm (except that I can't seem to get my fairly complex regex – speed is no concern for me – properly escaped 😊 ), and I was able to get rid of quite some code.

However, I understand the approach. For most applications, speed is crucial

Setting a standard for validation annotations seems like a great idea. I will try to dig into the code when I get some spare time.

@awalterschulze
Copy link
Member

Wow I didn't even know there was something like govalidator. Good to know.
What other validation libraries are out there?

So @mwmahlberg you found that moretags is all you need?

If you really don't want to use tags and prefer comments you spend quite a bit of time to develop a new protoc-gen-gogoyournamehere binary that reads the comments in the descriptor (which is not super easy) and then transcribes these into moretags before calling all the code generation. I think using moretags is much easier :)

@mwmahlberg
Copy link

@awalterschulze & @mwitkow Actually, that works pretty excellent. I have fairly complex validation running (including a domain name regex) over nested structs within 2ms. Good enough for me.

If you are interested, I can set up a sample project (the project I am working on is closed source).

@awalterschulze
Copy link
Member

I am just glad its working for you.
I am also working on another validation project or rather language http://katydid.github.io/play/
But there is no reason I can't make a translator between struct tags and the language and have a compilable version of the validation scheme.
The same obviously goes for the gogovalidator extensions which I could also translate to the language and validate the protocol buffers.
I just first need to "finish" the language.
But an example would be interesting, since I can always use more usecases.

@awalterschulze
Copy link
Member

You could probably split this out into a separate protoc-gen-validator binary and package that only depends on gogoprotobuf and is not a fork.
See https://github.com/gogo/letmegrpc/blob/master/protoc-gen-letmegrpc/main.go for an example of such a code generating binary.
This should take you a few minutes vs the load of work required to get this into a merge-able plugin.

@mwitkow
Copy link
Author

mwitkow commented Mar 9, 2016

How much work do you think it'd be to be a margeable plugin? Is it just the
missing tests and custom message handling you're worried about, or do you
fundamentally disagree with the approach?

On Wed, 9 Mar 2016, 12:29 Walter Schulze, [email protected] wrote:

You could probably split this out into a separate protoc-gen-validator
binary and package that only depends on gogoprotobuf and is not a fork.
See
https://github.com/gogo/letmegrpc/blob/master/protoc-gen-letmegrpc/main.go
for an example of such a code generating binary.
This should take you a few minutes vs the load of work required to get
this into a merge-able plugin.


Reply to this email directly or view it on GitHub
#100 (comment).

@awalterschulze
Copy link
Member

I just think if something can be separate it should be separate.
We don't have to create a bigger monolith.

I would rather ask is there any reason this should be part of gogoprotobuf.
For example customname extensions, I can't really say no to them, because they influence pretty much all other plugins.
But on the other hand letmegrpc extensions can go into their own repo and keep all their dependencies with them.
I think validators should serve you and others much better as a separate repo, but maybe I am wrong?

Its not that I don't like the work you've done here.
I am a little worried about msg_exists, since I would rather want a "required" extension for proto3, but it won't work for native fields, since if they have a default value it won't marshal it. So required becomes really hard. msg_exists makes much more sense than "required", but something about it just does not feel right. I wish I had a better idea.
I think the main thing would be more testing for customname, customtypes, etc.

@mwitkow
Copy link
Author

mwitkow commented Mar 30, 2016

I agreee, it makes sense to make it separate. It's especially useful since we're also considering building Java validators for them, so it made sense to move the proto annotations to a separate package.

I move the contents of this PR into https://github.com/mwitkow/go-proto-validators.

@mwitkow mwitkow closed this Mar 30, 2016
@awalterschulze
Copy link
Member

Well done it looks awesome :)

@awalterschulze
Copy link
Member

My validation language is finally ready http://katydid.github.io/
But I still think your validators are better for your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants