Conversation
| return fmt.Sprintf("relative path is invalid, target doesn't exist or is bigger than %s", semantic.RelativePathFileMaxSize) | ||
| } else if description == "Does not match format '"+semantic.DataStreamNameFormat+"'" { | ||
| return "data stream doesn't exist" | ||
| } |
There was a problem hiding this comment.
Avoiding the need of these adjustments would be another reason to look for another jsonschema library with better customization options.
|
@mtojek ideas for testing this? I wouldn't like to commit files that exceed the limits 🙂 An idea could be to add a generator of packages for testing, and generate the packages that exceed the limits on test time. |
mtojek
left a comment
There was a problem hiding this comment.
@mtojek ideas for testing this? I wouldn't like to commit files that exceed the limits 🙂
An idea could be to add a generator of packages for testing, and generate the packages that exceed the limits on test time.
My first thought would be mocking, so you don't have to generate big files even temporarily. You check if you can mimic the FS interface and return some big size().
Another alternative would be setting really low limits just for testing.
First round of the code review done.
I admit that I got scared with this PR... It was supposed to be an easy size check to implement, but we ended up with complex "spectyping". I'm wondering if we shouldn't rethink the idea or adjust the existing design.
What we can consider here is a "file limits" file, which compacts all global limitations we have for folders and their content, so that it's easy accessible and not hidden in the Go code. Keep in mind about the original (but dead today) portability idea :)
| return err | ||
| } | ||
| if mime.FormatMediaType(mediatype, params) == "" { | ||
| // Bug in mime library? Happens when parsing something like "0;*0=0" |
There was a problem hiding this comment.
I don't have any deeper knowledge of this. Did you find it by accident with unit tests?
There was a problem hiding this comment.
Gave a try to 1.18's fuzzy testing in https://github.com/elastic/package-spec/pull/278/files#diff-2a7131ad04e8aa8fcd479f359de4357a6dbe6316b61c5f6966079c4c486611f2 and found this.
|
I will move b524c92 to a different PR. |
| "github.com/elastic/package-spec/code/go/internal/fspath" | ||
| ) | ||
|
|
||
| const maxFieldsPerDataStream = 1024 |
There was a problem hiding this comment.
Not sure if moving this too to somewhere in the spec 🤔
There was a problem hiding this comment.
I would consider every number/count/etc. as a constant value, so - in the spec :)
a271592 to
8b6e763
Compare
8b6e763 to
3ca2145
Compare
|
Tests and changelog added, opening for review. |
Do you think you can mark it as ready for review or there is a potential blocker? |
Ah yes, I wanted to do that after my previous comment 🤦 |
mtojek
left a comment
There was a problem hiding this comment.
I left a few minor comments to be addressed/responded to, but nothing serious. You did a pretty good job adapting the spec.
What makes me worried is the fact, that the implementation will become more tangled than it was before. I think we should open a new issue to research alternative spec parsers to slim down the current codebase.
| return fmt.Errorf("invalid format for file size (%s): %w", text, err) | ||
| } | ||
|
|
||
| unit := match[2] |
There was a problem hiding this comment.
Maybe we should use a library like: https://github.com/dustin/go-humanize
There was a problem hiding this comment.
It could be a good idea to have less code here.
But there are a couple of small concerns I have with humanize:
- It is strict with the interpretation of unit prefixes (Kilo means 1000, KiByte needs to be used for 1024), we would need to change from MB to MiB for example.
- Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).
- It rounds-up when giving the string representation ("1024B" and "1025B" are both converted to "1.0 KiB"). We need to be able to convert back and forth because we parse yaml and convert it to json, and we would be losing precision. We could workaround this problem by marshalling always to the number of bytes, and using
humanizeto parse and forString(). Or we could ignore this lose of precission.
Wdyt, should I give a try to humanize?
There was a problem hiding this comment.
Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).
they would be parsed correctly, but would mean different amount of units: 25010241024 vs 25010001000), so I would say it isn't a bug.
It rounds-up when giving the string representation ("1024B" and "1025B" are both converted to "1.0 KiB"). We need to be able to convert back and forth because we parse yaml and convert it to json, and we would be losing precision. We could workaround this problem by marshalling always to the number of bytes, and using humanize to parse and for String(). Or we could ignore this lose of precission.
It rounds-up when giving the string representation ("1024B" and "1025B" are both converted to "1.0 KiB").
Definitely, we can't lose precision. I would give the go-humanize library a try only if there is an option to strictly convert values, without any rounding. If we can't achieve this, let's stick to what you implemented.
BTW I'm fine with improving it in follow-ups as what you have here works well.
There was a problem hiding this comment.
Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).
they would be parsed correctly, but would mean different amount of units: 250_1024_1024 vs 250_1000_1000), so I would say it isn't a bug.
Well, 250MB are parsed as 250000000 bytes, while 250MiB are parsed as 262144000 bytes, a difference of 11MiB, or 12MB.
Let's revisit this in follow ups if needed.
| FieldsPerDataStreamLimit int `yaml:"fieldsPerDataStreamLimit"` | ||
| } | ||
|
|
||
| func (l *commonSpecLimits) update(o commonSpecLimits) { |
There was a problem hiding this comment.
In general, I'm hesitant to use reflection unless we can't achieve something any other way round. I'm wondering if we can go with a standard copy operation between fields.
There was a problem hiding this comment.
I agree, but for this case I didn't find any other way around:
- Preinitializing the structs with default values and a standard copy is not possible, because these objects are created when unmarshaling the objects. And these "default" values are obtained in the same unmarshal operation.
- I could do some kind of unmarshaling in two passes, one to read the limits on the parent and preinitialize the children contents, and another one to unmarshal the children, bit this can be worse than using reflection
- I could do the good old set field by field, but I consider this error prone, we may forget of adding a field here after adding it to the struct.
There was a problem hiding this comment.
I could do some kind of unmarshaling in two passes, one to read the limits on the parent and preinitialize the children contents, and another one to unmarshal the children, bit this can be worse than using reflection
Yes, I used to follow this option when interacting with complex structures, ignoring the performance issues. Regarding reflection, I'm not sure if it still works when we add slices or pointers.
Let's stick to reflection, but maybe make sure if it's supported if we decide to build a WebAssembly out of the package-spec. It might be a blocker then.
There was a problem hiding this comment.
Yes, I used to follow this option when interacting with complex structures, ignoring the performance issues. Regarding reflection, I'm not sure if it still works when we add slices or pointers.
Yep, with more complex types this may not work, but this object is intended to store limits, that are most likely going to be numeric values.
Let's stick to reflection, but maybe make sure if it's supported if we decide to build a WebAssembly out of the package-spec. It might be a blocker then.
I wouldn't expect compatibility issues because of using reflection, this is also used by core packages such as encode/json.
(In any case, I have tried, and tests pass with WebAssembly - GOOS=js GOARCH=wasm go test -exec="$(go env GOROOT)/misc/wasm/go_js_wasm_exec" ./...)
I guess that with any "validation" feature we add we are adding more code to the mix, not sure what we can do about this... |
mtojek
left a comment
There was a problem hiding this comment.
LGTM, any leftovers can be implemented as follow-ups.
I guess that with any "validation" feature we add we are adding more code to the mix, not sure what we can do about this...
What do you think about adding an issue to the backlog to write down all non-standard features we're using that are not present in the jsonschema library? It will help us understand what exactly are we looking for to make the spec clear.
Issue created for this #287. |
What does this PR do?
folderItemSpec.validateandloadItemContentto split its responsibilities.Why is it important?
Ensure that packages and the resources they contain fit under controlled limits.
Checklist
test/packagesthat prove my change is effective.versions/N/changelog.yml.Related issues