Implement package limits by jsoriano · Pull Request #278 · elastic/package-spec

jsoriano · 2022-02-17T19:02:05Z

What does this PR do?

Implement package limits:
- Global - Package size - 250MB
- Global - Total filesize - 150MB
- Global - Total number of files in a package - 65535
- Global - Total number of files in a directory - 65535
- Local - Number of data streams - 500
- Local - Number of fields in data stream - 1024
- Local - Size of graphic resources - 3MB
- Local - Size of configuration files (yml files, manifests, fields) - 5MB
- Local - Ingest pipelines - 3MB
- Local - Config templates (hbs files) - 2MB
Add helper types for content media types and file sizes.
Refactor folderItemSpec.validate and loadItemContent to split its responsibilities.

Why is it important?

Ensure that packages and the resources they contain fit under controlled limits.

Checklist

I have added test packages to test/packages that prove my change is effective.
I have added an entry in versions/N/changelog.yml.

Related issues

Relates to Define guidelines about what kind of assets should be included in packages #162

elasticmachine · 2022-02-17T19:06:00Z

💚 Build Succeeded

Build stats

Start Time: 2022-03-01T16:06:13.729+0000
Duration: 6 min 58 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

code/go/internal/validator/semantic/format_checkers.go

jsoriano · 2022-02-17T19:10:24Z

code/go/internal/validator/folder_item_spec_errors.go

+		return fmt.Sprintf("relative path is invalid, target doesn't exist or is bigger than %s", semantic.RelativePathFileMaxSize)
 	} else if description == "Does not match format '"+semantic.DataStreamNameFormat+"'" {
 		return "data stream doesn't exist"
 	}


Avoiding the need for these adjustments would be another reason to look for a different jsonschema library with better customization options.

code/go/internal/validator/semantic/types.go

jsoriano · 2022-02-17T19:16:45Z

@mtojek ideas for testing this? I wouldn't like to commit files that exceed the limits 🙂

An idea could be to add a generator of packages for testing, and generate the packages that exceed the limits on test time.

mtojek

@mtojek ideas for testing this? I wouldn't like to commit files that exceed the limits 🙂
An idea could be to add a generator of packages for testing, and generate the packages that exceed the limits on test time.

My first thought would be mocking, so you don't have to generate big files even temporarily. You check if you can mimic the FS interface and return some big size().

Another alternative would be setting really low limits just for testing.
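For illustration, a minimal sketch of the mocking idea, assuming the validator reads sizes through `fs.Stat` on an `fs.FS`. The names `fakeSizeFS`, `fakeSizeInfo`, `checkMaxSize` and `exampleFS` are hypothetical and are not taken from the package-spec code.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// fakeSizeInfo wraps a real fs.FileInfo and overrides only the reported size.
type fakeSizeInfo struct {
	fs.FileInfo
	size int64
}

func (i fakeSizeInfo) Size() int64 { return i.size }

// fakeSizeFS (hypothetical) wraps an fs.FS and reports fake sizes for selected paths.
type fakeSizeFS struct {
	fs.FS
	sizes map[string]int64
}

// Stat implements fs.StatFS, so fs.Stat on this filesystem sees the fake sizes.
func (f fakeSizeFS) Stat(name string) (fs.FileInfo, error) {
	info, err := fs.Stat(f.FS, name)
	if err != nil {
		return nil, err
	}
	if size, ok := f.sizes[name]; ok {
		return fakeSizeInfo{FileInfo: info, size: size}, nil
	}
	return info, nil
}

// checkMaxSize (hypothetical) is a stand-in for a size validation.
func checkMaxSize(fsys fs.FS, name string, max int64) error {
	info, err := fs.Stat(fsys, name)
	if err != nil {
		return err
	}
	if info.Size() > max {
		return fmt.Errorf("file %q is bigger than %d bytes", name, max)
	}
	return nil
}

// exampleFS returns a tiny in-memory package and a mocked view of it
// where the logo pretends to be 4 MiB big.
func exampleFS() (base fs.FS, mocked fs.FS) {
	base = fstest.MapFS{"img/logo.svg": &fstest.MapFile{Data: []byte("<svg/>")}}
	mocked = fakeSizeFS{FS: base, sizes: map[string]int64{"img/logo.svg": 4 * 1024 * 1024}}
	return base, mocked
}

func main() {
	base, mocked := exampleFS()
	fmt.Println(checkMaxSize(base, "img/logo.svg", 3*1024*1024))   // no error expected
	fmt.Println(checkMaxSize(mocked, "img/logo.svg", 3*1024*1024)) // error expected
}
```

Note that this only affects calls that go through `fs.Stat`. Code that opens the file and calls `Stat()` on the handle, or that uses `os` directly, would still see the real size.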

First round of the code review done.

I admit that I got scared with this PR... It was supposed to be an easy size check to implement, but we ended up with complex "spectyping". I'm wondering if we shouldn't rethink the idea or adjust the existing design.

What we can consider here is a "file limits" file that collects all the global limitations we have for folders and their content, so that they are easily accessible and not hidden in the Go code. Keep in mind the original (but dead today) portability idea :)

versions/1/data_stream/spec.yml

mtojek · 2022-02-21T12:34:56Z

code/go/internal/spectypes/contenttype.go

+		return err
+	}
+	if mime.FormatMediaType(mediatype, params) == "" {
+		// Bug in mime library? Happens when parsing something like "0;*0=0"


I don't have any deeper knowledge of this. Did you find it by accident with unit tests?

I gave Go 1.18's fuzz testing a try in https://github.com/elastic/package-spec/pull/278/files#diff-2a7131ad04e8aa8fcd479f359de4357a6dbe6316b61c5f6966079c4c486611f2 and found this.
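As a rough sketch of the check discussed here (not the code in the PR), the idea is to reject any media type that parses but cannot be formatted back. The function name `checkMediaType` is hypothetical; `mime.ParseMediaType` and `mime.FormatMediaType` are the standard library calls shown in the diff.

```go
package main

import (
	"fmt"
	"mime"
)

// checkMediaType (hypothetical) accepts a media type only if it can be
// parsed and then formatted back to a non-empty string.
func checkMediaType(text string) error {
	mediatype, params, err := mime.ParseMediaType(text)
	if err != nil {
		return err
	}
	// FormatMediaType returns an empty string when the result would
	// violate the standard, even if parsing succeeded.
	if mime.FormatMediaType(mediatype, params) == "" {
		return fmt.Errorf("media type %q cannot be formatted back", text)
	}
	return nil
}

func main() {
	fmt.Println(checkMediaType("application/x-yaml; charset=utf-8"))
	fmt.Println(checkMediaType(""))
}
```

A fuzz target for a function like this would live in a `_test.go` file and call it from `f.Fuzz(func(t *testing.T, s string) { ... })`, asserting that accepted inputs survive a parse and format round trip.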

code/go/internal/validator/semantic/format_checkers.go

code/go/internal/spectypes/filesize.go

code/go/internal/validator/common_spec.go

code/go/internal/validator/folder_item_content.go

code/go/internal/validator/folder_item_spec.go

jsoriano · 2022-02-21T16:34:04Z

My first thought would be mocking, so you don't have to generate big files even temporarily. You check if you can mimic the FS interface and return some big size().

Good point, I will give it a try. Though we may need to adapt some calls that are done directly with os or ioutil.

jsoriano · 2022-02-22T20:45:46Z

I will move b524c92 to a different PR.

jsoriano · 2022-02-23T12:07:36Z

code/go/internal/validator/semantic/validate_fields_limits.go

+	"github.com/elastic/package-spec/code/go/internal/fspath"
+)
+
+const maxFieldsPerDataStream = 1024


Not sure if moving this too to somewhere in the spec 🤔

I would consider every number/count/etc. as a constant value, so - in the spec :)

Moved 9453025

jsoriano · 2022-02-28T17:21:35Z

Tests and changelog added, opening for review.

mtojek · 2022-03-01T08:09:37Z

Tests and changelog added, opening for review.

Do you think you can mark it as ready for review or there is a potential blocker?

jsoriano · 2022-03-01T08:50:29Z

Tests and changelog added, opening for review.

Do you think you can mark it as ready for review or there is a potential blocker?

Ah yes, I wanted to do that after my previous comment 🤦

mtojek

I left a few minor comments to be addressed/responded to, but nothing serious. You did a pretty good job adapting the spec.

What worries me is that the implementation will become more tangled than it was before. I think we should open a new issue to research alternative spec parsers to slim down the current codebase.

code/go/internal/spectypes/filesize.go

mtojek · 2022-03-01T10:29:38Z

code/go/internal/spectypes/filesize.go

+		return fmt.Errorf("invalid format for file size (%s): %w", text, err)
+	}
+
+	unit := match[2]


Maybe we should use a library like: https://github.com/dustin/go-humanize

It could be a good idea to have less code here.

But there are a couple of small concerns I have with humanize:

- It is strict in its interpretation of unit prefixes (kilo means 1000, and KiB must be used for 1024), so we would need to change from MB to MiB, for example.
- Related to the previous point, given that the parser supports units with both prefixes, we may end up with a mix in the spec (250MB and 250MiB would both be parsed correctly, but they don't mean the same).
- It rounds when producing the string representation ("1024B" and "1025B" are both converted to "1.0 KiB"). We need to be able to convert back and forth because we parse YAML and convert it to JSON, and we would be losing precision. We could work around this problem by always marshalling to the number of bytes, and using humanize only for parsing and for String(). Or we could ignore this loss of precision.

Wdyt, should I give a try to humanize?

Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).

They would be parsed correctly, but would mean different numbers of bytes (250*1024*1024 vs 250*1000*1000), so I would say it isn't a bug.

It rounds when producing the string representation ("1024B" and "1025B" are both converted to "1.0 KiB"). We need to be able to convert back and forth because we parse YAML and convert it to JSON, and we would be losing precision. We could work around this problem by always marshalling to the number of bytes, and using humanize only for parsing and for String(). Or we could ignore this loss of precision.

Definitely, we can't lose precision. I would give the go-humanize library a try only if there is an option to strictly convert values, without any rounding. If we can't achieve this, let's stick to what you implemented.
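To make the precision requirement concrete, here is a minimal sketch of a size type that round-trips exactly. It assumes binary multiples for KB and MB (the thread implies this, since moving to humanize would require renaming MB to MiB) and it ignores overflow. The names `FileSize` and `ParseFileSize` are illustrative and not necessarily what the PR implements.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// FileSize is a size in bytes.
type FileSize uint64

// Assumption: units are binary multiples.
const (
	Byte     FileSize = 1
	KiloByte          = 1024 * Byte
	MegaByte          = 1024 * KiloByte
)

var fileSizePattern = regexp.MustCompile(`^(\d+)(B|KB|MB)$`)

// ParseFileSize parses strings such as "512B", "5KB" or "250MB".
func ParseFileSize(text string) (FileSize, error) {
	match := fileSizePattern.FindStringSubmatch(text)
	if match == nil {
		return 0, fmt.Errorf("invalid format for file size (%s)", text)
	}
	value, err := strconv.ParseUint(match[1], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid format for file size (%s): %w", text, err)
	}
	switch match[2] {
	case "KB":
		return FileSize(value) * KiloByte, nil
	case "MB":
		return FileSize(value) * MegaByte, nil
	default:
		return FileSize(value) * Byte, nil
	}
}

// String uses the largest unit that divides the size exactly, so that
// parsing the result always gives back the same number of bytes.
func (s FileSize) String() string {
	switch {
	case s > 0 && s%MegaByte == 0:
		return fmt.Sprintf("%dMB", uint64(s/MegaByte))
	case s > 0 && s%KiloByte == 0:
		return fmt.Sprintf("%dKB", uint64(s/KiloByte))
	default:
		return fmt.Sprintf("%dB", uint64(s))
	}
}

func main() {
	for _, text := range []string{"1024B", "1025B", "250MB"} {
		size, err := ParseFileSize(text)
		fmt.Println(text, "->", uint64(size), "->", size.String(), err)
	}
}
```

With this approach "1024B" is rendered as "1KB" while "1025B" stays "1025B", so no information is lost when converting between YAML and JSON.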

BTW I'm fine with improving it in follow-ups as what you have here works well.

Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).

They would be parsed correctly, but would mean different numbers of bytes (250*1024*1024 vs 250*1000*1000), so I would say it isn't a bug.

Well, 250MB is parsed as 250000000 bytes, while 250MiB is parsed as 262144000 bytes, a difference of 12144000 bytes (roughly 11.6MiB, or 12.1MB).
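The arithmetic can be checked with a few lines of Go. This is only a worked example of the numbers above; the constant and function names are made up for the illustration.

```go
package main

import "fmt"

const (
	decimalMB = 1000 * 1000 // bytes in 1 MB with SI prefixes
	binaryMiB = 1024 * 1024 // bytes in 1 MiB with binary prefixes
)

// sizes returns 250MB and 250MiB in bytes, and the difference between them.
func sizes() (mb, mib, diff int64) {
	mb = 250 * decimalMB
	mib = 250 * binaryMiB
	return mb, mib, mib - mb
}

func main() {
	mb, mib, diff := sizes()
	fmt.Println(mb, mib, diff) // 250000000 262144000 12144000
	// Prints the difference in both units: 11.58 MiB, 12.14 MB
	fmt.Printf("%.2f MiB, %.2f MB\n", float64(diff)/binaryMiB, float64(diff)/decimalMB)
}
```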

Let's revisit this in follow ups if needed.

mtojek · 2022-03-01T10:30:52Z

code/go/internal/validator/common_spec.go

+	FieldsPerDataStreamLimit int `yaml:"fieldsPerDataStreamLimit"`
+}
+
+func (l *commonSpecLimits) update(o commonSpecLimits) {


In general, I'm hesitant to use reflection unless we can't achieve something any other way. I'm wondering if we can go with a standard copy operation between fields.

I agree, but for this case I didn't find any other way around:

- Preinitializing the structs with default values and doing a standard copy is not possible, because these objects are created when unmarshaling, and the "default" values are obtained in the same unmarshal operation.
- I could do some kind of unmarshaling in two passes, one to read the limits on the parent and preinitialize the children contents, and another one to unmarshal the children, but this can be worse than using reflection.
- I could do the good old field-by-field assignment, but I consider this error prone: we may forget to add a field here after adding it to the struct.
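For readers following the thread, a minimal sketch of what a reflection-based `update` could look like, assuming the intent is that a child keeps any limit it sets itself and inherits the rest from its parent. The struct `limits` and its fields are illustrative, and the real `commonSpecLimits.update` may differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// limits is an illustrative stand-in for a struct of numeric limits.
// Fields are exported so that reflection is allowed to set them.
type limits struct {
	TotalSizeLimit           int
	FieldsPerDataStreamLimit int
}

// update copies into l every field of o that is still unset (zero) in l.
// New fields added to the struct are handled without touching this method.
func (l *limits) update(o limits) {
	target := reflect.ValueOf(l).Elem()
	source := reflect.ValueOf(o)
	for i := 0; i < target.NumField(); i++ {
		field := target.Field(i)
		if field.IsZero() {
			field.Set(source.Field(i))
		}
	}
}

func main() {
	parent := limits{TotalSizeLimit: 250, FieldsPerDataStreamLimit: 1024}
	child := limits{FieldsPerDataStreamLimit: 2048}
	child.update(parent)
	fmt.Printf("%+v\n", child) // {TotalSizeLimit:250 FieldsPerDataStreamLimit:2048}
}
```

One consequence of this design is that a zero value cannot be distinguished from "not set", which is acceptable for limits but would need care for other kinds of settings.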

I could do some kind of unmarshaling in two passes, one to read the limits on the parent and preinitialize the children contents, and another one to unmarshal the children, but this can be worse than using reflection.

Yes, I used to follow this option when interacting with complex structures, ignoring the performance issues. Regarding reflection, I'm not sure if it still works when we add slices or pointers.

Let's stick to reflection, but maybe make sure if it's supported if we decide to build a WebAssembly out of the package-spec. It might be a blocker then.

Yes, I used to follow this option when interacting with complex structures, ignoring the performance issues. Regarding reflection, I'm not sure if it still works when we add slices or pointers.

Yep, with more complex types this may not work, but this object is intended to store limits, that are most likely going to be numeric values.

Let's stick to reflection, but maybe make sure if it's supported if we decide to build a WebAssembly out of the package-spec. It might be a blocker then.

I wouldn't expect compatibility issues because of using reflection; it is also used by core packages such as encoding/json.

(In any case, I have tried, and tests pass with WebAssembly - GOOS=js GOARCH=wasm go test -exec="$(go env GOROOT)/misc/wasm/go_js_wasm_exec" ./...)

code/go/internal/validator/folder_item_content.go

code/go/pkg/validator/limits_test.go

code/go/internal/validator/semantic/format_checkers.go

jsoriano · 2022-03-01T17:30:28Z

What worries me is that the implementation will become more tangled than it was before. I think we should open a new issue to research alternative spec parsers to slim down the current codebase.

I guess that with any "validation" feature we add we are adding more code to the mix, not sure what we can do about this...

mtojek

LGTM, any leftovers can be implemented as follow-ups.

I guess that with any "validation" feature we add we are adding more code to the mix, not sure what we can do about this...

What do you think about adding an issue to the backlog to write down all the non-standard features we're using that are not present in the jsonschema library? It will help us understand what exactly we are looking for to make the spec clear.

jsoriano · 2022-03-02T11:49:45Z

What do you think about adding an issue to the backlog to write down all the non-standard features we're using that are not present in the jsonschema library? It will help us understand what exactly we are looking for to make the spec clear.

Issue created for this #287.

jsoriano added 13 commits February 16, 2022 13:15

Configure max sizes

23e80ea

Add helper types for content type and file sizes

ee16618

Check size limits based on content size

a80b486

Fix file size yaml parsing

c798d32

Fix yaml marshaler

84b6ffc

Add fuzz tests

02ca9ad

Validate max size

e6bdd5a

Read item content only if it is going to be used

60ec912

Add limits to files referenced in relative paths

7c00668

Check limit of files in folder

f7c781f

Global limit per file

fd71e99

Validate limit of fields

5257b7d

Fix validation of nested fields

dd3c3f6

jsoriano self-assigned this Feb 17, 2022

jsoriano commented Feb 17, 2022

jsoriano mentioned this pull request Feb 17, 2022

Fix validation of dimension fields inside objects #279

Merged

2 tasks

jsoriano added 2 commits February 17, 2022 20:39

Fix linting

a0ed324

strings.Cut is not available yet :)

a64bd33

jsoriano mentioned this pull request Feb 21, 2022

Disallow unknown properties on fields definitions #281

Closed

2 tasks

mtojek reviewed Feb 21, 2022

jsoriano added 7 commits February 21, 2022 17:36

Merge remote-tracking branch 'origin/main' into package-limits

0df97f3

Define limits in the spec

2f87d26

Propagate limits to children content

9b909f3

Extend comments on spec types

96b7804

Linting

c373452

Implement global limits

050dd75

Linting

2c4c8ac

Use fs.FS

b524c92

jsoriano mentioned this pull request Feb 23, 2022

Make all file operations through the filesystem interface #283

Merged

2 tasks

jsoriano commented Feb 23, 2022

jsoriano added 2 commits February 24, 2022 11:56

Merge remote-tracking branch 'origin/main' into package-limits

38c0256

Move limit to data stream fields to the spec

9453025

jsoriano force-pushed the package-limits branch 2 times, most recently from a271592 to 8b6e763 Compare February 24, 2022 18:25

Add tests

3ca2145

jsoriano force-pushed the package-limits branch from 8b6e763 to 3ca2145 Compare February 24, 2022 18:27

jsoriano added 4 commits February 28, 2022 17:21

Refactor fs mock

48e4b51

More tests for limits

f3efa84

Add changelog entry

e462211

Adjust test file size

a0f59d3

jsoriano marked this pull request as ready for review March 1, 2022 08:49

jsoriano requested a review from a team as a code owner March 1, 2022 08:49

mtojek reviewed Mar 1, 2022

Feedback from Review

43a3811

mtojek self-requested a review March 2, 2022 09:21

mtojek approved these changes Mar 2, 2022

jsoriano merged commit 6df0a1a into elastic:main Mar 2, 2022

This was referenced Mar 17, 2022

Define guidelines about what kind of assets should be included in packages #162

Open

Discuss strategies to reduce number of fields elastic/integrations#2839

Closed
