Skip to content

Implement package limits#278

Merged
jsoriano merged 31 commits intoelastic:mainfrom
jsoriano:package-limits
Mar 2, 2022
Merged

Implement package limits#278
jsoriano merged 31 commits intoelastic:mainfrom
jsoriano:package-limits

Conversation

@jsoriano
Copy link
Member

@jsoriano jsoriano commented Feb 17, 2022

What does this PR do?

  • Implement package limits:
    • Global - Package size - 250MB
    • Global - Total filesize - 150MB
    • Global - Total number of files in a package - 65535
    • Global - Total number of files in a directory - 65535
    • Local - Number of data streams - 500
    • Local - Number of fields in data stream - 1024
    • Local - Size of graphic resources - 3MB
    • Local - Size of configuration files (yml files, manifests, fields) - 5MB
    • Local - Ingest pipelines - 3MB
    • Local - Config templates (hbs files) - 2MB
  • Add helper types for content media types and file sizes.
  • Refactor folderItemSpec.validate and loadItemContent to split its responsibilities.

Why is it important?

Ensure that packages and the resources they contain fit under controlled limits.

Checklist

Related issues

@jsoriano jsoriano self-assigned this Feb 17, 2022
@elasticmachine
Copy link

elasticmachine commented Feb 17, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-03-01T16:06:13.729+0000

  • Duration: 6 min 58 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

return fmt.Sprintf("relative path is invalid, target doesn't exist or is bigger than %s", semantic.RelativePathFileMaxSize)
} else if description == "Does not match format '"+semantic.DataStreamNameFormat+"'" {
return "data stream doesn't exist"
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding the need of these adjustments would be another reason to look for another jsonschema library with better customization options.

@jsoriano
Copy link
Member Author

@mtojek ideas for testing this? I wouldn't like to commit files that exceed the limits 🙂

An idea could be to add a generator of packages for testing, and generate the packages that exceed the limits on test time.

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek ideas for testing this? I wouldn't like to commit files that exceed the limits 🙂
An idea could be to add a generator of packages for testing, and generate the packages that exceed the limits on test time.

My first thought would be mocking, so you don't have to generate big files even temporarily. You check if you can mimic the FS interface and return some big size().

Another alternative would be setting really low limits just for testing.


First round of the code review done.

I admit that I got scared with this PR... It was supposed to be an easy size check to implement, but we ended up with complex "spectyping". I'm wondering if we shouldn't rethink the idea or adjust the existing design.

What we can consider here is a "file limits" file, which compacts all global limitations we have for folders and their content, so that it's easy accessible and not hidden in the Go code. Keep in mind about the original (but dead today) portability idea :)

return err
}
if mime.FormatMediaType(mediatype, params) == "" {
// Bug in mime library? Happens when parsing something like "0;*0=0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any deeper knowledge of this. Did you find it by accident with unit tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsoriano
Copy link
Member Author

My first thought would be mocking, so you don't have to generate big files even temporarily. You check if you can mimic the FS interface and return some big size().

Good point, I will give it a try. Though we may need to adapt some calls that are done directly with os or ioutil.

@jsoriano
Copy link
Member Author

I will move b524c92 to a different PR.

"github.com/elastic/package-spec/code/go/internal/fspath"
)

const maxFieldsPerDataStream = 1024
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if moving this too to somewhere in the spec 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider every number/count/etc. as a constant value, so - in the spec :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved 9453025

@jsoriano jsoriano force-pushed the package-limits branch 2 times, most recently from a271592 to 8b6e763 Compare February 24, 2022 18:25
@jsoriano
Copy link
Member Author

Tests and changelog added, opening for review.

@mtojek
Copy link
Contributor

mtojek commented Mar 1, 2022

Tests and changelog added, opening for review.

Do you think you can mark it as ready for review or there is a potential blocker?

@jsoriano jsoriano marked this pull request as ready for review March 1, 2022 08:49
@jsoriano jsoriano requested a review from a team as a code owner March 1, 2022 08:49
@jsoriano
Copy link
Member Author

jsoriano commented Mar 1, 2022

Tests and changelog added, opening for review.

Do you think you can mark it as ready for review or there is a potential blocker?

Ah yes, I wanted to do that after my previous comment 🤦

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor comments to be addressed/responded to, but nothing serious. You did a pretty good job adapting the spec.

What makes me worried is the fact, that the implementation will become more tangled than it was before. I think we should open a new issue to research alternative spec parsers to slim down the current codebase.

return fmt.Errorf("invalid format for file size (%s): %w", text, err)
}

unit := match[2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a good idea to have less code here.

But there are a couple of small concerns I have with humanize:

  • It is strict with the interpretation of unit prefixes (Kilo means 1000, KiByte needs to be used for 1024), we would need to change from MB to MiB for example.
  • Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).
  • It rounds-up when giving the string representation ("1024B" and "1025B" are both converted to "1.0 KiB"). We need to be able to convert back and forth because we parse yaml and convert it to json, and we would be losing precision. We could workaround this problem by marshalling always to the number of bytes, and using humanize to parse and for String(). Or we could ignore this lose of precission.

Wdyt, should I give a try to humanize?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).

they would be parsed correctly, but would mean different amount of units: 25010241024 vs 25010001000), so I would say it isn't a bug.

It rounds-up when giving the string representation ("1024B" and "1025B" are both converted to "1.0 KiB"). We need to be able to convert back and forth because we parse yaml and convert it to json, and we would be losing precision. We could workaround this problem by marshalling always to the number of bytes, and using humanize to parse and for String(). Or we could ignore this lose of precission.

It rounds-up when giving the string representation ("1024B" and "1025B" are both converted to "1.0 KiB").

Definitely, we can't lose precision. I would give the go-humanize library a try only if there is an option to strictly convert values, without any rounding. If we can't achieve this, let's stick to what you implemented.

BTW I'm fine with improving it in follow-ups as what you have here works well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to previous point, given that the parser supports units with both prefixes, we may end up having a mix in the spec (250MB and 250MiB would be correctly parsed, but they don't mean the same).

they would be parsed correctly, but would mean different amount of units: 250_1024_1024 vs 250_1000_1000), so I would say it isn't a bug.

Well, 250MB are parsed as 250000000 bytes, while 250MiB are parsed as 262144000 bytes, a difference of 11MiB, or 12MB.

Let's revisit this in follow ups if needed.

FieldsPerDataStreamLimit int `yaml:"fieldsPerDataStreamLimit"`
}

func (l *commonSpecLimits) update(o commonSpecLimits) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'm hesitant to use reflection unless we can't achieve something any other way round. I'm wondering if we can go with a standard copy operation between fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but for this case I didn't find any other way around:

  • Preinitializing the structs with default values and a standard copy is not possible, because these objects are created when unmarshaling the objects. And these "default" values are obtained in the same unmarshal operation.
  • I could do some kind of unmarshaling in two passes, one to read the limits on the parent and preinitialize the children contents, and another one to unmarshal the children, bit this can be worse than using reflection
  • I could do the good old set field by field, but I consider this error prone, we may forget of adding a field here after adding it to the struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do some kind of unmarshaling in two passes, one to read the limits on the parent and preinitialize the children contents, and another one to unmarshal the children, bit this can be worse than using reflection

Yes, I used to follow this option when interacting with complex structures, ignoring the performance issues. Regarding reflection, I'm not sure if it still works when we add slices or pointers.

Let's stick to reflection, but maybe make sure if it's supported if we decide to build a WebAssembly out of the package-spec. It might be a blocker then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I used to follow this option when interacting with complex structures, ignoring the performance issues. Regarding reflection, I'm not sure if it still works when we add slices or pointers.

Yep, with more complex types this may not work, but this object is intended to store limits, that are most likely going to be numeric values.

Let's stick to reflection, but maybe make sure if it's supported if we decide to build a WebAssembly out of the package-spec. It might be a blocker then.

I wouldn't expect compatibility issues because of using reflection, this is also used by core packages such as encode/json.

(In any case, I have tried, and tests pass with WebAssembly - GOOS=js GOARCH=wasm go test -exec="$(go env GOROOT)/misc/wasm/go_js_wasm_exec" ./...)

@jsoriano
Copy link
Member Author

jsoriano commented Mar 1, 2022

What makes me worried is the fact, that the implementation will become more tangled than it was before. I think we should open a new issue to research alternative spec parsers to slim down the current codebase.

I guess that with any "validation" feature we add we are adding more code to the mix, not sure what we can do about this...

@mtojek mtojek self-requested a review March 2, 2022 09:21
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, any leftovers can be implemented as follow-ups.

I guess that with any "validation" feature we add we are adding more code to the mix, not sure what we can do about this...

What do you think about adding an issue to the backlog to write down all non-standard features we're using that are not present in the jsonschema library? It will help us understand what exactly are we looking for to make the spec clear.

@jsoriano jsoriano merged commit 6df0a1a into elastic:main Mar 2, 2022
@jsoriano
Copy link
Member Author

jsoriano commented Mar 2, 2022

What do you think about adding an issue to the backlog to write down all non-standard features we're using that are not present in the jsonschema library? It will help us understand what exactly are we looking for to make the spec clear.

Issue created for this #287.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants