Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce a new Object Storage WAL format. #13253

Merged
merged 25 commits into from
Jun 19, 2024

Conversation

cyriltovena
Copy link
Contributor

@cyriltovena cyriltovena commented Jun 18, 2024

What this PR does / why we need it:

This PR introduces a new file format for storing WAL segments directly on object storage. This is part of an experiment to replace the ring replication factor and delegate this responsibility to the object storage.

The format is documented at https://github.com/grafana/loki/blob/32b1d2c4acf3955801f7be9a0a43bef396e15507/pkg/storage/wal/README.md

I had some hesitationw with the format, particularly with tenants, may be storing an index per tenant is better, in case each tenant has big indexes. We will learn more once we run things.

see https://github.com/grafana/loki-private/issues/1001

Copy link
Contributor

@benclive benclive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me - the chunks and index readers/writers look correct and I've verified that the tests look sensible, though its definitely possible I missed something!

In general I prefer slightly longer variable names (e.g. idxWriter over idxw, buf over b) as it makes the code slightly easier to understand & review but not a major issue as I could still follow everything.

// NewWalSegmentWriter creates a new WalSegmentWriter.
func NewWalSegmentWriter() *SegmentWriter {
return &SegmentWriter{
streams: swiss.NewMap[streamID, *streamSegment](64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you choose a swiss map over something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know yet if this will make a difference but swissmap are reusable possibly we'll change to normal map.

}

// Labels are passed a string `{foo="bar",baz="qux"}` `{foo="foo",baz="foo"}`. labels.Labels => Symbols foo, baz , qux
func (b *SegmentWriter) Append(tenantID, labelsString string, lbls labels.Labels, entries []*logproto.Entry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be called directly from a Push method, or is the client still expecting to group entries under instances/streams and then pass them to Append only when creating the WAL object to be stored?
If it's going to be called from Push directly, it'll vastly simplify that API!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think entries are already grouped by stream in ingester requests so for each stream we'll push into one selected segment.

@cyriltovena cyriltovena merged commit 1d6f8d5 into grafana:main Jun 19, 2024
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants