This repository was archived by the owner on Aug 23, 2023. It is now read-only.

make index pruning configurable via index-rules.conf #924

Merged: 10 commits, Oct 24, 2018

Conversation

Dieterbe
Contributor

@Dieterbe Dieterbe commented May 24, 2018

for #868

@Dieterbe Dieterbe added this to the 0.10.0 milestone Aug 17, 2018
@Dieterbe Dieterbe force-pushed the index-rules branch 7 times, most recently from a39085a to 07a49fb Compare August 23, 2018 14:14
@Dieterbe Dieterbe requested review from woodsaj and replay August 23, 2018 14:16
@woodsaj
Member

woodsaj commented Aug 23, 2018

It seems like we should just add the max-stale setting to the existing storage-schemas.conf rather than having a whole new config file, as was done for the reorderBuffer.

@Dieterbe
Contributor Author

Dieterbe commented Aug 23, 2018

maybe. i've been going back and forth on that a bit myself. the benefit of a separate file is that it makes it easy to add more per-pattern index tunables in the future, though i don't know yet which those would be.
also, if the pruning rules are orthogonal to the storage schemas, then having them in one file means we need a rule for every combination of the two.

eg schemas:

^a.* -> retain 1y
^b.* -> retain 2y
^c.* -> retain 3y

index rules:

\.containers$ -> max-stale 1d
\.biz$ -> max-stale 30d

becomes:

^a.*\.containers$ -> retain 1y, max-stale 1d
^a.*\.biz$ -> retain 1y, max-stale 30d
^b.*\.containers$ -> retain 2y, max-stale 1d
^b.*\.biz$ -> retain 2y, max-stale 30d
^c.*\.containers$ -> retain 3y, max-stale 1d
^c.*\.biz$ -> retain 3y, max-stale 30d

while the extra file makes things a bit harder for the simple cases (and the majority of our deployments are simple), putting it all in 1 file multiplies the number of rules in the more complicated cases, and that's a bit concerning in terms of operability.

the benefit of a single file is that we save some memory by not having to track the separate IrId.

I don't see a strong reason to go either way though.
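To make the rule-count argument concrete, here is a minimal Go sketch of the example above. The `rule` type and `combine` function are hypothetical and exist only for illustration; they are not part of this PR. It assumes the two rule sets are fully orthogonal, so a single file needs one rule per (schema, index rule) pair.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for one config section:
// a pattern plus the settings attached to it.
type rule struct {
	pattern  string
	retain   string
	maxStale string
}

// combine builds the single-file rule set: one rule for every
// (schema, index rule) pair, concatenating the two patterns.
func combine(schemas, indexRules []rule) []rule {
	var out []rule
	for _, s := range schemas {
		for _, ir := range indexRules {
			out = append(out, rule{
				pattern:  s.pattern + ir.pattern,
				retain:   s.retain,
				maxStale: ir.maxStale,
			})
		}
	}
	return out
}

func main() {
	schemas := []rule{
		{pattern: "^a.*", retain: "1y"},
		{pattern: "^b.*", retain: "2y"},
		{pattern: "^c.*", retain: "3y"},
	}
	indexRules := []rule{
		{pattern: `\.containers$`, maxStale: "1d"},
		{pattern: `\.biz$`, maxStale: "30d"},
	}

	combined := combine(schemas, indexRules)
	fmt.Printf("separate files: %d rules, single file: %d rules\n",
		len(schemas)+len(indexRules), len(combined))
	for _, r := range combined {
		fmt.Printf("%s -> retain %s, max-stale %s\n", r.pattern, r.retain, r.maxStale)
	}
}
```

With 3 schemas and 2 index rules the separate files hold 5 rules in total while the single file holds 6. The gap grows as a sum versus a product when either side gains rules.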

@woodsaj
Member

woodsaj commented Aug 23, 2018

ok, let's stick with the separate file for now.

}

type IndexCheck struct {
Keep bool
Member

why do we need the Keep field? Can't we just set Cutoff to 0 if maxStale==0?

Contributor Author

right
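A minimal sketch of the suggestion, assuming `Cutoff` is a unix timestamp in seconds. The struct below is a simplified, hypothetical version of `IndexCheck`, and `newIndexCheck` and `Stale` are illustrative names, not the code in this PR. The point is that a zero cutoff needs no special case: no non-negative timestamp is ever below it.

```go
package main

import (
	"fmt"
	"time"
)

// IndexCheck is a simplified, hypothetical version of the struct under
// review: it has no Keep field, and a zero Cutoff means "never prune".
type IndexCheck struct {
	Cutoff int64 // unix timestamp (seconds); entries last updated before it are stale
}

// newIndexCheck derives the check for one rule.
// maxStale == 0 disables pruning for that rule.
func newIndexCheck(maxStale time.Duration, now time.Time) IndexCheck {
	if maxStale == 0 {
		return IndexCheck{Cutoff: 0}
	}
	return IndexCheck{Cutoff: now.Add(-maxStale).Unix()}
}

// Stale reports whether an entry last updated at lastUpdate should be pruned.
// With Cutoff == 0 no non-negative timestamp is below the cutoff, so nothing
// is pruned and no separate Keep flag is needed.
func (c IndexCheck) Stale(lastUpdate int64) bool {
	return lastUpdate < c.Cutoff
}

func main() {
	now := time.Unix(1535000000, 0)
	lastUpdate := now.Add(-48 * time.Hour).Unix()

	fmt.Println(newIndexCheck(24*time.Hour, now).Stale(lastUpdate)) // true: older than 1 day
	fmt.Println(newIndexCheck(0, now).Stale(lastUpdate))            // false: pruning disabled
}
```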

# * Valid units are s/sec/secs/second/seconds, m/min/mins/minute/minutes, h/hour/hours, d/day/days, w/week/weeks, mon/month/months, y/year/years

[default]
pattern =
Member

should this have a pattern set? eg .*

Contributor Author

not needed. empty string is a valid regex that matches everything. (the first test in conf/indexrules_test.go checks this)
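The claim is easy to check with Go's standard `regexp` package. The helper `matchesAll` below is a hypothetical name used only for this illustration; it is not the test in conf/indexrules_test.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAll reports whether pattern matches every one of the given names.
func matchesAll(pattern string, names []string) bool {
	re := regexp.MustCompile(pattern)
	for _, n := range names {
		if !re.MatchString(n) {
			return false
		}
	}
	return true
}

func main() {
	names := []string{"", "foo", "some.metric.name"}

	// The empty pattern is unanchored and matches the empty substring,
	// which every string contains, so it behaves like a match-all.
	fmt.Println(matchesAll("", names))   // true
	fmt.Println(matchesAll(".*", names)) // true
}
```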

staleTs := time.Now().Add(maxStale * -1)
_, err := c.Prune(staleTs)
for now := range ticker.C {
log.Debug("cassandra-idx: pruning items")
Member

I think we should have an info level log message when a prune starts, and another when the prune completes. The message on completion should log how long the prune took and the number of items pruned.

Contributor Author

@Dieterbe Dieterbe Aug 24, 2018

we already have such info logs for the memory index, and the cassandra index doesn't really do any work beyond that, but I think it's still useful. in fact i'm making the logs from both more consistent, so it's obvious in the logs that the memory and cassandra indexes each log their own steps. in the future the cassandra-idx may do its own extra steps again.
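A rough sketch of the requested logging shape, using only the standard library `log` package rather than the project's own logger. `pruneWithLogs` and the message wording are assumptions made for illustration, not what this PR ended up logging.

```go
package main

import (
	"log"
	"time"
)

// pruneWithLogs wraps a prune function with a start message and a completion
// message that reports the number of pruned items and the elapsed time.
// name identifies the index in the logs (for example "cassandra-idx").
func pruneWithLogs(name string, prune func() (int, error)) (int, error) {
	log.Printf("%s: start pruning", name)
	pre := time.Now()

	n, err := prune()
	if err != nil {
		log.Printf("%s: pruning failed after %s: %v", name, time.Since(pre), err)
		return n, err
	}

	log.Printf("%s: finished pruning %d items in %s", name, n, time.Since(pre))
	return n, nil
}

func main() {
	// A dummy prune function standing in for the real index prune.
	_, _ = pruneWithLogs("cassandra-idx", func() (int, error) {
		time.Sleep(10 * time.Millisecond)
		return 42, nil
	})
}
```

Using the same wrapper for both indexes would give the memory and cassandra logs the same shape, which is the consistency mentioned above.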

@@ -1251,9 +1269,13 @@ func (m *MemoryIdx) Prune(oldest time.Time) ([]idx.Archive, error) {
pre := time.Now()

m.RLock()

// getting all checks once saves having to recompute the cutoff everytime we have a match
indexChecks := IndexRules.Checks(now)
Member

can we move this to before we acquire the RLock()? We want to keep lock times as short as possible.

Contributor Author

oops
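The pattern being asked for, sketched with a toy index: anything that does not depend on the protected data is computed before taking the read lock, and post-processing happens after releasing it. The `memoryIdx` type and its methods here are hypothetical simplifications, not the real `MemoryIdx`, and a single cutoff stands in for `IndexRules.Checks(now)`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// memoryIdx is a toy stand-in for an in-memory index:
// metric name -> last update (unix seconds), guarded by a RWMutex.
type memoryIdx struct {
	sync.RWMutex
	lastUpdate map[string]int64
}

// staleBefore holds the read lock only while scanning the map.
func (m *memoryIdx) staleBefore(cutoff int64) []string {
	var stale []string

	m.RLock()
	for name, ts := range m.lastUpdate {
		if ts < cutoff {
			stale = append(stale, name)
		}
	}
	m.RUnlock()

	// Sorting does not touch the index, so it runs after the lock is released.
	sort.Strings(stale)
	return stale
}

// prune computes the cutoff first, without holding any lock,
// and only then scans the index.
func (m *memoryIdx) prune(now time.Time, maxStale time.Duration) []string {
	cutoff := now.Add(-maxStale).Unix()
	return m.staleBefore(cutoff)
}

func main() {
	now := time.Unix(1535000000, 0)
	m := &memoryIdx{lastUpdate: map[string]int64{
		"a.containers": now.Add(-48 * time.Hour).Unix(),
		"b.biz":        now.Add(-1 * time.Hour).Unix(),
	}}
	fmt.Println(m.prune(now, 24*time.Hour)) // [a.containers]
}
```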

# * Valid units are s/sec/secs/second/seconds, m/min/mins/minute/minutes, h/hour/hours, d/day/days, w/week/weeks, mon/month/months, y/year/years

[default]
pattern =
Member

should this have a match-all pattern?

Contributor Author

empty string is a valid regex that matches everything. (the first test in conf/indexrules_test.go checks this)

@Dieterbe
Contributor Author

PTAL

},
}
for i, c := range cases {
err := ioutil.WriteFile("/tmp/indexrules-test-readindexrules", []byte(c.in), 0644)
Member

@woodsaj woodsaj Aug 24, 2018

Rather than a hard-coded filename, you should use ioutil.TempFile()
https://golang.org/pkg/io/ioutil/#TempFile

You also need to make sure you delete the file when done.
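One way to follow both points, sketched as a small helper. `withTempFile` is a hypothetical name and the `"indexrules-test"` prefix is an assumption; the real test may be structured differently. The helper creates the file with `ioutil.TempFile`, hands its path to a callback, and removes the file when done.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

// withTempFile writes content to a freshly created temp file, passes the
// file's path to fn, and removes the file again before returning.
func withTempFile(content string, fn func(path string) error) error {
	// An empty dir argument means the default temp directory (os.TempDir()).
	f, err := ioutil.TempFile("", "indexrules-test")
	if err != nil {
		return err
	}
	// Clean up no matter how we return.
	defer os.Remove(f.Name())

	if _, err := f.WriteString(content); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return fn(f.Name())
}

func main() {
	err := withTempFile("[default]\npattern =\n", func(path string) error {
		data, err := ioutil.ReadFile(path)
		if err != nil {
			return err
		}
		fmt.Printf("read %d bytes back from a temp file\n", len(data))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Because each call gets a unique filename, test cases in a loop cannot clobber each other's input, and nothing is left behind in /tmp.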

note that CassandraIdx.load() returns results out of order.
No idea why our tests didn't fail before this but now we correctly
work around this.
@Dieterbe
Contributor Author

rebased on master

@Dieterbe Dieterbe merged commit 82c333a into master Oct 24, 2018
@Dieterbe Dieterbe changed the title from "Index rules" to "make index pruning configurable via index-rules.conf" Oct 24, 2018
@Dieterbe Dieterbe deleted the index-rules branch October 29, 2018 09:06
@Dieterbe Dieterbe modified the milestones: 1.0, 0.11.0 Dec 12, 2018