Skip to content

Conversation

@Thejas-bhat
Copy link
Member

@Thejas-bhat Thejas-bhat commented Jul 15, 2022

The newly introduced stats:
num_bytes_read_at_query_time - computes the bytes read from disk while query
num_bytes_indexed_after_analysis track the bytes read - computes the bytes written to disk while indexing

are mainly used to track the disk utilisation for the current index. Currently, the num_bytes_indexed_after_analysis (now changing to num_bytes_written_at_index_time) considered only the total bytes of the tokens after the analysis of a field's content. However, a user can further store a field, enable doc values to be stored, enable location information of a term to be stored or even include the field's content in _all field. All these options incur additional cost in terms of disk utilisation and have to be considered in the stats. The PRs #119 and #125 and the current one aim to achieve these changes.

result.VisitFields(func(f index.Field) {
atomic.AddUint64(&s.stats.TotBytesIndexedAfterAnalysis,
analysisBytes(f.AnalyzedTokenFrequencies()))
if segment.CollectIOStats {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend against adding this flag in scorch_segment_api.
Instead make it a scorch config option and apply this only while -

  • Estimating usage within bleve
  • Reporting usage reported by zap again from bleve

Leave zap's computation on always.

@Thejas-bhat Thejas-bhat force-pushed the statsUnderFlag branch 3 times, most recently from d359c60 to ff04e75 Compare July 20, 2022 14:30
@abhinavdangeti abhinavdangeti added this to the v2.3.4 milestone Jul 20, 2022
@abhinavdangeti abhinavdangeti changed the title Making the IOStats computation optional Fix the IOStats computation Jul 22, 2022
@abhinavdangeti abhinavdangeti marked this pull request as ready for review July 22, 2022 17:32
@abhinavdangeti
Copy link
Member

Unit test TestBytesWritten is flaky in the results it expects -

[11:56:02] AD: ~/Documents/go/src/github.com/blevesearch/bleve $ go test -run=TestBytesWritten -race
PASS
ok  	github.com/blevesearch/bleve/v2	2.217s
[11:56:18] AD: ~/Documents/go/src/github.com/blevesearch/bleve $ go test -run=TestBytesWritten
--- FAIL: TestBytesWritten (0.49s)
    index_test.go:314: expected bytes written is 55471, got 92127
FAIL
exit status 1
FAIL	github.com/blevesearch/bleve/v2	0.762s
[11:56:24] AD: ~/Documents/go/src/github.com/blevesearch/bleve $ go test -run=TestBytesWritten
--- FAIL: TestBytesWritten (0.83s)
    index_test.go:338: expected bytes written is 62347, got 117818
FAIL
exit status 1
FAIL	github.com/blevesearch/bleve/v2	1.068s
[11:56:27] AD: ~/Documents/go/src/github.com/blevesearch/bleve $ go test -run=TestBytesWritten
--- FAIL: TestBytesWritten (1.08s)
    index_test.go:363: expected bytes written is 102602, got 164949
FAIL
exit status 1
FAIL	github.com/blevesearch/bleve/v2	1.322s
[11:56:31] AD: ~/Documents/go/src/github.com/blevesearch/bleve $ go test -run=TestBytesWritten
PASS
ok  	github.com/blevesearch/bleve/v2	1.326s

@Thejas-bhat
Copy link
Member Author

Turns out,TestBytesWritten behaviour was flaky mainly because of a memory leak issue in zapx which has been addressed in zapx PR #125


contentFieldMapping.IncludeInAll = true
tmpIndexPath2 := createTmpIndexPath(t)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up of this path is missing?
Also, should we need to defer it as we could try it right after the index close? (doesn't matter much)
It could also be written like a table-driven test since the variations across run are only a few values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

idx.Close()

return statValError
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert a new line at the end of a method.

t.Fatal(err)
}
cleanupTmpIndexPath(t, tmpIndexPath4)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants