Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix open bytes index #1

Closed
wants to merge 320 commits into from
Closed

Conversation

PSeitz
Copy link
Owner

@PSeitz PSeitz commented Feb 18, 2022

PSeitz and others added 30 commits May 31, 2021 23:15
Add a boolean flag in the Query::query_terms informing on whether
position information is required.

Closes #1070
replace test with smaller test in doc_store
Fix panic in store reader raw document iterator during segment merge
support multiple codes
prepend codec id to all fast fields
add new api to create fastfields with access to all data
use new fastfield creation api in initial creation and merge
remove unused collect of data in doc_id_mapping
move common to common crate
create fastfield_codecs crate
add bitpacker to fast field codecs
add linear interpolation to fast field codecs
add tests
change api to fastfield reader in codec crate
add fastfield metadata to footer
remove old code
merge codec files
add estimation tests
add codec test data in tests
add CodecReader as common interface in fastfield codec crate
add LinearInterpolation to DynamicFastFieldReader
calc estimation and choose best codec
cleanup
test tests::bench_fastfield_bitpack_create        ... bench:      57,628 ns/iter (+/- 23,486)
test tests::bench_fastfield_bitpack_get           ... bench:      43,323 ns/iter (+/- 4,286)
test tests::bench_fastfield_linearinterpol_create ... bench:     223,625 ns/iter (+/- 33,563)
test tests::bench_fastfield_linearinterpol_get    ... bench:      82,839 ns/iter (+/- 9,575)
fulmicoton and others added 25 commits December 10, 2021 18:45
* Add a NORMED options on field

Make fieldnorm indexation optional:

* for all types except text => added a NORMED options
* for text field
** if STRING, field has not fieldnorm retained
** if TEXT, field has fieldnorm computed

* Finalize making fieldnorm optional for all field types.

- Using Option for fieldnorm readers.
Add a knob to LogMergePolicy to always merge segments that exceed a threshold of deleted docs

Closes #115
* doc(termdict) expose structs
also add merger doc + lint
refs #1232
* Change Snippet.fragments -> Snippet.fragment
* Apply suggestions from code review

Co-authored-by: Liam Warfield <[email protected]>
The test-env-log crate has been renamed to test-log to better reflect
its intent of not only catering to env_logger specific initialization
but also tracing (and potentially others in the future).
This change updates the crate to use test-log instead of the now
deprecated test-env-log.
* doc(collector)

* doc(directory)

* doc(misc)

* wording
* Term are now typed.

This change is backward compatible:
While the Term has a byte representation that is modified, a Term itself
is a transient object that is not serialized as is in the index.

Its .field() and .value_bytes() on the other hand are unchanged.
This change offers better Debug information for terms.

While not necessary it also will help in the support for JSON types.

* Renamed Hierarchical Facet -> Facet
Adds an API to register Warmers in the IndexReader.

Co-authored-by: Paul Masurel <[email protected]>
Updates the requirements on fastdivide to permit the latest version.

---
updated-dependencies:
- dependency-name: fastdivide
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add Vaporetto tokenizer to README

* Update README.md
@codecov-commenter
Copy link

codecov-commenter commented Feb 18, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@fef428a). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main       #1   +/-   ##
=======================================
  Coverage        ?   94.22%           
=======================================
  Files           ?      209           
  Lines           ?    35301           
  Branches        ?        0           
=======================================
  Hits            ?    33264           
  Misses          ?     2037           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fef428a...9c12860. Read the comment docs.

@PSeitz PSeitz closed this Feb 23, 2022
PSeitz added a commit that referenced this pull request Oct 27, 2023
* fix windows build (#1)

* Fix windows build

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Fix generic bugs

* Reformat code

* Add generic to index writer which I forgot about

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Rebase main and fix conflicts

* Reformat code

* Merge upstream

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add tokenizer improvements from previous commits

* Add tokenizer improvements from previous commits

* Reformat

* Fix unit tests

* Fix unit tests

* Use enum in changes

* Stage changes

* Add new deserializer logic

* Add serializer integration

* Add document deserializer

* Implement new (de)serialization api for existing types

* Fix bugs and type errors

* Add helper implementations

* Fix errors

* Reformat code

* Add unit tests and some code organisation for serialization

* Add unit tests to deserializer

* Add some small docs

* Add support for deserializing serde values

* Reformat

* Fix typo

* Fix typo

* Change repr of facet

* Remove unused trait methods

* Add child value type

* Resolve comments

* Fix build

* Fix more build errors

* Fix more build errors

* Fix the tests I missed

* Fix examples

* fix numerical order, serialize PreTok Str

* fix coverage

* rename Document to TantivyDocument, rename DocumentAccess to Document

add Binary prefix to binary de/serialization

* fix coverage

---------

Co-authored-by: Pascal Seitz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment