-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase #205
LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase #205
Conversation
…paration for introducing TermVectors and new TermVectorsReader abstractions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also okay with introducing the new/simpler TermVectorsReader abstraction if it would reduce this PR diff a lot. Up to you.
@@ -57,7 +57,7 @@ | |||
final NormsProducer normsProducer; | |||
|
|||
final StoredFieldsReader fieldsReaderOrig; | |||
final TermVectorsReader termVectorsReaderOrig; | |||
final TermVectorsReaderBase termVectorsReaderOrig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the subsequent PR that re-introduces "TermVectorsReader", I assume you will change some of the references of TermVectorsReaderBase back to TermVectorsReader, such as right here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the subsequent PR that re-introduces "TermVectorsReader", I assume you will change some of the references of TermVectorsReaderBase back to TermVectorsReader
Yes I'm planning to do that!
such as right here?
For this particular one though, later in the code its close
method is called via:
lucene/lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
Lines 185 to 192 in 167bd99
void decRef() throws IOException { | |
if (ref.decrementAndGet() == 0) { | |
try (Closeable finalizer = this::notifyCoreClosedListeners) { | |
IOUtils.close( | |
termVectorsLocal, | |
fieldsReaderLocal, | |
fields, | |
termVectorsReaderOrig, |
So it may still need to use TermVectorsReaderBase
there, as that class still implements Closeable
per #180 (comment) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps then TermVectorsReader should also implement Closeable? (to be done in another PR)
I called out the line above because it is not harmonious with the other "*Reader" suffixed classes declared next to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this indeed looks inconsistent.
I think the main thing here is, the overall understanding I got from #180 (comment) & #180 (comment) (this comment specifically asked not to move up Closeable
from the original TermVectorsReader
) is that, the new TermVectors / TermVectorsReader
abstraction is supposed to be an index API, while the original TermVectorsReader
/ now TermVectorsReaderBase
is a codec API, which has clone/getMergeInstance/checkIntegrity
methods, and also implements Closeable
(and all other *Reader/Producer
codec classes also implement Closeable
as well). If we were to move Closeable
from TermVectorsReaderBase
to the new TermVectorsReader
, wouldn't that break the separation there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh; I'm not sure what to do then. The non-harmoneous aspect is a code smell but if it's very limited (not widespread) then it's not a big deal I guess. Otherwise it's a stronger indicator there is misalignment. I'm not really sure why there needs to be both TermVectorsReader & TermVectorsReaderBase any way but surely you & Adrien do so I leave that to your judgement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe another idea is leave TermVectorsReader be. In the other PR you have, rename TermVectors to TermVectorsSupplier? Just a thought for your consideration; I have no strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure why there needs to be both TermVectorsReader & TermVectorsReaderBase any way but surely you & Adrien do so I leave that to your judgement.
The main reason that we are adding a new index level API for TermVectors (TermVectors
in the other PR) is that we would like to provide a new way of TV access from IndexReader
like the following
TermVectors termVectors = reader.getTermVectorsReader(); // would clone the underlying TermVectorsReader to be
thread-safe, and GC-able once usage is finished
for (int docId = 0; docId < numDocs; docId++) {
Fields vectors = termVectors.get(docId);
}
to replace the old way of access in the form of
for (int docId = 0; docId < numDocs; docId++) {
Fields vectors = reader.getTermVectors(docId); // using thread-local to cache
}
as per #137 (comment) . From my understanding, it's a bit like separating out the existing index level API IndexReader#getTermVectors
into a new dedicated class to handle TV (and we do plan to deprecate the existing API once the new one is ready), and hence some of the internal index / codec classes organization details are now exposed via the indirection and inheritance relationships, and we need a bridge between them (by having codec class TermVectrosReaderBase
to inherit the new index class TermVectorsReader
).
Not sure if @jpountz or @rmuir have any further suggestion on this? If there's no strong objection to the current approach in this PR and the other one #180, we can potentially use https://issues.apache.org/jira/browse/LUCENE-10018 to further enhance on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dsmiley for the review and approval! I've thought about having the new |
-1 to these renames that solve no api problems. |
I prefer the current API that's been merged as part of #180 than the proposed API where Fields, TermVectors and TermVectorsReader get renamed to TermVectors, TermVectorsReader and TermVectorsReaderBase. Removing usage of |
FWIW, what I care about most in this is that Fields.java is relegated to an internal only thing associated with the terms index, for those that build PostingsFormats 'n such. I find it confusing to have this Fields class used in different ways across the codebase -- the PostingsFormat related Fields, and the term vector Fields. I care less about what we name some of the things, or wether a new public class is needed somewhere. |
Description
Rename
TermVectorsReader
toTermVectorsReaderBase
in preparation for introducingTermVectors
and newTermVectorsReader
abstractions. Please see #180 (comment) for details.Please note that relevant variable names were currently not renamed, in order to keep the changes minimal and reduce the need to undo the renaming once
TermVectorsReader
is introduced back.Tests
Passed existing tests.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.