Conversation
|
Overall structure looks good and follows the CSV/Parquet patterns well.
|
|
@costin I addressed your comments. Schema inference now handles ISO-formatted dates and distinguishes I ended up creating a separate |
bpintea
left a comment
There was a problem hiding this comment.
Looking good, left a few questions.
| private final NdJsonPageDecoder pageDecoder; | ||
| private boolean endOfFile = false; | ||
| private Page nextPage; | ||
| private InputStream inputStream; |
There was a problem hiding this comment.
Removed it, pageDecoder now owns it.
|
|
||
| @Override | ||
| public void close() { | ||
| for (var blockBuilder : blockBuilders) { |
There was a problem hiding this comment.
You may use Releasables#close for this.
|
|
||
| Block[] blocks = new Block[blockBuilders.length]; | ||
| for (int i = 0; i < blockBuilders.length; i++) { | ||
| blocks[i] = blockBuilders[i].build(); |
There was a problem hiding this comment.
Should the building and the closing be decoupled? Maybe have the closing done in a final clause and with the close() method? The caller of this method doesn't call close() on exception either. I see the caller rewraps the IOException, but .build() can raise a CircuitBreakingException.
There was a problem hiding this comment.
Good catch. I've refactored this method in 6d22cab so that block builders are in a local variable and are always closed. We don't need to keep them longer than that.
Also adjusted the code so that created blocks are closed even if an error happens when building one. And also moved ownership of the InputStream to this class, as we don't need it higher-level classes.
| // Builders setup independently as we need to create new ones for each page. | ||
| void setupBuilders() { | ||
| if (attribute != null) { | ||
| blockBuilder = switch (attribute.dataType()) { |
There was a problem hiding this comment.
Can we not use ConstantNullBlock.Builder?
There was a problem hiding this comment.
Done. That class wasn't public, which is why I didn't see it. I changed it to public as there's no reason to keep it private.
| } | ||
| } | ||
|
|
||
| private void decodeValue(JsonParser parser, int position) throws IOException { |
| // Unknown field, skip it | ||
| parser.skipChildren(); | ||
| } else { | ||
| childDecoder.decodeValue(parser, position); |
| } | ||
|
|
||
| // --------------------------------------------------------------------------------------------- | ||
| // A tree of decoders. Avoids path reconstruction when traversing nested objects. |
There was a problem hiding this comment.
Can the ones in the c'tor arguments be final?
There was a problem hiding this comment.
(I left the review notes in IntelliJ and I think you got to push an update on the file after my checkout; so all comments below apply to code 4 lines below.)
There was a problem hiding this comment.
There's no constructor, all fields are set while traversing the list of Attribute 😅
|
|
||
| int c; | ||
| while ((c = input.read()) != -1) { | ||
| if (c == '\n' || c == '\r') { |
There was a problem hiding this comment.
We should add tests with \n\r or \r\n trailing sequences.
| List<Attribute> attributes = new ArrayList<>(); | ||
| buildSchema(root, null, attributes); | ||
| return attributes; | ||
| } |
There was a problem hiding this comment.
Removed, a leftover from previous iterations.
|
|
||
| /** | ||
| * Infers schema from an NDJSON input stream, reading up to maxLines. | ||
| */ |
There was a problem hiding this comment.
Is the name member ever used in FieldInfo?
There was a problem hiding this comment.
Indeed it's not. The key in the parent's children map is what is used.
|
|
||
| Block[] blocks = new Block[blockBuilders.length]; | ||
| for (int i = 0; i < blockBuilders.length; i++) { | ||
| blocks[i] = blockBuilders[i].build(); |
There was a problem hiding this comment.
Good catch. I've refactored this method in 6d22cab so that block builders are in a local variable and are always closed. We don't need to keep them longer than that.
Also adjusted the code so that created blocks are closed even if an error happens when building one. And also moved ownership of the InputStream to this class, as we don't need it higher-level classes.
|
|
||
| @Override | ||
| public void close() { | ||
| for (var blockBuilder : blockBuilders) { |
| } | ||
|
|
||
| // --------------------------------------------------------------------------------------------- | ||
| // A tree of decoders. Avoids path reconstruction when traversing nested objects. |
There was a problem hiding this comment.
There's no constructor, all fields are set while traversing the list of Attribute 😅
| // Builders setup independently as we need to create new ones for each page. | ||
| void setupBuilders() { | ||
| if (attribute != null) { | ||
| blockBuilder = switch (attribute.dataType()) { |
There was a problem hiding this comment.
Done. That class wasn't public, which is why I didn't see it. I changed it to public as there's no reason to keep it private.
| private final NdJsonPageDecoder pageDecoder; | ||
| private boolean endOfFile = false; | ||
| private Page nextPage; | ||
| private InputStream inputStream; |
There was a problem hiding this comment.
Removed it, pageDecoder now owns it.
|
|
||
| /** | ||
| * Infers schema from an NDJSON input stream, reading up to maxLines. | ||
| */ |
There was a problem hiding this comment.
Indeed it's not. The key in the parent's children map is what is used.
| List<Attribute> attributes = new ArrayList<>(); | ||
| buildSchema(root, null, attributes); | ||
| return attributes; | ||
| } |
There was a problem hiding this comment.
Removed, a leftover from previous iterations.
|
@romseygeek looking at it. Thanks for the heads up! |
Adds an NDJSON ESQL datasource. The schema is inferred from the first 100 lines. Supported types are INTEGER, DOUBLE, LONG, BOOLEAN, DATETIME (strings in ISO format) and KEYWORD. Multi-values are supported, nested objects are flattened by joining property names with a '.' In case of parsing failure, the stream is consumed until the next end of line to recover.
This reverts commit 605c05b.
Add a NDJSON datasource, building on the abstractions added in PR #141678