Add in support for setting delim when parsing JSON through java #16867

Merged: 3 commits, Sep 23, 2024

@revans2 revans2 commented Sep 20, 2024

Description

This just adds in JNI APIs to allow for changing the delimiter when parsing JSON.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@revans2 requested a review from a team as a code owner on September 20, 2024, 20:55
@github-actions bot added the Java (Affects Java cuDF API) label on Sep 20, 2024
@revans2 added the labels 3 - Ready for Review (Ready for review by team), Spark (Functionality that helps Spark RAPIDS), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), and cuDF (Java) on Sep 20, 2024
@revans2 self-assigned this on Sep 20, 2024
@@ -38,6 +38,7 @@ public final class JSONOptions extends ColumnFilterOptions {
private final boolean allowLeadingZeros;
private final boolean allowNonNumericNumbers;
private final boolean allowUnquotedControlChars;
private final byte delim;
@ttnghia ttnghia Sep 20, 2024

Let's use the full name delimiter.

Suggested change
- private final byte delim;
+ private final byte delimiter;

Comment on lines 56 to 60
delim = builder.delim;
}

public byte getDelim() {
return delim;

Suggested change
- delim = builder.delim;
- }
- public byte getDelim() {
- return delim;
+ delimiter = builder.delimiter;
+ }
+ public byte getDelimiter() {
+ return delimiter;

Comment on lines 132 to 140
private byte delim = '\n';

public Builder withDelim(char delimiter) {
if (delimiter > Byte.MAX_VALUE) {
throw new IllegalArgumentException("Only basic ASCII values are supported " + delimiter);
}
delim = (byte)delimiter;
return this;
}

Suggested change
- private byte delim = '\n';
- public Builder withDelim(char delimiter) {
- if (delimiter > Byte.MAX_VALUE) {
- throw new IllegalArgumentException("Only basic ASCII values are supported " + delimiter);
- }
- delim = (byte)delimiter;
- return this;
- }
+ private byte delimiter = '\n';
+ public Builder withDelimiter(char delimiter) {
+ if (delimiter > Byte.MAX_VALUE) {
+ throw new IllegalArgumentException("Only basic ASCII values are supported " + delimiter);
+ }
+ this.delimiter = (byte)delimiter;
+ return this;
+ }
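
For readers who want to try the validation logic in isolation, here is a minimal standalone sketch of the builder pattern discussed in this thread. `DelimiterOptions` is a hypothetical stand-in class written for illustration only; it is not the cuDF `JSONOptions` class and has no cuDF dependency. It uses the `withDelimiter` / `getDelimiter` names proposed in the review.

```java
// Hypothetical stand-in for illustration; NOT the cuDF JSONOptions class.
final class DelimiterOptions {
  private final byte delimiter;

  private DelimiterOptions(Builder builder) {
    this.delimiter = builder.delimiter;
  }

  byte getDelimiter() {
    return delimiter;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Newline is the default, matching the builder shown in the PR.
    private byte delimiter = '\n';

    Builder withDelimiter(char delimiter) {
      // char is an unsigned 16-bit type, so this one comparison rejects
      // every value above 0x7F before the narrowing cast to byte.
      if (delimiter > Byte.MAX_VALUE) {
        throw new IllegalArgumentException("Only basic ASCII values are supported " + delimiter);
      }
      this.delimiter = (byte) delimiter;
      return this;
    }

    DelimiterOptions build() {
      return new DelimiterOptions(this);
    }
  }

  public static void main(String[] args) {
    // Default delimiter and an explicit NUL delimiter, printed as numeric byte values.
    System.out.println(builder().build().getDelimiter());
    System.out.println(builder().withDelimiter('\0').build().getDelimiter());
    try {
      builder().withDelimiter('\u00e9'); // outside basic ASCII
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Taking a `char` in the setter and storing a `byte` keeps call sites readable (`'\0'`, `'\n'`) while the range check guarantees the cast never wraps to a negative value.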

@@ -258,7 +258,8 @@ private static native long readJSON(int[] numChildren, String[] columnNames,
boolean strictValidation,
boolean allowLeadingZeros,
boolean allowNonNumericNumbers,
boolean allowUnquotedControl) throws CudfException;
boolean allowUnquotedControl,
byte delim) throws CudfException;

Suggested change
- byte delim) throws CudfException;
+ byte delimiter) throws CudfException;

@@ -272,6 +273,7 @@ private static native long readJSONFromDataSource(int[] numChildren, String[] co
boolean allowLeadingZeros,
boolean allowNonNumericNumbers,
boolean allowUnquotedControl,
byte delim,

Suggested change
- byte delim,
+ byte delimiter,

@@ -284,6 +286,7 @@ private static native long readAndInferJSONFromDataSource(boolean dayFirst, bool
boolean allowLeadingZeros,
boolean allowNonNumericNumbers,
boolean allowUnquotedControl,
byte delim,

Suggested change
- byte delim,
+ byte delimiter,

@@ -297,7 +300,8 @@ private static native long readAndInferJSON(long address, long length,
boolean strictValidation,
boolean allowLeadingZeros,
boolean allowNonNumericNumbers,
boolean allowUnquotedControl) throws CudfException;
boolean allowUnquotedControl,
byte delim) throws CudfException;

Suggested change
- byte delim) throws CudfException;
+ byte delimiter) throws CudfException;

@@ -1321,7 +1325,8 @@ public static Table readJSON(Schema schema, JSONOptions opts, File path) {
opts.strictValidation(),
opts.leadingZerosAllowed(),
opts.nonNumericNumbersAllowed(),
opts.unquotedControlChars()))) {
opts.unquotedControlChars(),
opts.getDelim()))) {

Suggested change
- opts.getDelim()))) {
+ opts.getDelimiter()))) {

@@ -1404,7 +1409,8 @@ public static TableWithMeta readJSON(JSONOptions opts, HostMemoryBuffer buffer,
opts.strictValidation(),
opts.leadingZerosAllowed(),
opts.nonNumericNumbersAllowed(),
opts.unquotedControlChars()));
opts.unquotedControlChars(),
opts.getDelim()));

Suggested change
- opts.getDelim()));
+ opts.getDelimiter()));

@@ -1426,6 +1432,7 @@ public static TableWithMeta readAndInferJSON(JSONOptions opts, DataSource ds) {
opts.leadingZerosAllowed(),
opts.nonNumericNumbersAllowed(),
opts.unquotedControlChars(),
opts.getDelim(),

Suggested change
- opts.getDelim(),
+ opts.getDelimiter(),

@@ -1479,7 +1486,8 @@ public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer b
opts.strictValidation(),
opts.leadingZerosAllowed(),
opts.nonNumericNumbersAllowed(),
opts.unquotedControlChars()))) {
opts.unquotedControlChars(),
opts.getDelim()))) {

Suggested change
- opts.getDelim()))) {
+ opts.getDelimiter()))) {

@@ -1518,6 +1526,7 @@ public static Table readJSON(Schema schema, JSONOptions opts, DataSource ds, int
opts.leadingZerosAllowed(),
opts.nonNumericNumbersAllowed(),
opts.unquotedControlChars(),
opts.getDelim(),

Suggested change
- opts.getDelim(),
+ opts.getDelimiter(),

Schema schema = Schema.builder().addColumn(DType.STRING, "a").build();
JSONOptions opts = JSONOptions.builder()
.withLines(true)
.withDelim('\0')

Suggested change
- .withDelim('\0')
+ .withDelimiter('\0')
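
The test snippet in this thread sets the record delimiter to NUL (`'\0'`) instead of the default newline. As a rough illustration of what input in that shape looks like, the plain-Java sketch below splits a byte buffer into records on a single-byte delimiter. `NulDelimitedRecords` and `splitRecords` are hypothetical names invented for this example, and the splitting here only mimics the idea on the host; it says nothing about how the cuDF parser works internally. The sample data, where one record contains a newline between tokens, is also an assumption chosen to show a case where a newline delimiter would cut a record in half.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of cuDF.
final class NulDelimitedRecords {

  // Splits data into records separated by a single-byte delimiter.
  // A trailing delimiter does not produce an empty final record.
  static List<String> splitRecords(byte[] data, byte delimiter) {
    List<String> records = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < data.length; i++) {
      if (data[i] == delimiter) {
        records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
        start = i + 1;
      }
    }
    if (start < data.length) {
      records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
    }
    return records;
  }

  public static void main(String[] args) {
    // Two JSON records separated by NUL; the second contains a newline
    // between tokens, which is legal JSON whitespace.
    byte[] data = "{\"a\":\"x\"}\0{\"a\":\n\"y\"}\0".getBytes(StandardCharsets.UTF_8);
    List<String> records = splitRecords(data, (byte) 0);
    System.out.println("record count: " + records.size());
    for (String record : records) {
      System.out.println(record);
    }
  }
}
```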

revans2 commented Sep 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit 8c975fe into rapidsai:branch-24.12 Sep 23, 2024
81 checks passed
revans2 added a commit to revans2/cudf that referenced this pull request Sep 23, 2024
…dsai#16867)

This just adds in JNI APIs to allow for changing the delimiter when parsing JSON.

Authors:
  - Robert (Bobby) Evans (https://github.com/revans2)

Approvers:
  - Alessandro Bellina (https://github.com/abellina)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#16867
rapids-bot bot pushed a commit that referenced this pull request Sep 24, 2024