32 changes: 24 additions & 8 deletions python/pyspark/sql/readwriter.py
@@ -191,10 +191,13 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
:param mode: allows a mode for dealing with corrupt records during parsing. If None is
set, it uses the default value, ``PERMISSIVE``.

- * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
-   record and puts the malformed string into a new field configured by \
-   ``columnNameOfCorruptRecord``. When a schema is set by user, it sets \
-   ``null`` for extra fields.
+ * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
+   record, and puts the malformed string into a field configured by \
+   ``columnNameOfCorruptRecord``. To keep corrupt records, a user can set \
+   a string type field named ``columnNameOfCorruptRecord`` in a user-defined \
+   schema. If a schema does not have the field, it drops corrupt records during \
+   parsing. When inferring a schema, it implicitly adds a \
+   ``columnNameOfCorruptRecord`` field in the output schema.
@gatorsmile (Member), Jun 4, 2017:
@maropu For JSON, we implicitly add the columnNameOfCorruptRecord field during schema inference when the mode is PERMISSIVE. What is the reason we are not doing the same thing for CSV schema inference?
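
For reference, a minimal pyspark sketch of the JSON behaviour being discussed; the file name corrupt.json and its contents are hypothetical, and _corrupt_record assumes the default column name:

# corrupt.json (hypothetical), whose second line is not valid JSON:
#   {"a": 1}
#   {"a":
df = spark.read.json("corrupt.json")  # mode defaults to PERMISSIVE
df.printSchema()
# The inferred schema implicitly gains the corrupt-record field, roughly:
# root
#  |-- _corrupt_record: string (nullable = true)
#  |-- a: long (nullable = true)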

@HyukjinKwon (Member):

(Sorry for interrupting) Yeah, it should be consistent, and we should probably change it. We should probably also consider records with fewer or more tokens than the schema as malformed records in PERMISSIVE mode, rather than filling in some of the fields. @cloud-fan raised this issue before, and I had a talk with some data analysts. It looked like some agreed and others did not, so I just decided not to change the current behaviour for now.

To cut it short, the reason (I assume) is that I could not imagine a simple, common case that fails to parse as CSV (as opposed to failing during conversion) under the current implementation. If there are such cases, we should match the behaviour.

I am currently outside and this is my phone. I will double-check this when I get to my computer, but this should be correct unless I have missed some changes in this code path.

@gatorsmile (Member), Jun 4, 2017:
In CSV, records with fewer or more tokens than the schema are already viewed as malformed records in (at least) 2.2. I did not check the previous versions.

I think we need to implicitly add the column columnNameOfCorruptRecord during schema inference too.

@gatorsmile (Member):

There is more than one issue here: the default of columnNameOfCorruptRecord does not respect the session conf spark.sql.columnNameOfCorruptRecord.

@gatorsmile (Member):

Will submit a PR soon for fixing both issues.

@gatorsmile (Member):

Right now, users have to manually add the column columnNameOfCorruptRecord to see these malformed records.

@HyukjinKwon (Member), Jun 5, 2017:
@gatorsmile, I just got to my laptop.

I checked that when there are more tokens than the schema, it fills the malformed column, with the data below:

a,a

(BTW, it looks like it respects spark.sql.columnNameOfCorruptRecord?)

scala> spark.read.schema("a string, _corrupt_record string").csv("test.csv").show()
+---+---------------+
|  a|_corrupt_record|
+---+---------------+
|  a|            a,a|
+---+---------------+
scala> spark.conf.set("spark.sql.columnNameOfCorruptRecord", "abc")

scala> spark.read.schema("a string, abc string").csv("test.csv").show()
+---+---+
|  a|abc|
+---+---+
|  a|a,a|
+---+---+

And I found another bug (when there are fewer tokens than the schema). With the data:

a
a
a
a
a
scala> spark.read.schema("a string, b string, _corrupt_record string").csv("test.csv").show()

prints ...

17/06/05 09:45:26 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 5)
java.lang.NullPointerException
	at scala.collection.immutable.StringLike$class.stripLineEnd(StringLike.scala:89)
	at scala.collection.immutable.StringOps.stripLineEnd(StringOps.scala:29)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$getCurrentInput(UnivocityParser.scala:56)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert$1.apply(UnivocityParser.scala:211)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert$1.apply(UnivocityParser.scala:211)
	at org.apache.spark.sql.execution.datasources.FailureSafeParser$$anonfun$2.apply(FailureSafeParser.scala:50)
	at org.apache.spark.sql.execution.datasources.FailureSafeParser$$anonfun$2.apply(FailureSafeParser.scala:43)
	at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:64)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:236)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:230)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

It looks like getCurrentInput produces null once the input has been fully consumed.

Another thing I would like to note (just to record the difference so that we do not forget): JSON produces null in the columns and puts the contents in the malformed column.
With the input:

{"a": 1, "b": "a"}
scala> spark.read.json("test.json").show()
+---+---+
|  a|  b|
+---+---+
|  1|  a|
+---+---+
scala> spark.read.schema("a string, b int, _corrupt_record string").json("test.json").show()
+----+----+------------------+
|   a|   b|   _corrupt_record|
+----+----+------------------+
|null|null|{"a": 1, "b": "a"}|
+----+----+------------------+

@gatorsmile (Member):

Oh, I was writing my comments before seeing yours... Yes, I agree with your comments.

@HyukjinKwon (Member), Jun 5, 2017:

Let me take a shot at fixing the bug I found above (the NullPointerException). I think it can be fixed easily (though I am pretty sure the behaviour could be arguable). I will open a PR and cc you to show what it looks like.

@maropu (Member, Author), Jun 5, 2017:
Sorry for my late response. Yeah, I also think the behaviour should be the same. I tried in this PR, but I couldn't do it because (as you both already noticed) we can't easily add a new column in the CSV code path. So I think we probably need some refactoring of DataSource to make this behaviour consistent.

* ``DROPMALFORMED`` : ignores the whole corrupted records.
* ``FAILFAST`` : throws an exception when it meets corrupted records.
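
A minimal pyspark sketch of the PERMISSIVE semantics documented above; the file name and data are hypothetical, and a StructType schema is used for broad version compatibility:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Declaring the corrupt-record field keeps malformed lines: other fields
# become null and the raw text lands in _corrupt_record.
schema_keep = StructType([
    StructField("a", IntegerType(), True),
    StructField("_corrupt_record", StringType(), True)])
kept = spark.read.schema(schema_keep).json("corrupt.json")

# Without the corrupt-record field, malformed lines are dropped.
schema_drop = StructType([StructField("a", IntegerType(), True)])
dropped = spark.read.schema(schema_drop).json("corrupt.json")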

@@ -304,7 +307,8 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None,
ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None,
negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
- maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, timeZone=None):
+ maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, timeZone=None,
+ columnNameOfCorruptRecord=None):
Member:

Doh, it seems we should add this in streaming.py and DataStreamReader too.

@maropu (Member, Author):

Okay, I'll check soon.

"""Loads a CSV file and returns the result as a :class:`DataFrame`.

This function will go through the input once to determine the input schema if
@@ -366,11 +370,22 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
:param timeZone: sets the string that indicates a timezone to be used to parse timestamps.
If None is set, it uses the default value, session local timezone.

- * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted record.
-   When a schema is set by user, it sets ``null`` for extra fields.
+ * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
+   record, and puts the malformed string into a field configured by \
+   ``columnNameOfCorruptRecord``. To keep corrupt records, a user can set \
+   a string type field named ``columnNameOfCorruptRecord`` in a \
+   user-defined schema. If a schema does not have the field, it drops corrupt \
+   records during parsing. When the length of parsed CSV tokens is shorter than \
+   the length expected by the schema, it sets ``null`` for the extra fields.
* ``DROPMALFORMED`` : ignores the whole corrupted records.
* ``FAILFAST`` : throws an exception when it meets corrupted records.

+ :param columnNameOfCorruptRecord: allows renaming the new field that holds the malformed
+   string created by ``PERMISSIVE`` mode. This overrides
+   ``spark.sql.columnNameOfCorruptRecord``. If None is set,
+   it uses the value specified in
+   ``spark.sql.columnNameOfCorruptRecord``.

>>> df = spark.read.csv('python/test_support/sql/ages.csv')
>>> df.dtypes
[('_c0', 'string'), ('_c1', 'string')]
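
A sketch of the new keyword argument added by this change; test.csv and the column name broken are hypothetical:

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("a", StringType(), True),
    StructField("broken", StringType(), True)])  # holds malformed rows
df = spark.read.csv("test.csv", schema=schema, mode="PERMISSIVE",
                    columnNameOfCorruptRecord="broken")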
@@ -382,7 +397,8 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
nanValue=nanValue, positiveInf=positiveInf, negativeInf=negativeInf,
dateFormat=dateFormat, timestampFormat=timestampFormat, maxColumns=maxColumns,
maxCharsPerColumn=maxCharsPerColumn,
- maxMalformedLogPerPartition=maxMalformedLogPerPartition, mode=mode, timeZone=timeZone)
+ maxMalformedLogPerPartition=maxMalformedLogPerPartition, mode=mode, timeZone=timeZone,
+ columnNameOfCorruptRecord=columnNameOfCorruptRecord)
if isinstance(path, basestring):
path = [path]
return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
32 changes: 24 additions & 8 deletions python/pyspark/sql/streaming.py
@@ -463,10 +463,13 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
:param mode: allows a mode for dealing with corrupt records during parsing. If None is
set, it uses the default value, ``PERMISSIVE``.

- * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
-   record and puts the malformed string into a new field configured by \
-   ``columnNameOfCorruptRecord``. When a schema is set by user, it sets \
-   ``null`` for extra fields.
+ * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
+   record, and puts the malformed string into a field configured by \
+   ``columnNameOfCorruptRecord``. To keep corrupt records, a user can set \
+   a string type field named ``columnNameOfCorruptRecord`` in a user-defined \
+   schema. If a schema does not have the field, it drops corrupt records during \
+   parsing. When inferring a schema, it implicitly adds a \
+   ``columnNameOfCorruptRecord`` field in the output schema.
* ``DROPMALFORMED`` : ignores the whole corrupted records.
* ``FAILFAST`` : throws an exception when it meets corrupted records.
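
One streaming-specific note: file stream sources typically require a user-specified schema, so keeping corrupt records means declaring the field explicitly. A hypothetical sketch:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("a", IntegerType(), True),
    StructField("_corrupt_record", StringType(), True)])
sdf = spark.readStream.schema(schema).json("json-dir/")  # directory is hypothetical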

@@ -558,7 +561,8 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None,
ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None,
negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
- maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, timeZone=None):
+ maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, timeZone=None,
+ columnNameOfCorruptRecord=None):
"""Loads a CSV file stream and returns the result as a :class:`DataFrame`.

This function will go through the input once to determine the input schema if
@@ -618,11 +622,22 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
:param timeZone: sets the string that indicates a timezone to be used to parse timestamps.
If None is set, it uses the default value, session local timezone.

- * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted record.
-   When a schema is set by user, it sets ``null`` for extra fields.
+ * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
+   record, and puts the malformed string into a field configured by \
+   ``columnNameOfCorruptRecord``. To keep corrupt records, a user can set \
+   a string type field named ``columnNameOfCorruptRecord`` in a \
+   user-defined schema. If a schema does not have the field, it drops corrupt \
+   records during parsing. When the length of parsed CSV tokens is shorter than \
+   the length expected by the schema, it sets ``null`` for the extra fields.
* ``DROPMALFORMED`` : ignores the whole corrupted records.
* ``FAILFAST`` : throws an exception when it meets corrupted records.

+ :param columnNameOfCorruptRecord: allows renaming the new field that holds the malformed
+   string created by ``PERMISSIVE`` mode. This overrides
+   ``spark.sql.columnNameOfCorruptRecord``. If None is set,
+   it uses the value specified in
+   ``spark.sql.columnNameOfCorruptRecord``.

>>> csv_sdf = spark.readStream.csv(tempfile.mkdtemp(), schema = sdf_schema)
>>> csv_sdf.isStreaming
True
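
A sketch of the streaming counterpart of the new option; the directory and column name are hypothetical:

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("a", StringType(), True),
    StructField("bad", StringType(), True)])
csv_sdf = spark.readStream.csv("csv-dir/", schema=schema, mode="PERMISSIVE",
                               columnNameOfCorruptRecord="bad")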
@@ -636,7 +651,8 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
nanValue=nanValue, positiveInf=positiveInf, negativeInf=negativeInf,
dateFormat=dateFormat, timestampFormat=timestampFormat, maxColumns=maxColumns,
maxCharsPerColumn=maxCharsPerColumn,
- maxMalformedLogPerPartition=maxMalformedLogPerPartition, mode=mode, timeZone=timeZone)
+ maxMalformedLogPerPartition=maxMalformedLogPerPartition, mode=mode, timeZone=timeZone,
+ columnNameOfCorruptRecord=columnNameOfCorruptRecord)
if isinstance(path, basestring):
return self._df(self._jreader.csv(path))
else:
18 changes: 14 additions & 4 deletions sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -286,8 +286,11 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
* during parsing.
* <ul>
* <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts
- *   the malformed string into a new field configured by `columnNameOfCorruptRecord`. When
- *   a schema is set by user, it sets `null` for extra fields.</li>
+ *   the malformed string into a field configured by `columnNameOfCorruptRecord`. To keep
+ *   corrupt records, a user can set a string type field named `columnNameOfCorruptRecord`
+ *   in a user-defined schema. If a schema does not have the field, it drops corrupt records
+ *   during parsing. When inferring a schema, it implicitly adds a `columnNameOfCorruptRecord`
+ *   field in the output schema.</li>
* <li>`DROPMALFORMED` : ignores the whole corrupted records.</li>
* <li>`FAILFAST` : throws an exception when it meets corrupted records.</li>
* </ul>
@@ -422,12 +425,19 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
* <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
* during parsing.
* <ul>
- * <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When
- *   a schema is set by user, it sets `null` for extra fields.</li>
+ * <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts
+ *   the malformed string into a field configured by `columnNameOfCorruptRecord`. To keep
+ *   corrupt records, a user can set a string type field named `columnNameOfCorruptRecord`
+ *   in a user-defined schema. If a schema does not have the field, it drops corrupt records
+ *   during parsing. When the length of parsed CSV tokens is shorter than the length expected
+ *   by the schema, it sets `null` for the extra fields.</li>
* <li>`DROPMALFORMED` : ignores the whole corrupted records.</li>
* <li>`FAILFAST` : throws an exception when it meets corrupted records.</li>
* </ul>
* </li>
+ * <li>`columnNameOfCorruptRecord` (default is the value specified in
+ *   `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field that holds the
+ *   malformed string created by `PERMISSIVE` mode. This overrides
+ *   `spark.sql.columnNameOfCorruptRecord`.</li>
* </ul>
* @since 2.0.0
*/
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
@@ -27,9 +27,9 @@ import org.apache.hadoop.mapreduce._

import org.apache.spark.TaskContext
import org.apache.spark.internal.Logging
- import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
+ import org.apache.spark.sql.{AnalysisException, Dataset, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.InternalRow
- import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CompressionCodecs}
+ import org.apache.spark.sql.catalyst.util.CompressionCodecs
import org.apache.spark.sql.execution.datasources._
import org.apache.spark.sql.execution.datasources.text.TextFileFormat
import org.apache.spark.sql.sources._
@@ -96,31 +96,44 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
filters: Seq[Filter],
options: Map[String, String],
hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = {
- val csvOptions = new CSVOptions(options, sparkSession.sessionState.conf.sessionLocalTimeZone)

CSVUtils.verifySchema(dataSchema)
val broadcastedHadoopConf =
sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))

+ val parsedOptions = new CSVOptions(
+   options,
+   sparkSession.sessionState.conf.sessionLocalTimeZone,
+   sparkSession.sessionState.conf.columnNameOfCorruptRecord)
Member:

(It seems CSVOptions is created twice above :)).

@maropu (Member, Author):

Fixed


+ // Check a field requirement for corrupt records here to throw an exception on the driver side
+ dataSchema.getFieldIndex(parsedOptions.columnNameOfCorruptRecord).foreach { corruptFieldIndex =>
+   val f = dataSchema(corruptFieldIndex)
+   if (f.dataType != StringType || !f.nullable) {
+     throw new AnalysisException(
+       "The field for corrupt records must be string type and nullable")
+   }
+ }

(file: PartitionedFile) => {
val lines = {
val conf = broadcastedHadoopConf.value.value
val linesReader = new HadoopFileLinesReader(file, conf)
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => linesReader.close()))
linesReader.map { line =>
- new String(line.getBytes, 0, line.getLength, csvOptions.charset)
+ new String(line.getBytes, 0, line.getLength, parsedOptions.charset)
}
}

- val linesWithoutHeader = if (csvOptions.headerFlag && file.start == 0) {
+ val linesWithoutHeader = if (parsedOptions.headerFlag && file.start == 0) {
// Note that if there are only comments in the first block, the header would probably
not be dropped.
- CSVUtils.dropHeaderLine(lines, csvOptions)
+ CSVUtils.dropHeaderLine(lines, parsedOptions)
} else {
lines
}

- val filteredLines = CSVUtils.filterCommentAndEmpty(linesWithoutHeader, csvOptions)
- val parser = new UnivocityParser(dataSchema, requiredSchema, csvOptions)
+ val filteredLines = CSVUtils.filterCommentAndEmpty(linesWithoutHeader, parsedOptions)
+ val parser = new UnivocityParser(dataSchema, requiredSchema, parsedOptions)
filteredLines.flatMap(parser.parse)
}
}
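
A hypothetical pyspark sketch of what the driver-side check added above should reject, namely a corrupt-record field that is not a nullable string:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.utils import AnalysisException

bad_schema = StructType([
    StructField("a", StringType(), True),
    StructField("_corrupt_record", IntegerType(), True)])  # wrong type on purpose
try:
    spark.read.schema(bad_schema).csv("test.csv").show()  # test.csv is hypothetical
except AnalysisException as e:
    # should carry "The field for corrupt records must be string type and nullable"
    print(e)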
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
@@ -27,11 +27,20 @@ import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CompressionCodecs, ParseModes}
import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CompressionCodecs, ParseModes}

private[csv] class CSVOptions(
- @transient private val parameters: CaseInsensitiveMap[String], defaultTimeZoneId: String)
+ @transient private val parameters: CaseInsensitiveMap[String],
+ defaultTimeZoneId: String,
+ defaultColumnNameOfCorruptRecord: String)
extends Logging with Serializable {

- def this(parameters: Map[String, String], defaultTimeZoneId: String) =
-   this(CaseInsensitiveMap(parameters), defaultTimeZoneId)
+ def this(
+     parameters: Map[String, String],
+     defaultTimeZoneId: String,
+     defaultColumnNameOfCorruptRecord: String = "") = {
+   this(
+     CaseInsensitiveMap(parameters),
+     defaultTimeZoneId,
+     defaultColumnNameOfCorruptRecord)
+ }

private def getChar(paramName: String, default: Char): Char = {
val paramValue = parameters.get(paramName)
@@ -95,6 +104,9 @@ private[csv] class CSVOptions(
val dropMalformed = ParseModes.isDropMalformedMode(parseMode)
val permissive = ParseModes.isPermissiveMode(parseMode)

+ val columnNameOfCorruptRecord =
+   parameters.getOrElse("columnNameOfCorruptRecord", defaultColumnNameOfCorruptRecord)
Member:

Maybe we should also add this to readwriter.py and document it in readwriter.py, DataFrameReader, and DataStreamReader.

@maropu (Member, Author):

Added doc descriptions in readwriter.py, DataFrameReader, and DataStreamReader.


val nullValue = parameters.getOrElse("nullValue", "")

val nanValue = parameters.getOrElse("nanValue", "NaN")
Expand Down