Skip to content
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 19 additions & 1 deletion python/pyspark/sql/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,25 @@ def __ne__(self, other):

@classmethod
def typeName(cls):
return cls.__name__[:-4].lower()
typeTypeNameMap = {"DataType": "data",
"NullType": "null",
"StringType": "string",
"BinaryType": "binary",
"BooleanType": "boolean",
"DateType": "date",
"TimestampType": "timestamp",
"DecimalType": "decimal",
"DoubleType": "double",
"FloatType": "float",
"ByteType": "byte",
"IntegerType": "integer",
"LongType": "long",
"ShortType": "short",
"ArrayType": "array",
"MapType": "map",
"StructField": "struct",
Copy link
Member

@HyukjinKwon HyukjinKwon Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this problem only applies to StructField. Could we just override typeName with simply throwing an exception? I think users are not supposed to call typeName against StructField but simpleString against the instance.

BTW, It apparently seems a bit odd that it extends DataType though.. I guess probably some tests are broken if we change the parent as it seems it is dependent on the parent assuming from #1598. So, I guess minimised fix would be just to override it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @holdenk and @viirya WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think it is valid to call typeName against a StructField. Actually, StructField is not a data type, strictly speaking...

I don't know why StructField inherits DataType in pyspark. In scala, it is not.

Throwing an exception when calling typeName on StructField seems good enough, instead of a map like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya The way I found this bug:
I wanted to figure out the schema of a dataset. I loaded it into a data frame and asked its schema. Then I called the typeName on each column. I do not know this is/was best way to do this, but I think it is valid to call typeName against a dataType to get its real type.

Copy link
Member

@HyukjinKwon HyukjinKwon Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid call for DataTypes but I don't think it is against StructField. If you look at the Scala-side codes, StructField is not a DataType (which is "The base type of all Spark SQL data types") but it seems the parent became DataType in Python for some reason in the PR I pointed out. In any way, It seems "A field inside a StructType".

If you want to know the schema, you could simply

>>> spark.range(1).schema[0].simpleString()
'id:bigint'
>>> spark.range(1).schema.simpleString()
'struct<id:bigint>'
>>> type(spark.range(1).schema[0])
<class 'pyspark.sql.types.StructField'>
>>> str(spark.range(1).schema[0])
'StructField(id,LongType,false)'

Could you elaborate your use-case and why simpleString is not enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szalai1 I think @HyukjinKwon 's code snippets should address your request. Doesn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I don't think i.typeName() is a valid usage. We better let it throw an exception when calling typeName on StructField.

i.dataType.typeName() is more reasonable call to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya You are right. I will use it that way. Thanks @HyukjinKwon !

But my bug report is still valid, I think. Can I override the typeName function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I think we should still fix and overriding it is good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

"StructType": "struct"}
return typeTypeNameMap[cls.__name__]

def simpleString(self):
return self.typeName()
Expand Down