Commit
Fix DataFrame.drop() to remove fields from Spark DataFrame also. (#794)
When we drop columns from a DataFrame with `DataFrame.drop()`, we get a DataFrame with the columns dropped properly, as shown below.

```python
>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
>>>
>>> df = df.drop('name')
>>> df
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
```

But when we then look at the internal Spark DataFrame, it still shows the original one, with the columns not deleted:

```
>>> df._sdf.show()
+-----------------+------+------+---------+
|__index_level_0__|  name| class|max_speed|
+-----------------+------+------+---------+
|                0|falcon|  bird|    389.0|
|                1|parrot|  bird|     24.0|
|                2|  lion|mammal|     80.5|
|                3|monkey|mammal|     null|
+-----------------+------+------+---------+
```

(Although I dropped the column 'name' in the example above, it is still shown in the internal Spark DataFrame.)

So I think we need to drop the columns there, too, like:

```
>>> df._sdf.show()
+-----------------+------+---------+
|__index_level_0__| class|max_speed|
+-----------------+------+---------+
|                0|  bird|    389.0|
|                1|  bird|     24.0|
|                2|mammal|     80.5|
|                3|mammal|     null|
+-----------------+------+---------+
```
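The difference between the two behaviours can be illustrated with a toy model. This is only a sketch in plain Python and not the code changed by this commit: `ToyFrame`, `drop_metadata_only`, and the `_table` attribute are hypothetical names, and a dict of lists stands in for the internal Spark DataFrame. With real PySpark, the underlying columns would presumably be removed with `pyspark.sql.DataFrame.drop`.

```python
class ToyFrame:
    """Hypothetical wrapper: a list of visible columns plus an underlying table.

    The underlying table (a dict of column name -> values) stands in for the
    internal Spark DataFrame; it is an assumption made for illustration only.
    """

    def __init__(self, table, visible=None):
        self._table = dict(table)
        self._visible = list(table) if visible is None else list(visible)

    @property
    def columns(self):
        return list(self._visible)

    def drop_metadata_only(self, name):
        # Behaviour reported above: the column disappears from the visible
        # columns, but the underlying table still carries it.
        visible = [c for c in self._visible if c != name]
        return ToyFrame(self._table, visible)

    def drop(self, name):
        # Behaviour proposed above: the column is removed from the visible
        # columns and from the underlying table as well.
        visible = [c for c in self._visible if c != name]
        table = {c: v for c, v in self._table.items() if c != name}
        return ToyFrame(table, visible)


df = ToyFrame({
    "name": ["falcon", "parrot", "lion", "monkey"],
    "class": ["bird", "bird", "mammal", "mammal"],
    "max_speed": [389.0, 24.0, 80.5, None],
})

stale = df.drop_metadata_only("name")
fixed = df.drop("name")

print(stale.columns, list(stale._table))
print(fixed.columns, list(fixed._table))
```

In the toy model, both results report the same visible columns; only the second one also shrinks the underlying table, which is the state the message above asks for.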