Commit 0ec0102

H0TB0X420 and timsaucer authored
Fix drop() method to handle quoted column names consistently (apache#1242)
* Fix drop() method to handle quoted column names consistently

  - Strip quotes from column names in drop() method
  - Maintains consistency with other DataFrame operations
  - Both drop('col') and drop('"col"') now work

  Fixes apache#1212

* Update drop() method docstring to clarify quote handling

  - Document that column names are case-sensitive and don't require quotes
  - Clarify that both quoted and unquoted column names are accepted
  - Add examples showing both 'col' and '"col"' syntax work
  - Note difference from select() operation behavior

* Fix whitespace and documentation errors

---------

Co-authored-by: Tim Saucer <[email protected]>
1 parent 5f8d500 commit 0ec0102

2 files changed: +29 -2 lines changed


python/datafusion/dataframe.py

Lines changed: 19 additions & 2 deletions
@@ -413,13 +413,30 @@ def select(self, *exprs: Expr | str) -> DataFrame:
     def drop(self, *columns: str) -> DataFrame:
         """Drop arbitrary amount of columns.

+        Column names are case-sensitive and do not require double quotes like
+        other operations such as `select`. Leading and trailing double quotes
+        are allowed and will be automatically stripped if present.
+
         Args:
-            columns: Column names to drop from the dataframe.
+            columns: Column names to drop from the dataframe. Both ``column_name``
+                and ``"column_name"`` are accepted.

         Returns:
             DataFrame with those columns removed in the projection.
+
+        Example Usage::
+
+            df.drop('ID_For_Students')  # Works
+            df.drop('"ID_For_Students"')  # Also works (quotes stripped)
         """
-        return DataFrame(self.df.drop(*columns))
+        normalized_columns = []
+        for col in columns:
+            if col.startswith('"') and col.endswith('"'):
+                normalized_columns.append(col.strip('"'))  # Strip double quotes
+            else:
+                normalized_columns.append(col)
+
+        return DataFrame(self.df.drop(*normalized_columns))

     def filter(self, *predicates: Expr) -> DataFrame:
         """Return a DataFrame for which ``predicate`` evaluates to ``True``.
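
Review note: the per-column branch added to drop() can be tried in isolation to see how it treats edge cases. The sketch below is not part of the commit; `normalize_column_name` is a hypothetical helper that only mirrors the `startswith`/`endswith`/`strip` logic from the diff, and the sample inputs are made up for illustration.

```python
def normalize_column_name(col: str) -> str:
    """Hypothetical helper mirroring the per-column branch in drop()."""
    if col.startswith('"') and col.endswith('"'):
        # str.strip removes every leading and trailing double quote,
        # not just one matching pair.
        return col.strip('"')
    return col


# Illustrative inputs (assumed, not taken from the commit's tests).
cases = ['ID_For_Students', '"ID_For_Students"', '""ID""', '"ID', 'a"b', '"']
results = {raw: normalize_column_name(raw) for raw in cases}

for raw, cleaned in results.items():
    print(f"{raw!r} -> {cleaned!r}")
```

Under this logic, repeated outer quotes collapse entirely, an unbalanced quote is left untouched, and a lone `"` becomes the empty string. Whether those outcomes are desirable for drop() is a design question for the maintainers; slicing off exactly one pair with `col[1:-1]` would be one stricter alternative.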

python/tests/test_dataframe.py

Lines changed: 10 additions & 0 deletions
@@ -220,6 +220,16 @@ def test_select(df):
     assert result.column(1) == pa.array([1, 2, 3])


+def test_drop_quoted_columns():
+    ctx = SessionContext()
+    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["ID_For_Students"])
+    df = ctx.create_dataframe([[batch]])
+
+    # Both should work
+    assert df.drop('"ID_For_Students"').schema().names == []
+    assert df.drop("ID_For_Students").schema().names == []
+
+
 def test_select_mixed_expr_string(df):
     df = df.select(column("b"), "a")
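
Review note: the test above uses a single-column frame, so both assertions expect an empty schema. A two-column case would additionally show that the intended column, and only that column, is removed. The sketch below illustrates the shape of such a check without DataFusion; `drop_names` is a hypothetical stand-in that operates on a plain list of names, reuses the normalization from the diff, and (as an assumption of this stand-in only) ignores names that are not present.

```python
from __future__ import annotations


def drop_names(names: list[str], *columns: str) -> list[str]:
    """Hypothetical stand-in for DataFrame.drop over a list of column names."""
    normalized = [
        c.strip('"') if c.startswith('"') and c.endswith('"') else c
        for c in columns
    ]
    # Keep every name that was not requested for removal.
    return [n for n in names if n not in normalized]


schema = ["ID_For_Students", "score"]  # illustrative two-column schema

quoted = drop_names(schema, '"ID_For_Students"')
unquoted = drop_names(schema, "ID_For_Students")

# Both spellings should leave only the untouched column behind.
assert quoted == unquoted == ["score"]
```

In the real test suite the same idea would go through `ctx.create_dataframe` and `df.schema().names`, as the committed test does.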
