
test: Simplify read_scan_test, spark session#3024

Merged
dangotbanned merged 14 commits into main from test-simp-read-scan
Aug 23, 2025

Conversation

Member

@dangotbanned dangotbanned commented Aug 22, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I was looking to deduplicate the spark-like session stuff in prep for #3023.

I thought I'd use this as a full-module example for #2959 on how we can write less repetitive tests 🙂

@dangotbanned dangotbanned marked this pull request as ready for review August 22, 2025 22:30
Member

@FBruzzesi FBruzzesi left a comment

Thanks @dangotbanned - I have one comment that can help us even more 🎉

Comment on lines 79 to 103

def sqlframe_session() -> DuckDBSession:
    from sqlframe.duckdb import DuckDBSession

    # NOTE: `__new__` override inferred by `pyright` only
    # https://github.com/eakmanrq/sqlframe/blob/772b3a6bfe5a1ffd569b7749d84bea2f3a314510/sqlframe/base/session.py#L181-L184
    return cast("DuckDBSession", DuckDBSession())  # type: ignore[redundant-cast]


def pyspark_session() -> SparkSession:  # pragma: no cover
    if is_spark_connect := os.environ.get("SPARK_CONNECT", None):
        from pyspark.sql.connect.session import SparkSession
    else:
        from pyspark.sql import SparkSession
    builder = cast("SparkSession.Builder", SparkSession.builder).appName("unit-tests")
    builder = (
        builder.remote(f"sc://localhost:{os.environ.get('SPARK_PORT', '15002')}")
        if is_spark_connect
        else builder.master("local[1]").config("spark.ui.enabled", "false")
    )
    return (
        builder.config("spark.default.parallelism", "1")
        .config("spark.sql.shuffle.partitions", "2")
        .config("spark.sql.session.timeZone", "UTC")
        .getOrCreate()
    )
Member

Should we consider moving these into tests/utils.py? They can be re-used both in tests/conftest.py and in #3032

Member Author

I do want to eventually, but I was thinking as session-scoped fixtures?

We'd need to restructure some of the existing tests though - e.g. so the same filtering that happens in --constructors also applies to these heavy things 🤔

We don't have to do that, but that was the idea I was working on when I got distracted and did this PR instead 😂
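One way to get that per-run sharing without eagerly building every backend is a cached factory (a hypothetical sketch; the backend names and `get_session` helper are assumptions for illustration, and the real conftest.py might instead use pytest fixtures with `scope="session"`):

```python
from functools import cache


@cache
def get_session(backend: str) -> object:
    # Build the (potentially heavy) session only on first request per
    # backend; later calls return the cached instance, much like
    # SparkSession.builder.getOrCreate(). The bodies are stand-ins,
    # not the actual DuckDBSession / SparkSession construction.
    if backend == "sqlframe":
        return object()  # stand-in for DuckDBSession()
    if backend == "pyspark":
        return object()  # stand-in for the builder chain + getOrCreate()
    raise ValueError(f"unknown backend: {backend}")


assert get_session("pyspark") is get_session("pyspark")  # created once
assert get_session("pyspark") is not get_session("sqlframe")
```

A session-scoped fixture would give the same once-per-run behaviour, plus the ability to skip construction entirely when a backend is filtered out of the run.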

Member Author

On second thought, @FBruzzesi, yeah just move them, merge, and use in #3032 if they're helpful 😅

I did explicitly mention that this was prep for #3023 anyway 🤦

Member

Regarding

I do want to eventually, but I was thinking as session-scoped fixtures?

and

heavy things

The pyspark session is a singleton (the getOrCreate part should be key): we create it once and use it in the pyspark constructor

narwhals/tests/conftest.py

Lines 203 to 213 in 68d762a

def _constructor(obj: Data) -> PySparkDataFrame:
    _obj = deepcopy(obj)
    index_col_name = generate_temporary_column_name(n_bytes=8, columns=list(_obj))
    _obj[index_col_name] = list(range(len(_obj[next(iter(_obj))])))
    return (
        session.createDataFrame([*zip(*_obj.values())], schema=[*_obj.keys()])
        .repartition(2)
        .orderBy(index_col_name)
        .drop(index_col_name)
    )
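For intuition, getOrCreate behaves like a memoised constructor: repeated calls hand back the same session object. A minimal pure-Python sketch of that semantics (a toy class, not PySpark's actual implementation):

```python
class FakeSession:
    """Toy stand-in illustrating getOrCreate()-style singleton reuse."""

    _instance = None

    @classmethod
    def get_or_create(cls) -> "FakeSession":
        # Create the session on the first call; every later call returns
        # the same object, so the expensive setup happens only once.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


a = FakeSession.get_or_create()
b = FakeSession.get_or_create()
assert a is b  # one session per process, reused across constructors
```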

For SQLFrame it should be quite lightweight

so the same filtering that happens in --constructors

What do you mean by this exactly? I am not following 🙈

Anyway, we can move it as a follow up, no worries

Member Author

My brain has melted for the day dude 😭

Anyway, we can move it as a follow up, no worries

I'm happy for you to do it now if you're keen to use it in the other PR?

        "spark.sql.session.timeZone", "UTC"
    ).getOrCreate()
    session = pyspark_session()

Member Author

Christ, great call @FBruzzesi!
I had no idea we had this logic in so many places 😂

Member

@FBruzzesi FBruzzesi left a comment

Self approving my edits 😂 But please take a look at 267e7b7

You might have already done it, but I might want to wait for tomorrow because of

My brain has melted for the day dude 😭


@dangotbanned
Member Author

Self approving my edits 😂 But please take a look at 267e7b7

All good thanks @FBruzzesi 😍

You might have already done it, but I might want to wait for tomorrow because of

My brain has melted for the day dude 😭

I'm still good to watch all the lovely code fly by and appreciate it 😄

@dangotbanned changed the title from "test: Simplify read_scan_test" to "test: Simplify read_scan_test, spark session" on Aug 23, 2025
@dangotbanned merged commit 3c58b0e into main on Aug 23, 2025
31 of 32 checks passed
@dangotbanned deleted the test-simp-read-scan branch on August 23, 2025 21:55
2 participants