Commit 3082b27

1 parent a3c5063 commit 3082b27

1 file changed: +4 -0 lines changed

online/src/main/scala/ai/chronon/online/CatalystUtil.scala

Lines changed: 4 additions & 0 deletions
@@ -63,6 +63,10 @@ object CatalystUtil {
       .config("spark.sql.adaptive.enabled", "false")
       .config("spark.sql.legacy.timeParserPolicy", "LEGACY")
       .config("spark.ui.enabled", "false")
+      // the default column reader batch size is 4096 - spark reads that many rows into memory buffer at once.
+      // that causes ooms on large columns.
+      // for derivations we only need to read one row at a time.
+      // for interactive we set the limit to 16.
       .config("spark.sql.parquet.columnarReaderBatchSize", "16")
       .enableHiveSupport() // needed to support registering Hive UDFs via CREATE FUNCTION.. calls
       .getOrCreate()
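
The reasoning in the added comments can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not part of the commit: `ReaderBatchEstimate`, `bufferedBytes`, and the 1 MiB average value size are hypothetical, chosen only to show how the buffered volume for a single large column scales with the batch size. It ignores encoding, compression, and the other columns in the row, so treat the numbers as an order-of-magnitude guide rather than a measurement.

```scala
// Illustrative only: none of these names exist in CatalystUtil.
object ReaderBatchEstimate {

  // Rough lower bound on the bytes buffered for ONE column when the
  // reader materialises `batchSize` rows at once.
  def bufferedBytes(batchSize: Int, avgValueBytes: Long): Long =
    batchSize.toLong * avgValueBytes

  def main(args: Array[String]): Unit = {
    val avgValueBytes = 1L << 20 // assumption: 1 MiB per value in a "large" column
    val defaultBatch  = 4096     // Spark's default columnarReaderBatchSize
    val reducedBatch  = 16       // value set in this commit

    println(s"default: ${bufferedBytes(defaultBatch, avgValueBytes) >> 20} MiB") // default: 4096 MiB
    println(s"reduced: ${bufferedBytes(reducedBatch, avgValueBytes) >> 20} MiB") // reduced: 16 MiB
  }
}
```

Under that assumed value size, the default batch would hold about 4 GiB for one column, while a batch of 16 holds about 16 MiB. A smaller batch presumably costs some scan throughput, which matters little when only a handful of rows are read at a time.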
