Releases: snowflakedb/snowpark-python
Release
1.15.0 (2024-04-24)
New Features
- Added the truncate save mode in DataFrameWriter to overwrite existing tables by truncating the underlying table instead of dropping it.
- Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
- Added the functions below to unload data from a DataFrame into one or more files in a stage:
  - DataFrame.write.json
  - DataFrame.write.csv
  - DataFrame.write.parquet
- Added distributed tracing using OpenTelemetry APIs for action functions in DataFrame and DataFrameWriter:
  - snowflake.snowpark.DataFrame:
    - collect
    - collect_nowait
    - to_pandas
    - count
    - show
  - snowflake.snowpark.DataFrameWriter:
    - save_as_table
- Added support for snow:// URLs to snowflake.snowpark.Session.file.get and snowflake.snowpark.Session.file.get_stream.
- Added support to register stored procedures and UDxFs with a comment.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.
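A Snowpark Python UDAF handler is a class with an aggregate_state property and accumulate, merge, and finish methods. The sketch below shows that shape for a simple sum. The simulate_partitions driver is hypothetical and exists only to show, in plain Python, how partial states from separate partitions are combined. Registering the handler (for example through session.udaf.register) is not shown, because that requires a live session.

```python
class PythonSumUDAF:
    """Sum aggregate following the Snowpark UDAF handler shape."""

    def __init__(self) -> None:
        self._sum = 0

    @property
    def aggregate_state(self):
        # Partial state that gets combined across handler instances.
        return self._sum

    def accumulate(self, input_value):
        # Called once per input row.
        self._sum += input_value

    def merge(self, other_sum):
        # Folds in the partial state produced by another instance.
        self._sum += other_sum

    def finish(self):
        # Returns the final aggregate value.
        return self._sum


def simulate_partitions(partitions):
    """Hypothetical local driver: aggregate each partition, then merge the states."""
    partial_states = []
    for rows in partitions:
        handler = PythonSumUDAF()
        for value in rows:
            handler.accumulate(value)
        partial_states.append(handler.aggregate_state)

    final = PythonSumUDAF()
    for state in partial_states:
        final.merge(state)
    return final.finish()


print(simulate_partitions([[1, 2, 3], [4, 5]]))  # 15
```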
Improvements
- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). This is still an experimental feature that is not enabled by default; it can be enabled by setting session.cte_optimization_enabled to True.
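The CTE optimization depends on finding identical subqueries in a query plan. The plan telemetry in this release likewise reports plan height and duplicate nodes. The sketch below computes both over a toy plan. The tuple representation (operator, argument, *children) is a hypothetical stand-in, not Snowpark's internal plan. A real rewriter would then emit each repeated subtree once in a WITH clause and reference it by name.

```python
from collections import Counter

# Hypothetical toy plan nodes: (operator, argument, *child_nodes).
SCAN_T = ("scan", "t")
FILTERED = ("filter", "a > 1", SCAN_T)
PLAN = ("union_all", "", FILTERED, FILTERED)


def children(node):
    # Child operators are the tuple-valued entries after the operator name.
    return [c for c in node[1:] if isinstance(c, tuple)]


def all_subtrees(node):
    yield node
    for child in children(node):
        yield from all_subtrees(child)


def plan_height(node):
    return 1 + max((plan_height(c) for c in children(node)), default=0)


def cte_candidates(node):
    """Subtrees that occur more than once and contain at least one child operator."""
    counts = Counter(all_subtrees(node))
    return [sub for sub, n in counts.items() if n > 1 and children(sub)]


print(plan_height(PLAN))     # 3
print(cte_candidates(PLAN))  # [('filter', 'a > 1', ('scan', 't'))]
```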
Bug Fixes
- Fixed a bug where statement_params was not passed to query executions that register stored procedures and user-defined functions.
- Fixed a bug causing snowflake.snowpark.Session.file.get_stream to fail for quoted stage locations.
- Fixed a bug where an internal type hint in utils.py might raise AttributeError in case the underlying module cannot be found.
Local Testing Updates
New Features
- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
  - snowflake.snowpark.Session:
    - file.put
    - file.put_stream
    - file.get
    - file.get_stream
    - read.json
    - add_import
    - remove_import
    - get_imports
    - clear_imports
    - add_packages
    - add_requirements
    - clear_packages
    - remove_package
    - udf.register
    - udf.register_from_file
    - sproc.register
    - sproc.register_from_file
  - snowflake.snowpark.functions:
    - current_database
    - current_session
    - date_trunc
    - object_construct
    - object_construct_keep_null
    - pow
    - sqrt
    - udf
    - sproc
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_time.
Bug Fixes
- Fixed a bug that produced null-filled columns for constant functions.
- Fixed a bug in the implementation of to_object, to_array and to_binary so that they better handle null inputs.
- Fixed a bug where timestamp data comparison could not handle years beyond 2262.
- Fixed a bug where Session.builder.getOrCreate did not return the created mock session.
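The note does not say why 2262 is the boundary. One likely explanation, which is an assumption here, is that timestamps are stored as signed 64-bit nanosecond counts since the Unix epoch. pandas uses that representation for datetime64[ns], and the local testing emulator is presumed to build on it. The sketch below uses only the standard library to show where that counter runs out. It does not reproduce the local testing code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_epoch_ns(ts: datetime) -> int:
    """Exact nanoseconds since the Unix epoch, using integer arithmetic only."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def fits_in_int64_ns(ts: datetime) -> bool:
    return INT64_MIN <= to_epoch_ns(ts) <= INT64_MAX


# Latest instant an int64 nanosecond counter can hold, truncated to microseconds.
LATEST = EPOCH + timedelta(microseconds=INT64_MAX // 1_000)
print(LATEST)  # 2262-04-11 23:47:16.854775+00:00
print(fits_in_int64_ns(datetime(2263, 1, 1, tzinfo=timezone.utc)))  # False
```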
1.14.0 (2024-03-20)
New Features
- Added support for creating vectorized UDTFs with the process method.
- Added support for DataFrame functions:
  - to_timestamp_ltz
  - to_timestamp_ntz
  - to_timestamp_tz
  - locate
- Added support for ASOF JOIN type.
- Added support for the following local testing APIs:
  - snowflake.snowpark.functions:
    - to_double
    - to_timestamp
    - to_timestamp_ltz
    - to_timestamp_ntz
    - to_timestamp_tz
    - greatest
    - least
    - convert_timezone
    - dateadd
    - date_part
  - snowflake.snowpark.Session:
    - get_current_account
    - get_current_warehouse
    - get_current_role
    - use_schema
    - use_warehouse
    - use_database
    - use_role
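The ASOF JOIN type listed above matches each left row with the most recent right row that is not later than it. This is plain-Python sketch of that semantics, without Snowpark. It assumes a match condition of left.ts >= right.ts with no additional equality key. It keeps every left row and fills None when no right row qualifies.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Pair each left (ts, payload) with the latest right row whose ts <= left ts."""
    right_sorted = sorted(right, key=lambda row: row[0])
    right_ts = [row[0] for row in right_sorted]
    joined = []
    for ts, payload in left:
        idx = bisect_right(right_ts, ts)  # number of right rows with ts_r <= ts
        match = right_sorted[idx - 1] if idx > 0 else None
        joined.append((ts, payload, match))
    return joined


trades = [(3, "t1"), (7, "t2"), (1, "t0")]
quotes = [(2, 10.0), (5, 10.5), (7, 11.0)]
for row in asof_join(trades, quotes):
    print(row)
# (3, 't1', (2, 10.0))
# (7, 't2', (7, 11.0))
# (1, 't0', None)
```

Sorting the right side once and using binary search keeps each lookup logarithmic rather than rescanning the right side for every left row.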
Bug Fixes
- Fixed a bug in SnowflakePlanBuilder where save_as_table did not correctly filter columns whose names start with '$' followed by a number.
- Fixed a bug where statement parameters might have no effect when resolving imports and packages.
- Fixed bugs in local testing:
  - LEFT ANTI and LEFT SEMI joins drop rows with null values.
  - DataFrameReader.csv incorrectly parses data when the optional parameter field_optionally_enclosed_by is specified.
  - Column.regexp only considers the first entry when pattern is a Column.
  - Table.update raises KeyError when updating null values in the rows.
  - VARIANT columns raise errors at DataFrame.collect.
  - count_distinct does not work correctly when counting.
  - Null values in integer columns raise TypeError.
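For reference, this sketch shows the semantics the fixed LEFT SEMI and LEFT ANTI joins should follow. It is written in plain Python over lists of dicts. It assumes an equality join condition and NOT EXISTS-style anti-join behavior: a null key never matches, so a null-keyed left row is dropped by the semi join and kept by the anti join. Nulls in non-key columns must not cause rows to be dropped.

```python
def _right_keys(right, key):
    # NULL never equals anything in SQL, so null keys cannot produce a match.
    return {row[key] for row in right if row[key] is not None}


def left_semi_join(left, right, key):
    keys = _right_keys(right, key)
    return [row for row in left if row[key] is not None and row[key] in keys]


def left_anti_join(left, right, key):
    keys = _right_keys(right, key)
    return [row for row in left if row[key] is None or row[key] not in keys]


left = [{"k": 1, "v": None}, {"k": 2, "v": "b"}, {"k": None, "v": "c"}]
right = [{"k": 1}, {"k": None}]
print(left_semi_join(left, right, "k"))  # [{'k': 1, 'v': None}]
print(left_anti_join(left, right, "k"))  # [{'k': 2, 'v': 'b'}, {'k': None, 'v': 'c'}]
```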
Improvements
- Added telemetry to local testing.
- Improved the error message of DataFrameReader to raise a FileNotFound error when reading a path that does not exist or when there are no files under the path.
1.13.0 (2024-02-26)
New Features
- Added support for an optional date_part argument in the function last_day.
- SessionBuilder.app_name will set the query_tag after the session is created.
- Added support for the following local testing functions:
  - current_timestamp
  - current_date
  - current_time
  - strip_null_value
  - upper
  - lower
  - length
  - initcap
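The optional date_part argument selects the period whose last day is returned. Here is a standard-library sketch of that idea for the YEAR, QUARTER, and MONTH parts. Two points are assumptions: that MONTH is the default, and that WEEK (omitted here because it depends on the week-start setting) is the only other part. The last_day helper mimics the behavior and is not the Snowpark function.

```python
import calendar
from datetime import date


def last_day(d: date, date_part: str = "month") -> date:
    """Last day of the year, quarter, or month that contains d (WEEK not covered)."""
    part = date_part.lower()
    if part == "year":
        return date(d.year, 12, 31)
    if part == "quarter":
        end_month = ((d.month - 1) // 3 + 1) * 3  # 3, 6, 9 or 12
    elif part == "month":
        end_month = d.month
    else:
        raise ValueError(f"unsupported date_part: {date_part!r}")
    # monthrange returns (weekday of first day, number of days in month).
    return date(d.year, end_month, calendar.monthrange(d.year, end_month)[1])


print(last_day(date(2024, 2, 10)))            # 2024-02-29
print(last_day(date(2024, 5, 1), "quarter"))  # 2024-06-30
print(last_day(date(2024, 5, 1), "year"))     # 2024-12-31
```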
Improvements
- Added cleanup logic at interpreter shutdown to close all active sessions.
Bug Fixes
- Fixed a bug in DataFrame.to_local_iterator where the iterator could yield wrong results if another query was executed before the iterator finished, due to the wrong isolation level. For details, please see #945.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug where Session.range returned an empty result when the range was large.
1.12.1 (2024-02-08)
Improvements
- Use split_blocks=True by default during to_pandas conversion, for optimal memory allocation. This parameter is passed to pyarrow.Table.to_pandas, which enables PyArrow to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
Bug Fixes
- Fixed a bug in DataFrame.to_pandas that caused an error when evaluating on a DataFrame with an IntegerType column with null values.
1.12.0 (2024-01-30)
New Features
- Exposed statement_params in StoredProcedure.__call__.
- Added two optional arguments to Session.add_import:
  - chunk_size: The number of bytes to hash per chunk of the uploaded files.
  - whole_file_hash: By default, only the first chunk of the uploaded import is hashed to save time. When this is set to True, each uploaded file is fully hashed instead.
- Added parameters external_access_integrations and secrets when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method Session.append_query_tag, which allows an additional tag to be added to the current query tag by appending it as a comma-separated value.
- Added a new method Session.update_query_tag, which allows updates to a JSON-encoded dictionary query tag.
- SessionBuilder.getOrCreate will now attempt to replace the singleton it returns when token expiration has been detected.
- Added support for new functions in snowflake.snowpark.functions:
  - array_except
  - create_map
  - sign/signum
- Added the following functions to DataFrame.analytics:
  - Added the moving_agg function in DataFrame.analytics to enable moving aggregations like sums and averages with multiple window sizes.
  - Added the cumulative_agg function in DataFrame.analytics to enable cumulative aggregations like sums and averages.
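A moving aggregation recomputes over a sliding window of recent rows, while a cumulative aggregation runs from the first row up to the current one. This plain-Python sketch contrasts the two for sums and does not call Snowpark. How Snowpark fills the first window-1 rows is not stated here. This sketch sums whatever rows are available so far, which is an assumption.

```python
from itertools import accumulate


def moving_sum(values, window):
    """Trailing moving sum; the first window-1 positions use the rows available so far."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def moving_sums(values, window_sizes):
    # One output series per window size, like requesting several windows at once.
    return {w: moving_sum(values, w) for w in window_sizes}


def cumulative_sum(values):
    # Running total from the first row up to and including the current row.
    return list(accumulate(values))


sales = [10, 20, 30, 40]
print(moving_sums(sales, [2, 3]))  # {2: [10, 30, 50, 70], 3: [10, 30, 60, 90]}
print(cumulative_sum(sales))       # [10, 30, 60, 100]
```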
Bug Fixes
- Fixed a bug in DataFrame.na.fill that caused Boolean values to erroneously override integer values.
- Fixed a bug in Session.create_dataframe where Snowpark DataFrames created from pandas DataFrames did not infer the type of timestamp columns correctly. The behavior is now as follows:
  - Previously, timestamp columns without a timezone were converted to nanosecond epochs and inferred as LongType(). They are now correctly maintained as timestamp values and inferred as TimestampType(TimestampTimeZone.NTZ).
  - Previously, timestamp columns with a timezone were inferred as TimestampType(TimestampTimeZone.NTZ) and lost their timezone information. They are now correctly inferred as TimestampType(TimestampTimeZone.LTZ), and timezone information is retained.
  - Set the session parameter PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME to revert to the old behavior. It is recommended that you update your code to align with the correct behavior, because the parameter will be removed in the future.
- Fixed a bug where DataFrame.to_pandas returned a decimal type when the scale was not 0, which created an object dtype in pandas. The value is now cast to a float64 type instead.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - DataFrame.filter() is called after DataFrame.sort().limit().
  - DataFrame.sort() or filter() is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, df.select("a", seq1().alias("b")).select("a", "b").sort("a") won't flatten the sort clause anymore.
  - A window or sequence-dependent data generator column is used after DataFrame.limit(). For instance, df.limit(10).select(row_number().over()) won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance, df = df.select(col("a").alias("b")); df = copy(df); df.select(col("b").alias("c")) used to throw an error and now works.
- Fixed a bug in Session.create_dataframe where the non-nullable field in a schema was not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in the SQL simplifier where non-select statements in session.sql dropped a SQL query when used with limit().
- Fixed a bug that raised an exception when the session parameter ERROR_ON_NONDETERMINISTIC_UPDATE is true.
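Why flattening DataFrame.filter() after DataFrame.sort().limit() is wrong: in a single SQL SELECT, WHERE is evaluated before ORDER BY and LIMIT. Merging the later filter into the same SELECT therefore filters first and limits second. The plain-Python sketch below, using hypothetical helper names, shows that the two orders give different results.

```python
def sort_limit_then_filter(rows, n, predicate):
    # Intended query: sort, keep the first n rows, then filter those n rows.
    return [r for r in sorted(rows)[:n] if predicate(r)]


def filter_then_sort_limit(rows, n, predicate):
    # What a flattened single SELECT computes: WHERE first, then ORDER BY and LIMIT.
    return sorted(r for r in rows if predicate(r))[:n]


def is_even(x):
    return x % 2 == 0


rows = [5, 1, 4, 2, 3]
print(sort_limit_then_filter(rows, 2, is_even))  # [2]
print(filter_then_sort_limit(rows, 2, is_even))  # [2, 4]
```

The same reasoning applies to window functions and sequence generators such as seq1: their values depend on which rows exist and in what order, so they cannot be moved across a sort, filter, or limit.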
Behavior Changes (API Compatible)
- When parsing data types during a to_pandas operation, we rely on the GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as int8 gets returned as int64. Users can fix this by explicitly specifying precision values for their return column.
- Aligned the behavior of Session.call for table stored procedures, where running Session.call would not trigger the stored procedure unless a collect() operation was performed.
- StoredProcedureRegistration will now automatically add snowflake-snowpark-python as a package dependency. The added dependency will be on the client's local version of the library, and an error is thrown if the server cannot support that version.
1.11.1 (2023-12-07)
Bug Fixes
- Fixed a bug where numpy was imported at the top level of the mock module.
1.11.0 (2023-12-05)
New Features
- Added the conn_error attribute to SnowflakeSQLException, which stores the whole underlying exception from snowflake-connector-python.
- Added support for RelationalGroupedDataFrame.pivot() to access pivot in the following pattern: DataFrame.group_by(...).pivot(...).
- Added an experimental feature, Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
- Added support for the new function arrays_to_object in snowflake.snowpark.functions.
- Added support for the vector data type.
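The DataFrame.group_by(...).pivot(...) pattern groups rows by one column and spreads the distinct values of another column into separate output columns, aggregating a third. This is a plain-Python sketch of a sum pivot, without Snowpark, using a hypothetical pivot_sum helper. None cells stand in for SQL NULLs. When pivot_values is omitted, the values are discovered from the data, which loosely mirrors the idea of the dynamic pivot added in 1.15.0.

```python
from collections import defaultdict


def pivot_sum(rows, group_col, pivot_col, value_col, pivot_values=None):
    """Sum value_col per group, with one output column per pivot value."""
    if pivot_values is None:
        # "Dynamic" variant: take the pivot values from the data itself.
        pivot_values = sorted({r[pivot_col] for r in rows})
    table = defaultdict(lambda: dict.fromkeys(pivot_values))  # cells start as None
    for r in rows:
        cells = table[r[group_col]]  # every group gets a row in this sketch
        key = r[pivot_col]
        if key in cells:
            current = cells[key]
            cells[key] = r[value_col] if current is None else current + r[value_col]
    return dict(table)


sales = [
    {"emp": "A", "month": "JAN", "amount": 100},
    {"emp": "A", "month": "FEB", "amount": 200},
    {"emp": "B", "month": "JAN", "amount": 50},
    {"emp": "A", "month": "JAN", "amount": 10},
]
print(pivot_sum(sales, "emp", "month", "amount", ["JAN", "FEB"]))
# {'A': {'JAN': 110, 'FEB': 200}, 'B': {'JAN': 50, 'FEB': None}}
```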
Dependency Updates
- Bumped the cloudpickle dependency to work with cloudpickle==2.2.1.
- Updated snowflake-connector-python to 3.4.0.
Bug Fixes
- DataFrame column names quoting check now supports newline characters.
- Fixed a bug where a DataFrame generated by session.read.with_metadata created an inconsistent table when doing df.write.save_as_table.
1.10.0 (2023-11-03)
New Features
- Added support for managing case sensitivity in DataFrame.to_local_iterator().
- Added support for specifying a vectorized UDTF's input column names by using the optional parameter input_names in UDTFRegistration.register/register_file and functions.pandas_udtf. By default, RelationalGroupedDataFrame.applyInPandas will infer the column names from the current DataFrame schema.
- Added sql_error_code and raw_message attributes to SnowflakeSQLException when it is caused by a SQL exception.
Bug Fixes
- Fixed a bug in DataFrame.to_pandas() where converting Snowpark DataFrames to pandas DataFrames lost precision on integers with more than 19 digits.
- Fixed a bug where session.add_packages could not handle a requirement specifier that contains a project name with an underscore and a version.
- Fixed a bug in DataFrame.limit() when offset is used and the parent DataFrame uses limit. Now the offset won't impact the parent DataFrame's limit.
- Fixed a bug in DataFrame.write.save_as_table where DataFrames created from the read API could not save data into Snowflake because of the invalid column name $1.
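The limit/offset fix concerns nesting. Applying a limit with an offset to a DataFrame that already has a limit, roughly df.limit(5).limit(3, offset=3), must skip rows inside the parent's five rows, not inside the full data. This sketch uses list slicing on already-ordered data and a hypothetical limit helper to show the difference.

```python
def limit(rows, n, offset=0):
    """LIMIT n OFFSET offset applied to an already-ordered sequence."""
    return rows[offset:offset + n]


data = list(range(1, 11))
parent = limit(data, 5)               # [1, 2, 3, 4, 5]
child = limit(parent, 3, offset=3)    # [4, 5]: offset applies within the parent's rows
flattened = limit(data, 3, offset=3)  # [4, 5, 6]: ignores the parent's limit
print(child, flattened)
```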
Behavior change
- Changed the behavior of date_format:
  - The format argument changed from optional to required.
  - The returned result changed from a date object to a date-formatted string.
- When a window function or a sequence-dependent data generator (normal, zipf, uniform, seq1, seq2, seq4, seq8) function is used, the sort and filter operation will no longer be flattened when generating the query.
1.9.0 (2023-10-13)
New Features
- Added support for the Python 3.11 runtime environment.
Dependency updates
- Added back the dependency on typing-extensions.
Bug Fixes
- Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
- Reverted to using a CTAS (create table as select) statement for DataFrame.write.save_as_table, which does not need insert permission for writing tables.
New Features
- Added support for PythonObjJSONEncoder JSON-serializable objects for ARRAY and OBJECT literals.
1.8.0 (2023-09-14)
New Features
- Added support for VOLATILE/IMMUTABLE keyword when registering UDFs.
- Added support for specifying clustering keys when saving DataFrames using DataFrame.write.save_as_table.
- Accept Iterable objects as input for schema when creating DataFrames using Session.create_dataframe.
- Added the property DataFrame.session to return a Session object.
- Added the property Session.session_id to return an integer that represents the session ID.
- Added the property Session.connection to return a SnowflakeConnection object.
- Added support for creating a Snowpark session from a configuration file or environment variables.
Dependency updates
- Updated snowflake-connector-python to 3.2.0.
Bug Fixes
- Fixed a bug where automatic package upload would raise ValueError even when compatible package versions were added in session.add_packages.
- Fixed a bug where table stored procedures were not registered correctly when using register_from_file.
- Fixed a bug where DataFrame joins failed with an invalid_identifier error.
- Fixed a bug where DataFrame.copy disabled the SQL simplifier for the returned copy.
- Fixed a bug where session.sql().select() would fail if any parameters were specified to session.sql().