Releases: snowflakedb/snowpark-python

Release

12 Sep 19:06

1.22.1 (2024-09-11)

This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.

1.22.0 (2024-09-10)

Snowpark Python API Updates

New Features

  • Added the following new functions in snowflake.snowpark.functions:
    • array_remove
    • ln

Improvements

  • Improved documentation for Session.write_pandas by making use_logical_type option more explicit.
  • Added support for specifying the following to DataFrameWriter.save_as_table:
    • enable_schema_evolution
    • data_retention_time
    • max_data_extension_time
    • change_tracking
    • copy_grants
    • iceberg_config: a dictionary that can hold the following Iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for specifying the following to DataFrameWriter.copy_into_table:
    • iceberg_config: a dictionary that can hold the following Iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for specifying the following parameters to DataFrame.create_or_replace_dynamic_table:
    • mode
    • refresh_mode
    • initialize
    • clustering_keys
    • is_transient
    • data_retention_time
    • max_data_extension_time
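
As a rough sketch of the iceberg_config option listed for DataFrameWriter.save_as_table and DataFrameWriter.copy_into_table above: the dictionary is keyed by the five option names from these notes. The helper check_iceberg_config is hypothetical (it is not part of Snowpark), and every value is a placeholder. It only checks that a config uses the listed keys before it is handed to the writer.

```python
# Option names taken from the 1.22.0 notes above; the values used below are placeholders.
ICEBERG_CONFIG_KEYS = frozenset({
    "external_volume",
    "catalog",
    "base_location",
    "catalog_sync",
    "storage_serialization_policy",
})


def check_iceberg_config(config):
    """Hypothetical helper: reject keys that the release notes do not list."""
    unknown = sorted(set(config) - ICEBERG_CONFIG_KEYS)
    if unknown:
        raise ValueError(f"unknown iceberg_config keys: {unknown}")
    return dict(config)


iceberg_config = check_iceberg_config({
    "external_volume": "my_external_volume",  # placeholder name
    "catalog": "SNOWFLAKE",                   # placeholder value
    "base_location": "my/iceberg/path",       # placeholder path
})

# With a live session (not created here), the dictionary would be passed as:
#   df.write.save_as_table("my_table", iceberg_config=iceberg_config)
```

Validating the keys up front is only one possible design; it surfaces a misspelled option name locally instead of at table-creation time.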

Bug Fixes

  • Fixed a bug in session.read.csv that caused an error when setting PARSE_HEADER = True in an externally defined file format.
  • Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
  • Fixed a bug in session.get_session_stage that referenced a non-existing stage after switching database or schema.
  • Fixed a bug where calling DataFrame.to_snowpark_pandas without explicitly initializing the Snowpark pandas plugin caused an error.
  • Fixed a bug where using the explode function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer parameter.

Snowpark Local Testing Updates

New Features

  • Added support for type coercion when passing columns as input to UDF calls.
  • Added support for Index.identical.

Bug Fixes

  • Fixed a bug where the truncate mode in DataFrameWriter.save_as_table incorrectly handled DataFrames containing only a subset of columns from the existing table.
  • Fixed a bug where the function to_timestamp did not set the default timezone of the column datatype.

Snowpark pandas API Updates

New Features

  • Added limited support for the Timedelta type, including the following features. Snowpark pandas will raise NotImplementedError for unsupported Timedelta use cases.
    • support for tracking the Timedelta type through copy, cache_result, shift, sort_index, assign, bfill, ffill, fillna, compare, diff, drop, dropna, duplicated, empty, equals, insert, isin, isna, items, iterrows, join, len, mask, melt, merge, nlargest, nsmallest, to_pandas.
    • support for converting non-timedelta to timedelta via astype.
    • NotImplementedError is raised for the remaining methods, which do not support Timedelta.
    • support for subtracting two timestamps to get a Timedelta.
    • support for indexing with Timedelta data columns.
    • support for adding or subtracting timestamps and Timedelta.
    • support for binary arithmetic between two Timedelta values.
    • support for binary arithmetic and comparisons between Timedelta values and numeric values.
    • support for lazy TimedeltaIndex.
    • support for pd.to_timedelta.
    • support for GroupBy aggregations min, max, mean, idxmax, idxmin, std, sum, median, count, any, all, size, nunique, head, tail, aggregate.
    • support for GroupBy filtrations first and last.
    • support for TimedeltaIndex attributes: days, seconds, microseconds and nanoseconds.
    • support for diff with timestamp columns on axis=0 and axis=1.
    • support for TimedeltaIndex methods: ceil, floor and round.
    • support for TimedeltaIndex.total_seconds method.
  • Added support for Index arithmetic and comparison operators.
  • Added support for Series.dt.round.
  • Added documentation pages for DatetimeIndex.
  • Added support for Index.name, Index.names, Index.rename, and Index.set_names.
  • Added support for Index.__repr__.
  • Added support for DatetimeIndex.month_name and DatetimeIndex.day_name.
  • Added support for Series.dt.weekday, Series.dt.time, and DatetimeIndex.time.
  • Added support for Index.min and Index.max.
  • Added support for pd.merge_asof.
  • Added support for Series.dt.normalize and DatetimeIndex.normalize.
  • Added support for Index.is_boolean, Index.is_integer, Index.is_floating, Index.is_numeric, and Index.is_object.
  • Added support for DatetimeIndex.round, DatetimeIndex.floor and DatetimeIndex.ceil.
  • Added support for Series.dt.days_in_month and Series.dt.daysinmonth.
  • Added support for DataFrameGroupBy.value_counts and SeriesGroupBy.value_counts.
  • Added support for Series.is_monotonic_increasing and Series.is_monotonic_decreasing.
  • Added support for Index.is_monotonic_increasing and Index.is_monotonic_decreasing.
  • Added support for pd.crosstab.
  • Added support for pd.bdate_range and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range and pd.bdate_range.
  • Added support for lazy Index objects as labels in DataFrame.reindex and Series.reindex.
  • Added support for Series.dt.days, Series.dt.seconds, Series.dt.microseconds, and Series.dt.nanoseconds.
  • Added support for creating a DatetimeIndex from an Index of numeric or string type.
  • Added support for string indexing with Timedelta objects.
  • Added support for Series.dt.total_seconds method.
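
The Timedelta arithmetic listed under the limited Timedelta support above has the same shape as the standard library's datetime types. The sketch below is an offline analogy that uses only datetime.timedelta; it is not Snowpark pandas code, and stdlib timedelta stops at microsecond resolution, so it has no nanoseconds attribute.

```python
from datetime import datetime, timedelta

start = datetime(2024, 9, 10, 8, 0, 0)
end = datetime(2024, 9, 11, 9, 30, 0)

elapsed = end - start                    # timestamp - timestamp -> timedelta
shifted = start + timedelta(hours=2)     # timestamp + timedelta -> timestamp
total = elapsed + timedelta(minutes=30)  # timedelta + timedelta -> timedelta
doubled = elapsed * 2                    # timedelta combined with a numeric value
ratio = elapsed / timedelta(hours=1)     # timedelta / timedelta -> float

# Component attributes and total_seconds, as in the TimedeltaIndex bullets above.
print(elapsed.days, elapsed.seconds, elapsed.microseconds)  # 1 5400 0
print(elapsed.total_seconds())                              # 91800.0
print(ratio)                                                # 25.5
```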

Improvements

  • Improved concat and join performance when the operations are performed on Series coming from the same DataFrame by avoiding unnecessary joins.
  • Refactored quoted_identifier_to_snowflake_type to avoid making metadata queries if the types have been cached locally.
  • Improved pd.to_datetime to handle all local input cases.
  • Added the ability to create a lazy index from another lazy index without pulling data to the client.
  • Raised NotImplementedError for Index bitwise operators.
  • Displayed a clearer error message when Index.names is set to a non-list-like object.
  • Raised a warning whenever MultiIndex values are pulled in locally.
  • Improved the warning message for pd.read_snowflake to include the creation reason when temp table creation is triggered.
  • Improved performance for DataFrame.set_index, or setting DataFrame.index or Series.index, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current Series/DataFrame object length, a ValueError is no longer raised. Instead, when the Series/DataFrame object is longer than the provided index, the Series/DataFrame's new index is filled with NaN values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
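
The length-mismatch rule in the last item can be modeled in a few lines of plain Python. This is a sketch of the described outcome only; align_new_index is a hypothetical name and the code does not call Snowpark pandas.

```python
import math


def align_new_index(n_rows, new_index):
    """Model of the behavior described above (not Snowpark pandas code).

    Returns the index labels that a Series/DataFrame with n_rows rows ends up with.
    """
    labels = list(new_index)
    if len(labels) < n_rows:
        # The object is longer than the index: pad the "extra" rows with NaN.
        return labels + [math.nan] * (n_rows - len(labels))
    # The index is as long or longer: surplus index values are ignored.
    return labels[:n_rows]


short = align_new_index(4, ["a", "b"])
long = align_new_index(2, ["a", "b", "c", "d"])
print(short)  # ['a', 'b', nan, nan]
print(long)   # ['a', 'b']
```

Code that relied on the old ValueError to detect a wrong-length index would need an explicit length check of its own under this rule.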

Bug Fixes

  • Stopped ignoring nanoseconds in pd.Timedelta scalars.
  • Fixed AssertionError in tree of binary operations.
  • Fixed a bug in Series.dt.isocalendar when using a named Series.
  • Fixed inplace argument for Series objects derived from DataFrame columns.
  • Fixed a bug where Series.reindex and DataFrame.reindex did not update the result index's name correctly.
  • Fixed a bug where Series.take did not error when axis=1 was specified.

Release

05 Sep 20:28

1.21.1 (2024-09-05)

Snowpark Python API Updates

Bug Fixes

  • Fixed a bug where using to_pandas_batches with async jobs caused an error due to improper handling of waiting for asynchronous query completion.

Release

19 Aug 18:37

1.21.0 (2024-08-19)

Snowpark Python API Updates

New Features

  • Added support for snowflake.snowpark.testing.assert_dataframe_equal that is a utility function to check the equality of two Snowpark DataFrames.

Improvements

  • Added support for server-side string size limitations.
  • Added support to create and invoke stored procedures, UDFs and UDTFs with optional arguments.
  • Added support for column lineage in the DataFrame.lineage.trace API.
  • Added support for passing INFER_SCHEMA options to DataFrameReader via INFER_SCHEMA_OPTIONS.
  • Added support for passing the parameters argument to Column.rlike and Column.regexp.
  • Added support for automatically cleaning up temporary tables created by df.cache_result() in the current session, when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting session.auto_clean_up_temp_table_enabled to True.
  • Added support for string literals to the fmt parameter of snowflake.snowpark.functions.to_date.

Bug Fixes

  • Fixed a bug where SQL generated for selecting * column has an incorrect subquery.
  • Fixed a bug in DataFrame.to_pandas_batches where the iterator could throw an error if certain transformations were made to the pandas DataFrame, due to a wrong isolation level.
  • Fixed a bug in DataFrame.lineage.trace to split the quoted feature view's name and version correctly.
  • Fixed a bug in Column.isin that caused invalid SQL generation when passed an empty list.
  • Fixed a bug where NotImplementedError was not raised when setting a cell with a list-like item.

Snowpark Local Testing Updates

New Features

  • Added support for the following APIs:
    • snowflake.snowpark.functions
      • rank
      • dense_rank
      • percent_rank
      • cume_dist
      • ntile
      • datediff
      • array_agg
    • snowflake.snowpark.column.Column.within_group
  • Added support for parsing flags in regex statements for mocked plans. This maintains parity with the rlike and regexp changes above.

Bug Fixes

  • Fixed a bug where the window functions LEAD and LAG did not handle the ignore_nulls option properly.
  • Fixed a bug where values were not populated into the result DataFrame during the insert step of a table merge operation.

Improvements

  • Fixed a pandas FutureWarning about integer indexing.

Snowpark pandas API Updates

New Features

  • Added support for DataFrame.backfill, DataFrame.bfill, Series.backfill, and Series.bfill.
  • Added support for DataFrame.compare and Series.compare with default parameters.
  • Added support for Series.dt.microsecond and Series.dt.nanosecond.
  • Added support for Index.is_unique and Index.has_duplicates.
  • Added support for Index.equals.
  • Added support for Index.value_counts.
  • Added support for Series.dt.day_name and Series.dt.month_name.
  • Added support for indexing on Index, e.g., df.index[:10].
  • Added support for DataFrame.unstack and Series.unstack.
  • Added support for DataFrame.asfreq and Series.asfreq.
  • Added support for Series.dt.is_month_start and Series.dt.is_month_end.
  • Added support for Index.all and Index.any.
  • Added support for Series.dt.is_year_start and Series.dt.is_year_end.
  • Added support for Series.dt.is_quarter_start and Series.dt.is_quarter_end.
  • Added support for lazy DatetimeIndex.
  • Added support for Series.argmax and Series.argmin.
  • Added support for Series.dt.is_leap_year.
  • Added support for DataFrame.items.
  • Added support for Series.dt.floor and Series.dt.ceil.
  • Added support for Index.reindex.
  • Added support for DatetimeIndex properties: year, month, day, hour, minute, second, microsecond,
    nanosecond, date, dayofyear, day_of_year, dayofweek, day_of_week, weekday, quarter,
    is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end
    and is_leap_year.
  • Added support for Resampler.fillna and Resampler.bfill.
  • Added limited support for the Timedelta type, including creating Timedelta columns and to_pandas.
  • Added support for Index.argmax and Index.argmin.

Improvements

  • Removed the public preview warning message when importing Snowpark pandas.
  • Removed unnecessary count query from SnowflakeQueryCompiler.is_series_like method.
  • DataFrame.columns now returns a native pandas Index object instead of a Snowpark Index object.
  • Refactored the Index constructor and introduced a query_compiler argument to create an Index from a query compiler.
  • pd.to_datetime now returns a DatetimeIndex object instead of a Series object.
  • pd.date_range now returns a DatetimeIndex object instead of a Series object.

Bug Fixes

  • Made passing an unsupported aggregation function to pivot_table raise NotImplementedError instead of KeyError.
  • Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.
  • Fixed AssertionError in Series.drop_duplicates and DataFrame.drop_duplicates when called after sort_values.
  • Fixed a bug in Index.to_frame where the result frame's column name may be wrong when name is unspecified.
  • Fixed a bug where some Index docstrings are ignored.
  • Fixed a bug in Series.reset_index(drop=True) where the result name may be wrong.
  • Fixed a bug where Groupby.first/last did not order by the correct columns in the underlying window expression.

Release

17 Jul 21:33

1.20.0 (2024-07-17)

Snowpark Python API Updates

Improvements

  • Added distributed tracing using open telemetry APIs for table stored procedure function in DataFrame:
    • _execute_and_get_query_id
  • Added support for the arrays_zip function.
  • Improved performance for binary column expressions and df._in by avoiding unnecessary casts for numeric values. You can enable this optimization by setting session.eliminate_numeric_sql_value_cast_enabled = True.
  • Improved error message for write_pandas when the target table does not exist and auto_create_table=False.
  • Added open telemetry tracing on UDxF functions in Snowpark.
  • Added open telemetry tracing on stored procedure registration in Snowpark.
  • Added a new optional parameter called format_json to the Session.SessionBuilder.app_name function that sets the app name in the Session.query_tag in JSON format. By default, this parameter is set to False.

Bug Fixes

  • Fixed a bug where SQL generated for lag(x, 0) was incorrect and failed with error message argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'.

Snowpark Local Testing Updates

New Features

  • Added support for the following APIs:
    • snowflake.snowpark.functions
      • random
  • Added new parameters to patch function when registering a mocked function:
    • distinct allows an alternate function to be specified for when a sql function should be distinct.
    • pass_column_index passes a named parameter column_index to the mocked function that contains the pandas.Index for the input data.
    • pass_row_index passes a named parameter row_index to the mocked function that is the 0 indexed row number the function is currently operating on.
    • pass_input_data passes a named parameter input_data to the mocked function that contains the entire input dataframe for the current expression.
  • Added support for the column_order parameter to method DataFrameWriter.save_as_table.

Bug Fixes

  • Fixed a bug that caused DecimalType columns to be incorrectly truncated to integer precision when used in BinaryExpressions.

Snowpark pandas API Updates

New Features

  • Added support for DataFrameGroupBy.all, SeriesGroupBy.all, DataFrameGroupBy.any, and SeriesGroupBy.any.
  • Added support for DataFrame.nlargest, DataFrame.nsmallest, Series.nlargest and Series.nsmallest.
  • Added support for replace and frac > 1 in DataFrame.sample and Series.sample.
  • Added support for read_excel (uses local pandas for processing).
  • Added support for Series.at, Series.iat, DataFrame.at, and DataFrame.iat.
  • Added support for Series.dt.isocalendar.
  • Added support for Series.case_when except when condition or replacement is callable.
  • Added documentation pages for Index and its APIs.
  • Added support for DataFrame.assign.
  • Added support for DataFrame.stack.
  • Added support for DataFrame.pivot and pd.pivot.
  • Added support for DataFrame.to_csv and Series.to_csv.
  • Added partial support for Series.str.translate where the values in the table are single-codepoint strings.
  • Added support for DataFrame.corr.
  • Allowed df.plot() and series.plot() to be called, materializing the data into the local client.
  • Added support for DataFrameGroupBy and SeriesGroupBy aggregations first and last.
  • Added support for DataFrameGroupBy.get_group.
  • Added support for limit parameter when method parameter is used in fillna.
  • Added support for DataFrame.equals and Series.equals.
  • Added support for DataFrame.reindex and Series.reindex.
  • Added support for Index.astype.
  • Added support for Index.unique and Index.nunique.

Bug Fixes

  • Fixed an issue when using np.where and df.where when the scalar 'other' is the literal 0.
  • Fixed a bug regarding precision loss when converting to Snowpark pandas DataFrame or Series with dtype=np.uint64.
  • Fixed a bug where values was set to index when index and columns contained all columns in the DataFrame during pivot_table.

Improvements

  • Added support for Index.copy().
  • Added support for Index APIs: dtype, values, item(), tolist(), to_series() and to_frame().
  • Expanded support for DataFrames with no rows in pd.pivot_table and DataFrame.pivot_table.
  • Added support for inplace parameter in DataFrame.sort_index and Series.sort_index.

Release

26 Jun 14:49
0c81e76

1.19.0 (2024-06-25)

Snowpark Python API Updates

Improvements

New Features

  • Added support for to_boolean function.
  • Added documentation pages for Index and its APIs.

Bug Fixes

  • Fixed a bug where a Python stored procedure with a table return type failed when run in a task.
  • Fixed a bug where df.dropna fails due to RecursionError: maximum recursion depth exceeded when the DataFrame has more than 500 columns.
  • Fixed a bug where AsyncJob.result("no_result") doesn't wait for the query to finish execution.

Snowpark Local Testing Updates

New Features

  • Added support for the strict parameter when registering UDFs and Stored Procedures.

Bug Fixes

  • Fixed a bug in convert_timezone that made setting the source_timezone parameter return an error.
  • Fixed a bug where creating DataFrame with empty data of type DateType raises AttributeError.
  • Fixed a bug where table merge failed when an update clause existed but no update took place.
  • Fixed a bug in mock implementation of to_char that raises IndexError when incoming column has nonconsecutive row index.
  • Fixed a bug in handling of CaseExpr expressions that raises IndexError when incoming column has nonconsecutive row index.
  • Fixed a bug in implementation of Column.like that raises IndexError when incoming column has nonconsecutive row index.

Improvements

  • Added support for type coercion in the implementation of DataFrame.replace, DataFrame.dropna and the mock function iff.

Snowpark pandas API Updates

New Features

  • Added partial support for DataFrame.pct_change and Series.pct_change without the freq and limit parameters.
  • Added support for Series.str.get.
  • Added support for Series.dt.dayofweek, Series.dt.day_of_week, Series.dt.dayofyear, and Series.dt.day_of_year.
  • Added support for Series.str.__getitem__ (Series.str[...]).
  • Added support for Series.str.lstrip and Series.str.rstrip.
  • Added support for DataFrameGroupby.size and SeriesGroupby.size.
  • Added support for DataFrame.expanding and Series.expanding for aggregations count, sum, min, max, mean, std, and var with axis=0.
  • Added support for DataFrame.rolling and Series.rolling for aggregation count with axis=0.
  • Added support for Series.str.match.
  • Added support for DataFrame.resample and Series.resample for aggregation size.

Bug Fixes

  • Fixed a bug that causes output of GroupBy.aggregate's columns to be ordered incorrectly.
  • Fixed a bug where DataFrame.describe on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.
  • Fixed a bug in DataFrame.rolling and Series.rolling so window=0 now throws NotImplementedError instead of ValueError.

Improvements

  • Added support for named aggregations in DataFrame.aggregate and Series.aggregate with axis=0.
  • pd.read_csv reads using the native pandas CSV parser, then uploads data to snowflake using parquet. This enables most of the parameters supported by read_csv including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.
  • Initial work to support a pd.Index directly in Snowpark pandas. Support for pd.Index as a first-class component of Snowpark pandas is coming soon.
  • Added a lazy index constructor and support for len, shape, size, empty, to_pandas() and names. For df.index, Snowpark pandas creates a lazy index object.
  • For df.columns, Snowpark pandas supports a non-lazy version of an Index since the data is already stored locally.

Release

28 May 23:33

1.18.0 (2024-05-28)

Snowpark pandas API Updates

New Features

  • Added DataFrame.cache_result and Series.cache_result methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations.

Improvements

  • Added partial support for DataFrame.pivot_table with no index parameter, as well as for margins parameter.
  • Updated the signature of DataFrame.shift/Series.shift/DataFrameGroupBy.shift/SeriesGroupBy.shift to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix argument, or sequence values of periods.
  • Re-added support for Series.str.split.

Bug Fixes

  • Fixed how we support mixed columns for string methods (Series.str.*).

Snowpark Local Testing Updates

New Features

  • Added support for the following DataFrameReader read options to file formats csv and json:
    • PURGE
    • PATTERN
    • INFER_SCHEMA with value being False
    • ENCODING with value being UTF8
  • Added support for DataFrame.analytics.moving_agg and DataFrame.analytics.cumulative_agg.
  • Added support for if_not_exists parameter during UDF and stored procedure registration.

Bug Fixes

  • Fixed a bug where the fractional second part was not handled properly when processing time formats.
  • Fixed a bug that caused function calls on * to fail.
  • Fixed a bug that prevented creation of map and struct type objects.
  • Fixed a bug where the function date_add was unable to handle some numeric types.
  • Fixed a bug where TimestampType casting resulted in incorrect data.
  • Fixed a bug that caused DecimalType data to have incorrect precision in some cases.
  • Fixed a bug where referencing a missing table or view raised a confusing IndexError.
  • Fixed a bug where the mocked function to_timestamp_ntz could not handle None data.
  • Fixed a bug where mocked UDFs handled output data of None improperly.
  • Fixed a bug where DataFrame.with_column_renamed ignored attributes from parent DataFrames after join operations.
  • Fixed a bug where the integer precision of large values was lost when converting to a pandas DataFrame.
  • Fixed a bug where the schema of a datetime object was wrong when creating a DataFrame from a pandas DataFrame.
  • Fixed a bug in the implementation of Column.equal_nan where null data was handled incorrectly.
  • Fixed a bug where DataFrame.drop ignored attributes from parent DataFrames after join operations.
  • Fixed a bug in the mocked function date_part where the Column type was set incorrectly.
  • Fixed a bug where DataFrameWriter.save_as_table did not raise exceptions when inserting null data into non-nullable columns.
  • Fixed a bug in the implementation of DataFrameWriter.save_as_table where
    • Append or Truncate fails when incoming data has different schema than existing table.
    • Truncate fails when incoming data does not specify columns that are nullable.

Improvements

  • Removed dependency check for pyarrow as it is not used.
  • Improved target type coverage of Column.cast, adding support for casting to boolean and all integral types.
  • Aligned error experience when calling UDFs and stored procedures.
  • Added appropriate error messages for is_permanent and anonymous options in UDFs and stored procedures registration to make it more clear that those features are not yet supported.
  • File read operation with unsupported options and values now raises NotImplementedError instead of warnings and unclear error information.

Release

21 May 22:31
afa4433

1.17.0 (2024-05-21)

Snowpark Python API Updates

New Features

  • Added support to add a comment on tables and views using the functions listed below:
    • DataFrameWriter.save_as_table
    • DataFrame.create_or_replace_view
    • DataFrame.create_or_replace_temp_view
    • DataFrame.create_or_replace_dynamic_table

Improvements

  • Improved error message to remind users to set {"infer_schema": True} when reading CSV file without specifying its schema.

Snowpark pandas API Updates

New Features

Snowpark Local Testing Updates

New Features

  • Added support for NumericType and VariantType data conversion in the mocked function to_timestamp_ltz, to_timestamp_ntz, to_timestamp_tz and to_timestamp.
  • Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function to_char.
  • Added support for the following APIs:
    • snowflake.snowpark.functions:
      • to_varchar
    • snowflake.snowpark.DataFrame:
      • pivot
    • snowflake.snowpark.Session:
      • cancel_all
  • Introduced a new exception class snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException.
  • Added support for casting to FloatType

Bug Fixes

  • Fixed a bug where stored procedures and UDFs removed imports that were already in sys.path during the clean-up step.
  • Fixed a bug where the fractional second part was not handled properly when processing datetime formats.
  • Fixed a bug on Windows where file operations were unable to properly handle the file separator in directory names.
  • Fixed a bug on Windows where an IntervalType column with integer data could not be processed when reading a pandas DataFrame.
  • Fixed a bug that prevented users from being able to select multiple columns with the same alias.
  • Fixed a bug where Session.get_current_[schema|database|role|user|account|warehouse] returned upper-cased identifiers when identifiers are quoted.
  • Fixed a bug where the functions substr and substring could not handle a 0-based start_expr.

Improvements

  • Standardized the error experience by raising SnowparkLocalTestingException in error cases which is on par with SnowparkSQLException raised in non-local execution.
  • Improved the error experience of the Session.write_pandas method: NotImplementedError is now raised when it is called.
  • Aligned error experience with reusing a closed session in non-local execution.

Release

08 May 17:57
71c8ea4

1.16.0 (2024-05-07)

New Features

  • Added snowflake.snowpark.Session.lineage.trace to explore data lineage of Snowflake objects.
  • Support stored procedure registration with packages given as Python modules.
  • Added support for structured type schema parsing.

Bug Fixes

  • Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
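
The quoting fix above amounts to making the quoting step idempotent. The sketch below illustrates that idea only; single_quote is a hypothetical helper, the stage path is a placeholder, and this is not the library's actual implementation.

```python
def single_quote(path):
    """Hypothetical helper: wrap a stage path in single quotes exactly once."""
    if len(path) >= 2 and path.startswith("'") and path.endswith("'"):
        return path  # already quoted, so leave it unchanged
    return f"'{path}'"


once = single_quote("@my_stage/data.csv")  # placeholder stage path
twice = single_quote(once)                 # a second call adds no extra quotes
```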

Local Testing Updates

New Features

  • Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_date.
  • Added support for the following APIs:
    • snowflake.snowpark.functions
      • get
      • concat
      • concat_ws

Bug Fixes

  • Fixed a bug that caused NaT and NaN values to not be recognized.
  • Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
  • Fixed a bug where DataFrameReader.csv was unable to handle quoted values containing a delimiter.
  • Fixed a bug in arithmetic calculations: when there is a None value, the output now remains None instead of math.nan.
  • Fixed a bug in the functions sum and covar_pop: when there is math.nan in the data, the output is now also math.nan.
  • Fixed a bug where stage operations could not handle directories.
  • Fixed a bug where DataFrame.to_pandas did not take Snowflake numeric types with precision 38 as int64.
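
The None and math.nan fixes above follow two different propagation rules, which a small plain-Python model makes concrete. The function names sql_add and sql_sum are hypothetical and the code does not use the local testing framework. Skipping None in the aggregate is standard SQL SUM behavior and is an assumption here, since the notes do not state it.

```python
import math


def sql_add(a, b):
    """Model of SQL-style addition: a NULL (None) operand yields NULL, not NaN."""
    if a is None or b is None:
        return None
    return a + b


def sql_sum(values):
    """Model of an aggregate sum: None is skipped (assumption), NaN propagates."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present)


print(sql_add(1, None))              # None
print(sql_sum([1, None, 2]))         # 3
print(sql_sum([1.0, math.nan, 2.0])) # nan
```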

Release

24 Apr 19:22
3a5c7b0

1.15.0 (2024-04-24)

New Features

  • Added truncate save mode in DataFrameWriter to overwrite existing tables by truncating the underlying table instead of dropping it.
  • Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
  • Added the functions below to unload data from a DataFrame into one or more files in a stage:
    • DataFrame.write.json
    • DataFrame.write.csv
    • DataFrame.write.parquet
  • Added distributed tracing using open telemetry APIs for action functions in DataFrame and DataFrameWriter:
    • snowflake.snowpark.DataFrame:
      • collect
      • collect_nowait
      • to_pandas
      • count
      • show
    • snowflake.snowpark.DataFrameWriter:
      • save_as_table
  • Added support for snow:// URLs to snowflake.snowpark.Session.file.get and snowflake.snowpark.Session.file.get_stream.
  • Added support to register stored procedures and UDxFs with a comment.
  • UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
  • Added support for dynamic pivot. This feature is currently in private preview.

Improvements

  • Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature not enabled by default, and can be enabled by setting session.cte_optimization_enabled to True.

Bug Fixes

  • Fixed a bug where statement_params was not passed to query executions that register stored procedures and user defined functions.
  • Fixed a bug causing snowflake.snowpark.Session.file.get_stream to fail for quoted stage locations.
  • Fixed a bug where an internal type hint in utils.py might raise AttributeError if the underlying module cannot be found.

Local Testing Updates

New Features

  • Added support for registering UDFs and stored procedures.
  • Added support for the following APIs:
    • snowflake.snowpark.Session:
      • file.put
      • file.put_stream
      • file.get
      • file.get_stream
      • read.json
      • add_import
      • remove_import
      • get_imports
      • clear_imports
      • add_packages
      • add_requirements
      • clear_packages
      • remove_package
      • udf.register
      • udf.register_from_file
      • sproc.register
      • sproc.register_from_file
    • snowflake.snowpark.functions
      • current_database
      • current_session
      • date_trunc
      • object_construct
      • object_construct_keep_null
      • pow
      • sqrt
      • udf
      • sproc
  • Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_time.

Bug Fixes

  • Fixed a bug that produced null-filled columns for constant functions.
  • Fixed the implementation of to_object, to_array and to_binary to better handle null inputs.
  • Fixed a bug where timestamp data comparison could not handle years beyond 2262.
  • Fixed a bug where Session.builder.getOrCreate did not return the created mock session.

Release

21 Mar 17:28
6906e56

1.14.0 (2024-03-20)

New Features

  • Added support for creating vectorized UDTFs with process method.
  • Added support for dataframe functions:
    • to_timestamp_ltz
    • to_timestamp_ntz
    • to_timestamp_tz
    • locate
  • Added support for ASOF JOIN type.
  • Added support for the following local testing APIs:
    • snowflake.snowpark.functions:
      • to_double
      • to_timestamp
      • to_timestamp_ltz
      • to_timestamp_ntz
      • to_timestamp_tz
      • greatest
      • least
      • convert_timezone
      • dateadd
      • date_part
    • snowflake.snowpark.Session:
      • get_current_account
      • get_current_warehouse
      • get_current_role
      • use_schema
      • use_warehouse
      • use_database
      • use_role

Bug Fixes

  • Fixed a bug in SnowflakePlanBuilder where save_as_table did not correctly filter columns whose names start with '$' followed by a number.
  • Fixed a bug that statement parameters may have no effect when resolving imports and packages.
  • Fixed bugs in local testing:
    • LEFT ANTI and LEFT SEMI joins drop rows with null values.
    • DataFrameReader.csv incorrectly parses data when the optional parameter field_optionally_enclosed_by is specified.
    • Column.regexp only considers the first entry when pattern is a Column.
    • Table.update raises KeyError when updating null values in the rows.
    • VARIANT columns raise errors at DataFrame.collect.
    • count_distinct does not work correctly when counting.
    • Null values in integer columns raise TypeError.

Improvements

  • Added telemetry to local testing.
  • Improved the error message of DataFrameReader to raise FileNotFound error when reading a path that does not exist or when there are no files under the path.