Releases: pegasystems/pega-datascientist-tools

Pega Data Scientist Tools V4

09 Dec 15:47

This release of pdstools is a big cleanup from version 3. A lot of changes are breaking - but that's for the best: pdstools is now much easier to maintain, new functionality has a more logical place to go, and the API should be a lot more intuitive. The goal is for the initial V4 release to contain most of the breaking API changes we foresee for a long time. After that, we can of course still change the inner functionality and/or add new functions, but hopefully the most important function signatures and the overall API won't need to change again anytime soon.

✨Highlights

  • Farewell R - you've served us well, but pdstools is now Python only
  • Introducing the Pega DX API Client
    • Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
  • Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints

❌Deprecations/removals

  • The R version of pdstools has been removed. If you still want to use the R tools, manually clone the repository at a V3.x tag.
  • The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
  • The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki, but keep it live for a while so that external links have time to be updated to point to the documentation instead.

🔨Changes

  • Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
  • Much improved type hints, so it's much clearer what a given function returns
  • Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
    • The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
  • To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
    • Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
    • The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
    • The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
  • Using classmethods, the ADMDatamart class in particular can now be initialized in a much more flexible way:
    • The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for the model data and one for the predictor data. If you've already read in your data, simply use this.
    • If, instead, you want the previous behavior of automatically finding the most recent file in a folder, initialize the datamart class with ADMDatamart.from_ds_export().
    • Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read in before can also be cached automatically by writing them to a 'cache' file, which makes subsequent loads much faster. This also closes #205.
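To make the 'namespace' idea concrete, here is a minimal, stdlib-only sketch of the pattern: methods are grouped into accessor objects (like .plot or .aggregations), and each group checks its own optional packages the first time it is used. This is NOT pdstools' actual implementation; ToyDatamart, Namespace, MissingDependenciesError and the dependency lists are hypothetical, and this sketch raises an error for missing packages rather than emitting a warning.

```python
import importlib.util
from collections import defaultdict
from functools import cached_property


class MissingDependenciesError(ImportError):
    """Hypothetical error raised when a namespace's optional packages are missing."""


class Namespace:
    """Hypothetical base class: a group of related methods with its own requirements."""

    dependencies: tuple = ()  # top-level package names; subclasses override this

    def __init__(self, datamart):
        self.datamart = datamart
        self._verified = False

    def _require(self):
        # Verify dependencies only once, on the first call into this namespace.
        if self._verified:
            return
        missing = [pkg for pkg in self.dependencies
                   if importlib.util.find_spec(pkg) is None]
        if missing:
            raise MissingDependenciesError(
                f"{type(self).__name__} needs missing packages: {', '.join(missing)}"
            )
        self._verified = True


class Aggregations(Namespace):
    dependencies = ()  # pure Python in this toy version

    def pivot(self, key, value):
        self._require()
        # Sum `value` per distinct `key`, keeping first-seen key order.
        totals = defaultdict(int)
        for row in self.datamart.rows:
            totals[row[key]] += row[value]
        return dict(totals)


class Plots(Namespace):
    dependencies = ("plotly",)  # assumed requirement, for illustration only

    def bubble_chart(self):
        self._require()
        # A real implementation would build a plotly figure here.
        return f"bubble chart of {len(self.datamart.rows)} rows"


class ToyDatamart:
    """Hypothetical stand-in for ADMDatamart, holding plain dict rows."""

    def __init__(self, rows):
        self.rows = rows

    @cached_property
    def aggregations(self):
        # Created once per datamart, so the dependency check also runs once.
        return Aggregations(self)

    @cached_property
    def plot(self):
        return Plots(self)
```

With this layout, `ToyDatamart(rows).aggregations.pivot("Channel", "Positives")` works on a bare install, while `.plot.bubble_chart()` only fails, with a message naming the missing package, if plotting is actually used.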

Full Changelog: V3.5.2...V4.0.0
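The classmethod-based initialization described in the V4 changes above can be illustrated with a small stdlib-only sketch: a plain constructor that takes already-loaded data, plus alternate constructors that locate and read files first. The Datamart class, from_latest_files and from_patterns are hypothetical stand-ins, not the pdstools API (the real constructors are ADMDatamart.from_ds_export() and ADMDatamart.from_dataflow_export()); the sketch also assumes newline-delimited JSON files and leaves out caching.

```python
import glob
import json
from pathlib import Path


class Datamart:
    """Hypothetical stand-in for ADMDatamart; rows are plain dicts."""

    def __init__(self, model_data, predictor_data=None):
        # The plain constructor only takes data that has already been read in.
        self.model_data = list(model_data)
        self.predictor_data = list(predictor_data or [])

    @staticmethod
    def _read_json_lines(paths):
        # Assumes newline-delimited JSON: one record per non-empty line.
        rows = []
        for path in paths:
            with open(path, encoding="utf-8") as fh:
                rows.extend(json.loads(line) for line in fh if line.strip())
        return rows

    @classmethod
    def from_latest_files(cls, folder, model_prefix, predictor_prefix):
        # Mimics the "most recent file in a folder" behavior.
        def latest(prefix):
            candidates = sorted(Path(folder).glob(f"{prefix}*"),
                                key=lambda p: p.stat().st_mtime)
            return [candidates[-1]] if candidates else []

        return cls(cls._read_json_lines(latest(model_prefix)),
                   cls._read_json_lines(latest(predictor_prefix)))

    @classmethod
    def from_patterns(cls, model_data, predictor_data):
        # Mimics reading every file matching a glob pattern, e.g. data flow output.
        return cls(cls._read_json_lines(sorted(glob.glob(model_data))),
                   cls._read_json_lines(sorted(glob.glob(predictor_data))))
```

Keeping `__init__` free of file handling means every loading strategy funnels into the same constructor, so adding a new source is just one more classmethod.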

Pdstools V4 beta 1

19 Nov 11:43
Pre-release

V4 brings some pretty major (and necessary) changes. A lot of them are, unfortunately, breaking - but it's for the best. pdstools is now much easier to maintain and keep consistent, and new functionality now has a much more logical place to go.

The goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee for a long time. After that, we can of course still change the inner functionality and/or add new functions, but hopefully the most important function signatures and the overall API won't need to change again anytime soon.

✨Highlights

  • Farewell R - you've served us well, but pdstools is now Python only
  • Introducing the Pega DX API Client
    • Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
  • Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints

❌Deprecations/removals

  • The R version of pdstools has been removed. If you still want to use the R tools, manually clone the repository at a V3.x tag.
  • The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
  • The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki, but keep it live for a while so that external links have time to be updated to point to the documentation instead.

🔨Changes

  • Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
  • Much improved type hints, so it's much clearer what a given function returns
  • Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
    • The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
  • To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
    • Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
    • The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
    • The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
  • Using classmethods, the ADMDatamart class in particular can now be initialized in a much more flexible way:
    • The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for the model data and one for the predictor data. If you've already read in your data, simply use this.
    • If, instead, you want the previous behavior of automatically finding the most recent file in a folder, initialize the datamart class with ADMDatamart.from_ds_export().
    • Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read in before can also be cached automatically by writing them to a 'cache' file, which makes subsequent loads much faster. This also closes #205.

Todo before release:

Full Changelog: V4.0.0-alpha.1...V4.0.0-beta.1

Pdstools V4 alpha 1

30 Oct 17:17
5a42b5c
Pre-release

V4 brings some pretty major (and necessary) changes. A lot of them are, unfortunately, breaking - but it's for the best. pdstools is now much easier to maintain and keep consistent, and new functionality now has a much more logical place to go.

The goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee for a long time. After that, we can of course still change the inner functionality and/or add new functions, but hopefully the most important function signatures and the overall API won't need to change again anytime soon.

✨Highlights

  • Farewell R - you've served us well, but pdstools is now Python only
  • Introducing the Pega DX API Client
    • Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
  • Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints

❌Deprecations/removals

  • The R version of pdstools has been removed. If you still want to use the R tools, manually clone the repository at a V3.x tag.
  • The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!

🔨Changes

  • Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
  • Much improved type hints, so it's much clearer what a given function returns
  • Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
    • The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
  • To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
    • Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
    • The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
    • The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
  • Using classmethods, the ADMDatamart class in particular can now be initialized in a much more flexible way:
    • The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for the model data and one for the predictor data. If you've already read in your data, simply use this.
    • If, instead, you want the previous behavior of automatically finding the most recent file in a folder, initialize the datamart class with ADMDatamart.from_ds_export().
    • Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read in before can also be cached automatically by writing them to a 'cache' file, which makes subsequent loads much faster. This also closes #205.

Todo before release:

Pega Data Scientist Tools V3.5.0: Polars V1

08 Oct 10:24
7c91518

While we've been hard at work creating version 4 of pdstools, I wanted to get one last release for V3 out of the way.

V4 brings some pretty sizable changes: we'll deprecate the R tools and fully rework all Python classes to make them more consistent and maintainable, including renaming pretty much all classes and methods. Since things will break, it may sometimes be desirable to fall back to V3 while transitioning. However, our V3 branch was falling out of date, mainly because it was tied to Polars < 1. This minor update brings Polars V1 support. It does require Polars > 1.9, as we were facing an IPC serialization bug in earlier versions. We will likely not support the V3 branch long into the future, but if necessary we could accept small bug fixes down the line.
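If you keep a V3 environment around while transitioning, a quick check of the installed Polars version can save some confusion. The sketch below is a hypothetical helper, not part of pdstools; it reads the requirement as "at least 1.9", and its simplified parser ignores pre-release tags.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_POLARS = (1, 9)  # assumed reading of the "Polars > 1.9" requirement


def parse_version(text):
    """Simplified parse: '1.10.0rc1' -> (1, 10, 0); pre-release tags are ignored."""
    nums = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if not match:
            break
        nums.append(int(match.group()))
    return tuple(nums)


def polars_is_supported(installed=None):
    """Return True if the given (or installed) Polars version meets MIN_POLARS."""
    if installed is None:
        try:
            installed = version("polars")
        except PackageNotFoundError:
            return False  # Polars is not installed at all
    return parse_version(installed) >= MIN_POLARS
```

Comparing integer tuples rather than strings matters here: as strings, "1.10" sorts before "1.9", which would wrongly reject newer releases.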

What's Changed

  • Make Quarto Render OS-Agnostic and Add Version Info to Health Check Logs by @yusufuyanik1 in #257
  • Fixing issue with inconsistent coloring and ordering by @operdeck in #261
  • [WIP] Compatibility with Polars V1 by @StijnKas in #233

Full Changelog: V3.4.7...V3.5.0

Pega Data Scientist Tools V3.4.6

24 Jul 15:03

Enhancement and Bug Fixes

This patch release, V3.4.6, includes multiple enhancements and bug fixes aimed at improving functionality and user experience.
Key updates include improved logging for the Health Check (#250), graceful handling of empty data frames, prediction analysis improvements, and more.

What's Changed

Full Changelog: V3.4.4...V3.4.6

Pega Data Scientist Tools V3.4: Binning Insights

06 Mar 15:40
7b3dc6b

Rolling up the ADM bins

This release adds additional insights from the ADM binning data, letting you find information on predictors across models and channels!

Check out the explainer article here.

What's Changed

Full Changelog: V3.3...V3.4

Pega Data Scientist Tools V3.3

05 Jan 13:25

HealthCheck App Changes

In this release, we've primarily focused on improving the HealthCheck app, making it more powerful and user-friendly.

  • Beyond the comprehensive global HealthCheck report, you now have the capability to generate individual model reports. These reports allow you to delve into the performance of a specific model, providing an in-depth view of predictors and their individual effects on propensity.
  • You can now run HealthCheck in the cloud, directly from GitHub, without installing any tools. A GitHub codespace is a development environment hosted in the cloud: each codespace you create is hosted by GitHub in a Docker container running on a virtual machine. See the Wiki for all the possible ways to run the ADM HealthCheck.
  • If our out-of-the-box report isn't quite what you're looking for, you can now export the latest Datamart Snapshot in Excel format, empowering you to perform your own custom analysis.
  • You can save your filters and upload them later to avoid repetition.

Code Changes

  • Added support for Python 3.12
  • Aligned with performance improvements in the latest version of Polars

What's Changed

Full Changelog: V3.2...V3.3

Pega Data Scientist Tools V3.2

28 Jul 14:39
0804c03

What's Changed

Nothing major in this release, but a lot of bug fixes and version compatibility updates. For an overview of pull requests:

Full Changelog: V3.1...V3.2

Pega Data Scientist Tools V3.1

10 Apr 15:04

In case you haven’t seen it yet, V3.0 brought many important changes. V3.1 is a minor release, but brings some nice usability changes:

What’s new?

  • Rebuilt the pdstools app to be much more user-friendly
  • Added a models-only Health Check
  • Added a Tables class to generate tables and export them to Excel
  • Allowed multiple pl.Expr objects in ._apply_query
  • Added Thompson Sampling & ADM Explained articles to the Python docs
  • Added plotPredictorContribution to the main plots
  • Basic S3 tools, for reading Pega Repository datasets, including get_ADMDatamart to get the datamart directly from S3
  • pdstools.show_versions allows you to easily get the versions of installed packages
  • Issue templates to more easily and clearly define gh issues

What’s changed?

  • Moved Health Check generation responsibility to the ADMDatamart class
  • Moved Health Check files to reports
  • Updated Value Finder to a more streamlined implementation
  • Separated IO into pega_io
  • Support for more OOTB timestamp formats
  • Fixed a bug in getMultiTrees that caused different models to not separate properly
  • Fixed compatibility with Polars version 0.17

Technical improvements

  • Automated documentation builds & deployment
  • Automated pypi releases
  • Automated tests for Health Check
  • Fixed tests not being found by VS Code

Full Changelog: V3.0...V3.1

Pega Data Scientist Tools V3.0

17 Mar 12:56
f1429a1

A new major version with major changes!

Highlights:

  • Pdstools now uses Polars as the backend, replacing Pandas. See this article for a summary of the changes
  • The crowd favorite ADM Healthcheck has been fully ported over to Python, alongside a streamlit app. Simply call pdstools run in your terminal to get started!
  • The matplotlib plots have been deprecated, and only plotly is supported. Plots that were matplotlib only have been removed.
  • Added data anonymization tools, see this article for more information

Other changes

  • The minimum Python version has been bumped to 3.8
  • The new Polars backend touched almost all areas of the codebase. All plot functions, backend functions and aggregations have been ported over.
  • cdh_utils & ADMDatamart imports return pl.LazyFrames by default
  • The overwrite mapping functionality has been removed. If you need the legacy behavior, you can manually read in the data as pl.LazyFrames and then call .rename()
  • ModelName is renamed to Name for consistency
  • ADMDatamart keyword arguments have been added to the main class signature, making them easier to find & use
  • query arguments should now use pl.Expr for querying, keeping the lazy execution path alive
  • extract_treatment has been renamed to extract_keys and is now a simple boolean: if True, all extra keys in pyName are extracted
  • Added last_ResponseCount and last_Positives columns, indicating the last timestamp at which each of these columns increased. This is useful for estimating whether an action has stopped getting responses and has therefore become inactive
  • Added a save_data() method to the ADMDatamart class, that will save the modelData and predictorData to local files
  • Updated docstrings & tests to be consistent and up-to-date
  • Added a FeatureImportance function, closing #49
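The extract_keys option works on model names that carry extra context keys. As a rough illustration, assuming such a pyName holds a JSON object (for example {"pyName":"SuperSaver","pyTreatment":"Email1"}), the parsing step could look like the sketch below. The function name mirrors the argument, but the code is a hypothetical stand-in, not pdstools' implementation.

```python
import json


def extract_keys(py_name):
    """Hypothetical helper: return the keys encoded in pyName, or just the plain name.

    Assumes extra keys are stored as a JSON object inside the pyName string.
    """
    text = py_name.strip()
    if text.startswith("{"):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None  # looks like JSON but isn't; treat as a plain name
        if isinstance(parsed, dict):
            return parsed
    return {"pyName": py_name}
```

Applied to every model row, each returned key would become an extra column, while plain names pass through unchanged.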

New Contributors

Full Changelog: V2.2...V3.0
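The idea behind the last_ResponseCount and last_Positives columns from V3.0 can be sketched as follows: walk a model's snapshots in time order and remember the last snapshot where the counter went up. This is a hypothetical, stdlib-only illustration for a single model (group by model ID first for a full datamart); it assumes snapshots are dicts with a SnapshotTime key, and looks_inactive with its 7-day default is an invented example threshold.

```python
from datetime import datetime, timedelta


def last_increase(snapshots, column):
    """Latest SnapshotTime at which `column` rose versus the previous snapshot.

    Returns None if the counter never increased (or there is only one snapshot).
    """
    ordered = sorted(snapshots, key=lambda s: s["SnapshotTime"])
    last = None
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[column] > prev[column]:
            last = cur["SnapshotTime"]
    return last


def looks_inactive(snapshots, as_of, max_age=timedelta(days=7)):
    """Hypothetical staleness check: no response growth within `max_age` of `as_of`."""
    last = last_increase(snapshots, "ResponseCount")
    return last is None or as_of - last > max_age


snapshots = [
    {"SnapshotTime": datetime(2023, 1, 1), "ResponseCount": 10, "Positives": 1},
    {"SnapshotTime": datetime(2023, 1, 2), "ResponseCount": 15, "Positives": 2},
    {"SnapshotTime": datetime(2023, 1, 3), "ResponseCount": 20, "Positives": 2},
    {"SnapshotTime": datetime(2023, 1, 4), "ResponseCount": 20, "Positives": 2},
]
```

Here responses last grew on 3 January and positives on 2 January, so with a 3-day window the model would look inactive by 10 January.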