Releases: pegasystems/pega-datascientist-tools

Pega Data Scientist Tools V3.0

17 Mar 12:56
f1429a1

A new major version with major changes!

Highlights:

  • pdstools now uses Polars as the backend, replacing Pandas. See this article for a summary of the changes.
  • The crowd favorite ADM Healthcheck has been fully ported to Python, alongside a Streamlit app. Simply call pdstools run in your terminal to get started!
  • The matplotlib plots have been deprecated, and only Plotly is supported. Plots that were matplotlib-only have been removed.
  • Added data anonymization tools; see this article for more information.

Other changes

  • The minimum supported Python version is now 3.8
  • The new Polars backend touched almost all areas of the codebase. All plot functions, backend functions and aggregations have been ported over.
  • cdh_utils & ADMDatamart imports return pl.LazyFrames by default
  • Overwrite mapping functionality has been removed. If you need the legacy functionality, you can manually read in the data as a pl.LazyFrame and then call .rename()
  • ModelName is renamed to Name for consistency
  • ADMDatamart keyword arguments have been added to the main class signature, making them easier to find & use
  • query arguments should now use pl.Expr for querying, keeping the lazy execution path alive
  • extract_treatment has been renamed to extract_keys, and is now a plain boolean. If True, all extra keys in pyName are extracted
  • Added last_ResponseCount and last_Positives columns, indicating the last timestamp at which either of these columns increased. This is useful for estimating whether an action has stopped getting responses and has therefore become inactive
  • Added a save_data() method to the ADMDatamart class, that will save the modelData and predictorData to local files
  • Updated docstrings & tests to be consistent and up-to-date
  • Added a FeatureImportance function, closing #49

Full Changelog: V2.2...V3.0

Pega Data Scientist Tools V2.2

31 Aug 14:23
a02764f

As you may have noticed, we've rebranded! CDH Tools is now pdstools. We've done our best to update all references to the old name and URL, but if you happen to come across any broken links please do let us know. And while GitHub seems to do quite a good job at redirecting the old URL to the new one, it is probably wise to update all locally saved links as well.

Apart from the name change, the Python tools have some other noteworthy changes:

  • PyPI listing: pdstools is now listed on PyPI! Installation is now as simple as pip install pdstools.
  • Support for Python 3.6 has been dropped; Python 3.7 and up is now supported
  • New value finder tools
  • Support for reading AGB models from the datamart
  • Ability to scan for AGB models in the datamart
  • Support for reading BytesIO files
  • General improvements to data imports, including logging
  • Removed dependency on sklearn
  • Support for feature importance in certain plots

Full Changelog: V2.1...V2.2

V2.1: Analyze ADM Trees

25 May 11:32
9e191a5

This release introduces a new class: ADMTrees. With ADMTrees you can analyze and visualize ADM Gradient Boosting models.

Some new features include:

  • Analyze predictors and their splits
  • Visualize individual trees within the ensemble
  • Visualize the prediction path of the model, given input data
  • Replicate the scoring of the model
  • Visualize the contribution of each tree towards the final score

For inspiration on how to use ADMTrees, check out this example: https://pegasystems.github.io/cdh-datascientist-tools/Python/articles/AGBModelVisualisation.html. There you can also find the API reference documentation.

Python CDH Tools 2.0

30 Mar 14:39

This version brings many major improvements to the Python version of CDH Tools. Please see below for a quick summary:

  • CDH Tools can now be installed without cloning the entire GitHub repository, by running the following command:
    • pip3 install git+https://github.com/pegasystems/cdh-datascientist-tools.git
    • It is then possible to import the ADMDatamart class with the following syntax:
      from cdhtools import ADMDatamart
  • To quickly test things out, you can import and use the CDH Sample dataset as follows:
      from cdhtools import datasets
      Sample = datasets.CDHSample()
      Sample.plotPerformanceSuccessRateBubbleChart()

See also Example_ADM_Analysis.ipynb in examples/datamart, where it is used as well.

  • An additional plotting library is now supported: Plotly. It is used by default; to revert to matplotlib, pass the argument plotting_engine="mpl" to either the ADMDatamart class initialization or an individual plotting function.
  • New visualisations were also added with the introduction of Plotly: Treemap, ModelsByPositives, OverTime & ResponseGain.
  • There is now a Python plot gallery; you can find it under examples/plot_gallery.
  • Unit tests have been added, improving reliability.
  • Documentation is much improved: you can refer to either ADMDatamart.py or plot_base.py for information about the purpose and arguments of each function.
  • With Plotly, faceting is now much easier as well. Simply supply the 'facets' argument with a list of context keys to facet on and, for compatible plots, different facets will be created.
  • It is now possible to easily extract the treatment out of the pyName column with the 'extract_treatment' argument to the ADMDatamart class. Example: ADMDatamart('data', extract_treatment='pyName').
  • Various bugfixes, including fixes for SettingWithCopyWarning messages and for a minor miscalculation in getting the latest predictors, plus a new way to find the latest file by looking at the timestamp in the zip file names.
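
The idea of picking the latest file from the timestamp in its name can be sketched as follows. This is not the library's implementation: the helper latest_export is hypothetical, and the filename pattern (a trailing _YYYYMMDDTHHMMSS_GMT.zip) is an assumption about how the export files are named, so adjust the regular expression to match your own files.

```python
import re
from datetime import datetime

# Assumed naming pattern: <anything>_YYYYMMDDTHHMMSS_GMT.zip (an assumption).
TIMESTAMP = re.compile(r"_(\d{8}T\d{6})_GMT\.zip$")


def latest_export(filenames):
    """Return the name with the most recent embedded timestamp, or None."""
    dated = []
    for name in filenames:
        match = TIMESTAMP.search(name)
        if match:
            # Parse the captured timestamp so files compare chronologically.
            stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
            dated.append((stamp, name))
    # Files without a recognizable timestamp are ignored.
    return max(dated)[1] if dated else None


# Hypothetical file names, for illustration only.
files = [
    "ModelSnapshots_20210101T010000_GMT.zip",
    "ModelSnapshots_20210315T120000_GMT.zip",
    "notes.txt",
]
print(latest_export(files))
```

Parsing the timestamp, rather than relying on file modification times, keeps the choice stable when files are copied or re-downloaded.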