Releases: pegasystems/pega-datascientist-tools
Pega Data Scientist Tools V3.0
A new major version with major changes!
Highlights:
- Pdstools now uses Polars as the backend, replacing Pandas. See this article for a summary of the changes.
- The crowd favorite ADM Healthcheck has been fully ported over to Python, alongside a Streamlit app. Simply call `pdstools run` in your terminal to get started!
- The matplotlib plots have been deprecated, and only Plotly is supported. Plots that were `matplotlib` only have been removed.
- Added data anonymization tools; see this article for more information.
Other changes
- Minimum Python version is now bumped to 3.8
- The new Polars backend touched almost all areas of the codebase. All plot functions, backend functions and aggregations have been ported over.
- cdh_utils & ADMDatamart imports return `pl.LazyFrame`s by default
- Overwrite mapping functionality is removed. If you need the legacy functionality, you can manually read in the data as `pl.LazyFrame`s, and then call `.rename()`
- `ModelName` is renamed to `Name` for consistency
- `ADMDatamart` keyword arguments have been added to the main class signature, making them easier to find & use
- `query` arguments should now use `pl.Expr` for querying, keeping the lazy execution path alive
- `extract_treatment` has been renamed to `extract_keys`, and is now just a boolean. If True, it will extract all extra keys in `pyName`
- Added `last_ResponseCount` and `last_Positives` columns, indicating the last timestamp at which the corresponding column increased. This is useful for estimating whether an action has stopped getting responses and has therefore become inactive
- Added a `save_data()` method to the `ADMDatamart` class, which saves the `modelData` and `predictorData` to local files
- Updated docstrings & tests to be consistent and up-to-date
- Added a `FeatureImportance` function, closing #49
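To make the idea behind the `last_ResponseCount` and `last_Positives` columns concrete, here is a minimal plain-Python sketch of "the last timestamp at which a cumulative count increased". It is not the pdstools implementation: the helper name `last_increase`, the snapshot layout and the choice to return `None` when no increase is observed are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

# Assumed layout: (snapshot time, cumulative count) for a single model.
Snapshot = Tuple[datetime, int]


def last_increase(snapshots: Sequence[Snapshot]) -> Optional[datetime]:
    """Return the time of the most recent snapshot whose count exceeds the previous one."""
    ordered = sorted(snapshots, key=lambda s: s[0])
    last = None
    # Compare each snapshot with the one directly before it.
    for (_, prev_count), (time, count) in zip(ordered, ordered[1:]):
        if count > prev_count:
            last = time
    return last


snapshots = [
    (datetime(2023, 1, 1), 10),
    (datetime(2023, 1, 2), 15),
    (datetime(2023, 1, 3), 15),
    (datetime(2023, 1, 4), 15),
]
last_positives = last_increase(snapshots)
print(last_positives)
```

In this toy data the count stops growing after the second snapshot, so a large gap between the latest snapshot and the returned timestamp would hint that the action no longer receives responses.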
New Contributors
- @shaniyahassanali made their first contribution in #70
- @yusufuyanik1 made their first contribution in #73
Full Changelog: V2.2...V3.0
Pega Data Scientist Tools V2.2
As you may have noticed, we've rebranded! `CDH Tools` is now `pdstools`. We've done our best to update all references to the old name and URL, but if you happen to come across any broken links please do let us know. And while GitHub seems to do quite a good job at redirecting the old URL to the new one, it is probably wise to update all locally saved links as well.
Apart from the name change, the Python tools have some other noteworthy changes:
- PyPI listing: pdstools is now listed on PyPI! Installation is now as simple as `pip install pdstools`.
- Support for Python 3.6 is deprecated; pdstools now supports Python 3.7 and up
- New value finder tools
- Support for reading AGB models from the datamart
- Ability to scan for AGB models in the datamart
- Support for reading BytesIO files
- General improvements to data imports, including logging
- Removed dependency on sklearn
- Support for feature importance in certain plots
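As an illustration of what reading from a BytesIO object involves, the sketch below reads newline-delimited JSON records from a zip archive that exists only in memory, using nothing but the standard library. It is not the pdstools reader: the helper `read_json_lines`, the member name `data.json` and the field names are assumptions chosen for the example.

```python
import io
import json
import zipfile


def read_json_lines(buffer: io.BytesIO, member: str = "data.json") -> list:
    """Read newline-delimited JSON records from a zip archive held in memory."""
    with zipfile.ZipFile(buffer) as archive:
        with archive.open(member) as handle:
            # Each non-empty line is expected to be one JSON object.
            return [json.loads(line) for line in handle if line.strip()]


# Build a small in-memory archive to stand in for an export that never touched disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "data.json",
        '{"pyName": "A", "pyPositives": 3}\n{"pyName": "B", "pyPositives": 7}\n',
    )
buffer.seek(0)

records = read_json_lines(buffer)
print(records)
```

Accepting a file-like object rather than a path is convenient when the data arrives as an upload or a download stream.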
Full Changelog: V2.1...V2.2
V2.1: Analyze ADM Trees
This release introduces a new class: ADMTrees. With ADMTrees you can analyze and visualize ADM Gradient Boosting (AGB) models.
Some new features include:
- Analyze predictors and their splits
- Visualize individual trees within the ensemble
- Visualize the prediction path of the model, given input data
- Replicate the scoring of the model
- Visualize the contribution of each tree towards the final score
For inspiration on how to use ADMTrees, check out this example: https://pegasystems.github.io/cdh-datascientist-tools/Python/articles/AGBModelVisualisation.html. There you can also find the API reference documentation.
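For readers new to gradient boosting, the following sketch shows in general terms what "replicating the scoring" and "the contribution of each tree" can mean: each tree yields one leaf score, and the scores are summed and passed through a sigmoid. This is a generic illustration, not the ADMTrees API. The tree format, the split rule (go left when the value is below the threshold) and the sigmoid link are all assumptions.

```python
import math

# Hypothetical tree format: internal nodes hold a split, leaves hold a score.
trees = [
    {"feature": "Age", "threshold": 30,
     "left": {"score": -0.2}, "right": {"score": 0.4}},
    {"feature": "Income", "threshold": 50000,
     "left": {"score": -0.1}, "right": {"score": 0.3}},
]


def tree_score(node, row):
    """Walk one tree: go left when the feature value is below the threshold."""
    while "score" not in node:
        branch = "left" if row[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["score"]


def predict(trees, row):
    """Return the per-tree contributions and the sigmoid of their sum."""
    contributions = [tree_score(tree, row) for tree in trees]
    return contributions, 1 / (1 + math.exp(-sum(contributions)))


contributions, propensity = predict(trees, {"Age": 42, "Income": 30000})
print(contributions, round(propensity, 4))
```

Keeping the per-tree contributions separate from the final score is what makes it possible to plot how each tree moves the prediction.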
Python CDH Tools 2.0
This version brings many major improvements to the Python version of CDH Tools. Please see below for a quick summary:
- CDH Tools can now be installed without cloning the entire GitHub repository, simply by running the following command:
pip3 install git+https://github.com/pegasystems/cdh-datascientist-tools.git
- It is then possible to import the ADMDatamart class with the following syntax:
from cdhtools import ADMDatamart
- To quickly test things out, you can import and use the CDH Sample dataset as follows:
from cdhtools import datasets
Sample = datasets.CDHSample()
Sample.plotPerformanceSuccessRateBubbleChart()
See also Example_ADM_Analysis.ipynb in examples/datamart, where it is used as well.
- An additional plotting library is now supported: Plotly. It is chosen by default; to revert to matplotlib, simply pass the argument 'plotting_engine = "mpl"' to either the ADMDatamart class initialization or an individual plotting function.
- New visualisations were also added with the introduction of Plotly: Treemap, ModelsByPositives, OverTime & ResponseGain.
- There is now a Python plot gallery, you can find it under examples/plot_gallery.
- Unit tests have been added, improving reliability.
- Documentation is much improved - you can refer to either ADMDatamart.py or plot_base.py for information about the purpose and arguments for each function.
- With Plotly, faceting is now much easier as well. Simply supply the 'facets' argument with a list of context keys to facet on and, for compatible plots, different facets will be created.
- It is now possible to easily extract the treatment out of the pyName column with the 'extract_treatment' argument to the ADMDatamart class. Example: ADMDatamart('data', extract_treatment='pyName').
- Various bugfixes, including fixes for SettingWithCopyWarning messages and a minor miscalculation in getting the latest predictors, plus a new way to get the latest file by looking at the timestamp in the zip file names.
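To illustrate the idea behind the 'extract_treatment' argument mentioned in the list above, the sketch below splits a name into separate keys when it holds a JSON object and leaves plain names untouched. It is not the ADMDatamart implementation: the helper `split_py_name` is hypothetical, and the assumption that pyName may hold a JSON-encoded object with a 'pyTreatment' key is made purely for the example.

```python
import json


def split_py_name(name: str) -> dict:
    """Split a name that may be a JSON object into its separate keys."""
    try:
        parsed = json.loads(name)
    except json.JSONDecodeError:
        # A plain name is not valid JSON: keep it as-is.
        return {"pyName": name}
    if not isinstance(parsed, dict):
        # Valid JSON but not an object (e.g. a bare number): treat it as a plain name.
        return {"pyName": name}
    return parsed


with_treatment = split_py_name('{"pyName": "Offer1", "pyTreatment": "Email"}')
plain = split_py_name("Offer2")
print(with_treatment)
print(plain)
```

Returning a dictionary in both cases means downstream code can handle rows with and without a treatment in the same way.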