10 Jan 14:30

operdeck

d163c8e

Latest

Minor version upgrade to 4.1

PDSTools now includes sample analysis scripts for IH (Interaction History) and basic support to ingest IH data. The examples include action distribution analysis, responses analysis, conversion analysis and model performance (e.g. NB vs AGB).


09 Dec 15:47

StijnKas

V4.0.0

23cd452

Pega Data Scientist Tools V4

This release of pdstools is a big cleanup from version 3. A lot of changes are breaking - but that's for the best: pdstools is now much easier to maintain, new functionality has a more logical place to go, and the API should be a lot more intuitive. The goal is for the initial V4 release to contain most of the breaking API changes we foresee in a long time. Then, we can of course still change the inner functionality and/or add new functions - but hopefully the most important function schemas/API don't need more changes anytime soon.

✨Highlights

Farewell R - you've served us well, but pdstools is now Python only
Introducing the Pega DX API Client
- Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
Major refactor of the entire codebase: consistent python naming, optional dependency groups, well-defined typehints

❌Deprecations/removals

The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki but keep it live for a while, so that external links have time to be updated to point at the documentation instead.

🔨Changes

Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
Much improved typehints, so it's much more obvious what the response of a given function will be
Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
- The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
- Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
- The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
- The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
Using classmethods, the ADMDatamart class in particular can now be initialized in a much more flexible way.
- The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for model_data and one for predictor_data. If you've already read in your data, simply use this.
- If, instead, you want the previous behavior of automatically finding the most recent file in a folder, initialize the datamart class with ADMDatamart.from_ds_export().
- Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read before can also be cached automatically by writing them to a 'cache' file, which speeds up subsequent loads. This closes #205 as well.
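
To make the 'namespace' idea above more concrete, here is a minimal sketch of how a feature group could verify its own optional dependencies the first time one of its methods is called. This is not the pdstools implementation: ToyDatamart, LazyNamespace, Plots, Reports and MissingDependencies are hypothetical names, the dependency lists are stand-ins, and the sketch raises an error where the release notes describe a warning.

```python
import importlib.util


class MissingDependencies(ImportError):
    """Raised when a namespace's optional packages are not installed."""


class LazyNamespace:
    """Hypothetical base class: a feature group that checks its requirements on first use."""

    dependencies: tuple[str, ...] = ()

    def __init__(self) -> None:
        self._checked = False

    def _require_dependencies(self) -> None:
        # Only look the packages up once per namespace instance.
        if self._checked:
            return
        missing = [
            name
            for name in self.dependencies
            if importlib.util.find_spec(name) is None
        ]
        if missing:
            raise MissingDependencies(
                f"{type(self).__name__} needs these packages: {', '.join(missing)}"
            )
        self._checked = True


class Plots(LazyNamespace):
    # Stand-in requirement from the standard library, so the sketch runs anywhere.
    dependencies = ("json",)

    def bubble_chart(self) -> str:
        self._require_dependencies()
        return "bubble chart"


class Reports(LazyNamespace):
    # Deliberately missing package, to show the failure path.
    dependencies = ("package_that_is_not_installed_xyz",)

    def health_check(self) -> str:
        self._require_dependencies()
        return "health check"


class ToyDatamart:
    """Hypothetical stand-in that exposes its functionality through namespaces."""

    def __init__(self) -> None:
        self.plot = Plots()
        self.generate = Reports()
```

With this layout, ToyDatamart().plot.bubble_chart() works as long as the plotting requirements are present, while ToyDatamart().generate.health_check() fails with a message naming the missing package. Constructing the class itself never touches the optional packages.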

Full Changelog: V3.5.2...V4.0.0


19 Nov 11:43

StijnKas

V4.0.0-beta.1

ca255eb

Pdstools V4 beta 1 Pre-release


V4 brings some pretty major (and necessary) changes. A lot of them are, unfortunately, breaking - but it's for the best. pdstools is now much easier to maintain and keep consistent, and new functionality now has a much more logical place to go.

The goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee in a long time. Then, we can of course still change the inner functionality and/or add new functions - but hopefully the most important function schemas/API don't need more changes anytime soon.

✨Highlights

Farewell R - you've served us well, but pdstools is now Python only
Introducing the Pega DX API Client
- Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
Major refactor of the entire codebase: consistent python naming, optional dependency groups, well-defined typehints

❌Deprecations/removals

The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki but keep it live for a while, so that external links have time to be updated to point at the documentation instead.

🔨Changes

Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
Much improved typehints, so it's much more obvious what the response of a given function will be
Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
- The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
- Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
- The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
- The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
Using classmethods, the ADMDatamart class in particular can now be initialized in a much more flexible way.
- The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for model_data and one for predictor_data. If you've already read in your data, simply use this.
- If, instead, you want the previous behavior of automatically finding the most recent file in a folder, initialize the datamart class with ADMDatamart.from_ds_export().
- Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read before can also be cached automatically by writing them to a 'cache' file, which speeds up subsequent loads. This closes #205 as well.
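
The classmethod-based initialization described above follows the common "alternative constructor" pattern. The sketch below shows that pattern in isolation, using only the standard library. It is an illustration under assumptions, not the pdstools code: ToyDatamart, from_latest_export and the default file patterns are hypothetical, and plain strings stand in for the polars.LazyFrames.

```python
import os
import tempfile
from pathlib import Path


class ToyDatamart:
    """Hypothetical stand-in: a simple __init__ plus a convenience constructor."""

    def __init__(self, model_data, predictor_data):
        # The plain constructor only stores data that was already read in.
        self.model_data = model_data
        self.predictor_data = predictor_data

    @staticmethod
    def _latest(folder: Path, pattern: str) -> Path:
        # Pick the most recently modified file that matches the glob pattern.
        matches = list(folder.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"No file matching {pattern!r} in {folder}")
        return max(matches, key=lambda path: path.stat().st_mtime)

    @classmethod
    def from_latest_export(
        cls,
        folder,
        model_pattern: str = "models*.json",
        predictor_pattern: str = "predictors*.json",
    ) -> "ToyDatamart":
        # Alternative constructor: locate the files, read them, then defer to __init__.
        folder = Path(folder)
        model_file = cls._latest(folder, model_pattern)
        predictor_file = cls._latest(folder, predictor_pattern)
        return cls(model_file.read_text(), predictor_file.read_text())


def demo() -> ToyDatamart:
    """Build a throwaway folder with two model files and one predictor file."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        files = [
            ("models_old.json", "old models", 1_000),
            ("models_new.json", "new models", 2_000),
            ("predictors_1.json", "predictors", 1_500),
        ]
        for name, text, mtime in files:
            path = folder / name
            path.write_text(text)
            # Set explicit modification times so "most recent" is unambiguous.
            os.utime(path, (mtime, mtime))
        return ToyDatamart.from_latest_export(folder)
```

Keeping __init__ free of file handling means callers who already hold their data pay no extra cost, while each data source gets its own clearly named entry point.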

Todo before release:

Update Pega Academy article https://academy.pega.com/topic/data-scientist-tools-customer-decision-hub/v1
Further improve test coverage
Complete missing docstrings
Perform further internal testing
Ensure all linked issues are fixed
Improve some of the optional imports that are imported on library import

Full Changelog: V4.0.0-alpha.1...V4.0.0-beta.1


30 Oct 17:17

StijnKas

V4.0.0-alpha.1

5a42b5c

Pdstools V4 alpha 1 Pre-release


✨Highlights

Farewell R - you've served us well, but pdstools is now Python only
Introducing the Pega DX API Client
- Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
Major refactor of the entire codebase: consistent python naming, optional dependency groups, well-defined typehints

❌Deprecations/removals

The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!

🔨Changes

Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
Much improved typehints, so it's much more obvious what the response of a given function will be
Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
- The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
- Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
- The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
- The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
Using classmethods, the ADMDatamart class in particular can now be initialized in a much more flexible way.
- The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for model_data and one for predictor_data. If you've already read in your data, simply use this.
- If, instead, you want the previous behavior of automatically finding the most recent file in a folder, initialize the datamart class with ADMDatamart.from_ds_export().
- Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read before can also be cached automatically by writing them to a 'cache' file, which speeds up subsequent loads. This closes #205 as well.

Todo before release:

Update Pega Academy article https://academy.pega.com/topic/data-scientist-tools-customer-decision-hub/v1
Further improve test coverage
Complete missing docstrings
Perform further internal testing
Ensure all linked issues are fixed
Improve some of the optional imports that are imported on library import


08 Oct 10:24

StijnKas

V3.5.0

7c91518

Pega Data Scientist Tools V3.5.0: Polars V1

While we've been hard at work creating version 4 of pdstools, I wanted to get one last release for V3 out of the way.

V4 brings some pretty sizable changes; we'll deprecate the R tools and fully rework all Python classes to make them more consistent & maintainable, including renaming pretty much all classes & methods. Since things will be breaking, it may be desirable to sometimes fall back to V3 while transitioning. However, our V3 branch was falling out of date, mainly because we were tied to Polars < 1. This minor update brings Polars V1 support. It does require Polars > 1.9, as we were facing an IPC serialization bug in earlier versions. We likely will not support the V3 branch out into the future, but if necessary we could accept small bug fixes down the line.
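
If you want to confirm up front that an environment satisfies the Polars requirement mentioned above, a small version comparison is enough. The sketch below uses only the standard library. The helper names release_tuple and is_newer_than are hypothetical, and it reads "> 1.9" literally as "strictly newer than 1.9", which is an assumption about the intended bound.

```python
def release_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn a version string such as '1.10.0rc1' into (1, 10, 0).

    Non-numeric suffixes are ignored, and short versions are padded with
    zeros so that '1.9' and '1.9.0' compare as equal.
    """
    numbers = []
    for piece in version.split(".")[:width]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers + [0] * (width - len(numbers)))


def is_newer_than(installed: str, floor: str = "1.9") -> bool:
    """Return True when `installed` is strictly newer than `floor`."""
    return release_tuple(installed) > release_tuple(floor)
```

The installed version string can be obtained with importlib.metadata.version("polars") and passed to is_newer_than. Comparing integer tuples avoids the classic pitfall of string comparison, where "1.10" would sort before "1.9".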

What's Changed

Make Quarto Render OS-Agnostic and Add Version Info to Health Check Logs by @yusufuyanik1 in #257
Fixing issue with inconsistent coloring and ordering by @operdeck in #261
[WIP] Compatibility with Polars V1 by @StijnKas in #233

Full Changelog: V3.4.7...V3.5.0

Contributors

operdeck, yusufuyanik1, and StijnKas


24 Jul 15:03

yusufuyanik1

V3.4.6

6f5ac3b

Pega Data Scientist Tools V3.4.6

Enhancement and Bug Fixes

This patch release, V3.4.6, includes multiple enhancements and bug fixes aimed at improving functionality and user experience.
Key updates include improved logging for Health Check (#250), graceful handling of empty data frames, prediction analysis improvements and more.

What's Changed

Make regex strings raw strings by @StijnKas in #232
Gracefully handle empty data frame for extract_keys by @StijnKas in #234
Prediction analysis by @operdeck in #237
Fixed timezone issue with new polars by @operdeck in #238
remove DA from EE article by @yusufuyanik1 in #240
fix the sample data path by @yusufuyanik1 in #241
Experimental extra plot to show class separation by @operdeck in #239
Moved PDC specifics back to a separate class. Made ADM more robust ag… by @operdeck in #243
Missing cast of by-period to date in summaries by @operdeck in #245
Prediction fixes by @operdeck in #246
Numbers formatted more human friendly by @operdeck in #247
Improved signature of AUC from bin methods to support a direct ordering by @operdeck in #248

Full Changelog: V3.4.4...V3.4.6

Contributors

operdeck, yusufuyanik1, and StijnKas


06 Mar 15:40

StijnKas

V3.4

7b3dc6b

Pega Data Scientist Tools V3.4: Binning Insights

Rolling up the ADM bins

This release adds additional insights from the ADM binning data, letting you find information on predictors across models and channels!

Check out the explainer article here.

What's Changed

Issue 119 by @operdeck in #185
Added a few more test cases by @operdeck in #187
Improved coverage by @operdeck in #188
Improved coverage by @operdeck in #189
ISSUE_186 by @operdeck in #190
Support some of the accounts with many more configurations by @operdeck in #193
Added aggregated bin insights to examples by @operdeck in #197
Reviewed text sections by @operdeck in #198

Full Changelog: V3.3...V3.4

Contributors

operdeck


05 Jan 13:25

yusufuyanik1

V3.3

e2d7166

Pega Data Scientist Tools V3.3

HealthCheck App Changes

In this release, we've primarily focused on improving the HealthCheck app, making it more powerful and user-friendly.

Beyond the comprehensive global HealthCheck report, you now have the capability to generate individual model reports. These reports allow you to delve into the performance of a specific model, providing an in-depth view of predictors and their individual effects on propensity.
You can now run HealthCheck in the cloud, directly from GitHub, without the need to install any tools. A GitHub codespace is a development environment that's hosted in the cloud. Each codespace you create is hosted by GitHub in a Docker container, running on a virtual machine. See the Wiki for all the possible ways to run the ADM HealthCheck.
If our out-of-the-box report isn't quite what you're looking for, you can now export the latest Datamart Snapshot in Excel format, empowering you to perform your own custom analysis.
You can save your filters and upload them later to avoid repetition.

Code Changes

Added support for Python 3.12
Aligned with performance improvements coming with the latest version of polars

What's Changed

Health Check update by @yusufuyanik1 in #115
HealthCheck fixes by @yusufuyanik1 in #117
Article Fixes by @yusufuyanik1 in #121
Revert changes in Data Anonymization article by @yusufuyanik1 in #122
Made the off line reports shine even more by @operdeck in #123
fix overtime plot metric by @yusufuyanik1 in #124
polars version upgrade by @yusufuyanik1 in #125
polars version update remaining by @yusufuyanik1 in #126
Initial cut of python version of off-line model reports by @operdeck in #129
Usability updates to the Value Finder code by @StijnKas in #131
Update test to reflect the new treatment col by @StijnKas in #133
Add devcontainer in support for codespaces by @StijnKas in #135
Added mostly TODOs and fixed some text and simple layout things by @operdeck in #137
Added trivial cmd line args for easier calling from the outside by @operdeck in #139
Issue 128 by @operdeck in #142
Aligned channel overview by @operdeck in #143
Update-azureopenai-version by @StijnKas in #150
Reports more robust for various cornercases by @operdeck in #151
Included colored styling in tables to highlight issues. Made model re… by @operdeck in #152
Hc improvements by @operdeck in #154
Hc improvements by @operdeck in #161
Fixed incorrect BinIndex type by @StijnKas in #162
Bump version & bump polars min version by @StijnKas in #164
Standalone Report improvements by @operdeck in #166
Replaced gains charts in HC by @operdeck in #168
Supporting utilities to run reports in batch and unattended by @operdeck in #170
Bump azure openai version to latest, support openai v1 by @StijnKas in #169
Streamlit App Changes by @yusufuyanik1 in #155
Doc cleanup by @operdeck in #172
Dropped old batch scripts in favor of recently introduced new ones by @operdeck in #173
HealthCheck fix by @yusufuyanik1 in #171
remove context_keys selection in app by @yusufuyanik1 in #174
Fix errors and deprecations from polars version bump by @yusufuyanik1 in #176
improve polars patch PR by @yusufuyanik1 in #177
Python 3.12 support, bump version by @StijnKas in #144

Full Changelog: V3.2...V3.3

Contributors

operdeck, yusufuyanik1, and StijnKas


28 Jul 14:39

StijnKas

V3.2

0804c03

Pega Data Scientist Tools V3.2

What's Changed

Nothing major in this release, but a lot of bugfixes and versioning compatibilities. For an overview of pull requests:

Use local path in docs workflow by @StijnKas in #89
Fix: Cast cat column to str in ADMExplained by @yusufuyanik1 in #92
Health Check Set Up article added by @yusufuyanik1 in #93
Fix: error logging and version mismatches by @yusufuyanik1 in #94
Adm explained revision by @yusufuyanik1 in #97
Added formula for using Beta directly with positives by @operdeck in #98
Health Check consistency fixes by @yusufuyanik1 in #99
Support finding active range in classifiers through a convenience fun by @operdeck in #105
Supporting active range AUCs in standard offline Model reports by @operdeck in #106
Fixes score calculation details see https://github.com/pegasystems/pe… by @operdeck in #108
Changed to a better example predictor and added more formulae by @operdeck in #109
Changed to a better example predictor and added more formulae by @operdeck in #110
series.cut() fix along with ADMExplained reproducibility improvement by @yusufuyanik1 in #111
ADMExplained fix by @yusufuyanik1 in #112
freeze polars version and article fixes by @yusufuyanik1 in #113

Full Changelog: V3.1...V3.2

Contributors

operdeck, yusufuyanik1, and StijnKas


10 Apr 15:04

StijnKas

V3.1

d83b379

Pega Data Scientist Tools V3.1

In case you haven’t seen it yet, V3.0 brought many important changes. V3.1 is a minor release, but brings some nice usability changes:

What’s new?

Rebuilt the pdstools app to be much more user-friendly and easy to use
Added a models-only Health Check
Added a Tables class to generate tables and export them to Excel
Allow for multiple pl.Exprs in ._apply_query
Added Thompson Sampling & ADM Explained articles to the Python docs
Added plotPredictorContribution to the main plots
Basic S3 tools, for reading Pega Repository datasets, including get_ADMDatamart to get the datamart directly from S3
pdstools.show_versions allows you to easily get the versions of installed packages
Issue templates to more easily and clearly define gh issues

What’s changed?

Moved Health Check generation responsibility to the ADMDatamart class
Moved Health Check files to reports
Updated Value Finder to a more streamlined implementation
Separated IO into pega_io
Support for more OOTB timestamp formats
Fixed a bug in getMultiTrees that caused different models to not separate properly
Fixed compatibility with Polars version 0.17

Technical improvements

Automated documentation builds & deployment
Automated pypi releases
Automated tests for Health Check
Fixed tests not being found by VS Code

Full Changelog: V3.0...V3.1

Releases: pegasystems/pega-datascientist-tools

Pega Data Scientist Tools 4.1
