This project is a collection of utility classes and functions for working with Spark and Spark ML.
.
├── README.md
├── build.sbt
├── src
│   ├── main
│   │   ├── resources      <- App config for the logging etc.
│   │   └── scala          <- All the scala source code
│   │
│   └── test
│       ├── resources      <- Data / config for the tests
│       └── scala          <- All the unit tests
The following is a high-level description of the components available in this library:
- Target Encoder - A Spark ML implementation of target encoding
- Weight Of Evidence - A Spark ML implementation of weight of evidence encoding
- Stats Calculator - A utility class to generate additional metrics for classifiers (such as F1 score and MCC)
- Timestamps Transformer - A Spark ML transformer that takes an input date and splits it into its components (year, month, etc.)
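
The formulas behind a few of these components can be sketched in plain Scala. This is an illustration only: it does not use Spark and is not this library's API. The names `EncodingSketch`, `targetEncode`, `weightOfEvidence`, `f1Score` and `mcc` are hypothetical, and the library's actual implementations may differ, for example in smoothing or in the sign convention used for weight of evidence.

```scala
// Illustrative sketch only: plain Scala, no Spark, hypothetical names.
object EncodingSketch {

  // Target encoding: map each category to the mean target value observed for it.
  def targetEncode(rows: Seq[(String, Double)]): Map[String, Double] =
    rows.groupBy(_._1).map { case (category, group) =>
      category -> group.map(_._2).sum / group.size
    }

  // Weight of evidence for a 0/1 label, assuming the convention
  // ln(share of all positives in the category / share of all negatives in the category).
  // No smoothing is applied: zero positive or negative counts give infinite or NaN values.
  def weightOfEvidence(rows: Seq[(String, Int)]): Map[String, Double] = {
    val totalPos = rows.count(_._2 == 1).toDouble
    val totalNeg = rows.count(_._2 == 0).toDouble
    rows.groupBy(_._1).map { case (category, group) =>
      val pos = group.count(_._2 == 1)
      val neg = group.count(_._2 == 0)
      category -> math.log((pos / totalPos) / (neg / totalNeg))
    }
  }

  // F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero.
  def f1Score(tp: Long, fp: Long, fn: Long): Double = {
    val denom = 2.0 * tp + fp + fn
    if (denom == 0.0) 0.0 else 2.0 * tp / denom
  }

  // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
  // returns 0.0 when the denominator is zero.
  def mcc(tp: Long, tn: Long, fp: Long, fn: Long): Double = {
    val denom = math.sqrt((tp + fp).toDouble * (tp + fn) * (tn + fp) * (tn + fn))
    if (denom == 0.0) 0.0 else (tp.toDouble * tn - fp.toDouble * fn) / denom
  }
}
```

For example, with 8 true positives, 8 true negatives, 2 false positives and 2 false negatives, `EncodingSketch.f1Score(8, 2, 2)` evaluates to 0.8 and `EncodingSketch.mcc(8, 8, 2, 2)` to 0.6.
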
The build requires SBT. The following targets are the most important:
sbt package
- Build the library and create its jar file
sbt test
- Run all the unit tests
sbt jacoco
- Use the JaCoCo plugin to run the test coverage report