Mortgage Workflow

The Dataset

The dataset used with this workflow is derived from Fannie Mae’s Single-Family Loan Performance Data with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae.

To acquire this dataset, please visit the RAPIDS Datasets Homepage.

Introduction

The Mortgage workflow is composed of three core phases:

ETL - Extract, Transform, Load
Data Conversion
ML - Training

ETL

Data is:

Read in from storage
Transformed to emphasize key features
Loaded into volatile memory for conversion
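
The three ETL steps above can be sketched in plain Python. This is a minimal, CPU-only illustration and not the notebook's actual GPU code: the pipe-delimited layout, the column names (loan_id, current_upb, delinquency_status), and the derived features are all hypothetical and chosen only to show the extract, transform, load shape.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample records: loan_id|current_upb|delinquency_status
RAW = """\
A1|100000.0|0
A1|99500.0|1
A1|99000.0|3
B2|250000.0|0
B2|249000.0|0
"""

# Hypothetical column names, used only for this sketch.
COLUMNS = ["loan_id", "current_upb", "delinquency_status"]


def extract(source):
    """Read every record from a file-like object into a list of dicts."""
    return list(csv.DictReader(source, fieldnames=COLUMNS, delimiter="|"))


def transform(rows):
    """Reduce per-period records to one feature row per loan."""
    features = defaultdict(lambda: {"last_upb": 0.0, "max_delinquency": 0})
    for row in rows:
        loan = features[row["loan_id"]]
        # Records are assumed to be in chronological order, so the last
        # balance seen is the most recent one.
        loan["last_upb"] = float(row["current_upb"])
        loan["max_delinquency"] = max(
            loan["max_delinquency"], int(row["delinquency_status"])
        )
    return dict(features)


def load(features):
    """Materialize the features in memory as a sorted list of tuples."""
    return [
        (loan_id, f["last_upb"], f["max_delinquency"])
        for loan_id, f in sorted(features.items())
    ]


table = load(transform(extract(io.StringIO(RAW))))
```

Here table holds one in-memory row per loan, ready to be handed to the conversion phase.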

Data Conversion

Features are:

Broken into (labels, data) pairs
Distributed across many workers
Converted into compressed sparse row (CSR) matrix format for XGBoost
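
To make the conversion steps above concrete, the following sketch splits records into labels and data, partitions them into contiguous shards (standing in for workers), and encodes each shard in CSR form. It is a pure-Python illustration under stated assumptions: the helper names (split_labels, partition, to_csr), the label-in-first-column layout, and the sample values are hypothetical, and the real workflow would rely on library routines rather than hand-written loops.

```python
def split_labels(records, label_index=0):
    """Separate each record into its label and its remaining feature values."""
    labels = [rec[label_index] for rec in records]
    rows = [rec[:label_index] + rec[label_index + 1:] for rec in records]
    return labels, rows


def partition(items, n_parts):
    """Split a list into at most n_parts contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def to_csr(dense):
    """Encode a dense row-major matrix as CSR (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero value
                indices.append(col)   # its column index
        indptr.append(len(data))      # offset where the next row starts
    return data, indices, indptr


# Hypothetical records: the first column is the label, the rest are features.
records = [
    [1, 0.0, 2.5, 0.0],
    [0, 0.0, 0.0, 0.0],
    [0, 1.0, 0.0, 3.0],
    [1, 0.0, 0.0, 4.0],
]

labels, rows = split_labels(records)
# One (labels, CSR data) pair per simulated worker.
shards = [
    (label_part, to_csr(row_part))
    for label_part, row_part in zip(partition(labels, 2), partition(rows, 2))
]
```

In CSR, the entries of row i are data[indptr[i]:indptr[i + 1]], so an all-zero row costs only one repeated offset in indptr. That is why the format suits sparse feature matrices.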

Machine Learning

The CSR data is fed into a distributed training session with xgboost.dask.
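
A distributed training call with xgboost.dask might look roughly like the sketch below. The parameter values, the function name train_distributed, and the assumption that X and y are Dask collections already partitioned across workers are illustrative and not taken from the notebook. The xgboost import is deferred into the function so the module can be loaded on a machine without XGBoost and Dask installed.

```python
# Assumed parameters for a binary classification task; the notebook's
# actual settings may differ.
TRAIN_PARAMS = {
    "objective": "binary:logistic",
    "tree_method": "hist",
    "max_depth": 8,
}


def train_distributed(client, X, y, num_boost_round=100):
    """Train a booster across the workers of a Dask cluster.

    client is a dask.distributed.Client. X and y are assumed to be Dask
    collections (arrays or dataframes) already spread across the workers.
    """
    from xgboost import dask as dxgb  # deferred: needs xgboost and dask

    # DaskDMatrix keeps each partition on the worker that already holds it.
    dtrain = dxgb.DaskDMatrix(client, X, y)
    output = dxgb.train(
        client,
        TRAIN_PARAMS,
        dtrain,
        num_boost_round=num_boost_round,
    )
    # train returns a dict with the fitted "booster" and the eval "history".
    return output["booster"]
```

A caller would create a dask.distributed.Client connected to the cluster and pass it in together with the converted features and labels.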

Performance

We regularly benchmark RAPIDS on this workload to measure its performance against both Apache Spark on CPUs and past versions of RAPIDS.
