This repository was archived by the owner on Sep 4, 2024. It is now read-only.

2023 book on Pytorch, Numpy, and SkLearn for deep learning of tabular data


rpriam/book1


"Linear and Deep Models Basics with Pytorch, Numpy, and Scikit-Learn"

Companion files for the computer book on deep learning with statistical background

Amazon KDP paperback, 2023, ISBN-13: 979-8371441577


Main document (PDF ebook), 247 pages (27-12-2022), available for direct download: book_pytorch_scikit_learn_numpy.pdf




Available files in this repository

Datasets, the main .py file, and the .ipynb notebooks are at ./notebooks



Main Features

  • Theory for the linear models and implementation with pytorch and scikit-learn

  • Practice of deep learning with pytorch for feedforward neural networks

  • Many examples and exercises to practice and further understand the contents

  • Very large datasets (450,000 and 11,000,000 rows) handled on a home computer with a few gigabytes of memory

  • Step-by-step theory & code (requires only minimal knowledge of python and maths)

  • Learn the maths basics without compromise before consolidating towards advanced models

  • Generic python functions that allow training and altering deep models for tabular data in a blink
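The "few gigabytes" claim rests on out-of-core storage: the book keeps data in a compressed format (memmap or hdf5) and loads it by chunks. A minimal numpy sketch of that pattern, where the file path, shape, and the column-mean statistic are illustrative and not taken from the book:

```python
import os
import tempfile

import numpy as np

# on-disk array standing in for a large dataset
# (path and shape are illustrative, not from the book)
path = os.path.join(tempfile.gettempdir(), "big_table.dat")
n_rows, n_cols = 10_000, 8
data = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_rows, n_cols))
data[:] = np.random.default_rng(0).normal(size=(n_rows, n_cols))
data.flush()

# reopen read-only and accumulate statistics chunk by chunk,
# so the whole dataset never needs to fit in RAM
view = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, n_cols))
chunk_size = 1_000
col_sums = np.zeros(n_cols)
for start in range(0, n_rows, chunk_size):
    chunk = np.asarray(view[start:start + chunk_size])  # one chunk in memory
    col_sums += chunk.sum(axis=0, dtype=np.float64)

col_means = col_sums / n_rows
```

The same loop structure applies to any per-chunk statistic or to feeding minibatches to a model; only the chunk slice ever occupies memory.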



Abstract

This book is an introduction to computational statistics for generalized linear models (glm) and to machine learning with the python language. Extensions of the glm with nonlinearities come from hidden layer(s) within a neural network, for linear and nonlinear regression or classification. This makes it possible to present classical statistics and current deep learning side by side. The loglikelihoods and the corresponding loss functions are explained. The gradient and hessian matrix are discussed and implemented for these linear and nonlinear models. Several methods are implemented from scratch with numpy for prediction (linear, logistic, and poisson regressions) and for reduction (principal component analysis, random projection). The gradient descent, newton-raphson, natural gradient, and l-bfgs algorithms are implemented. The datasets at stake have 10 to 10^7 rows and are tabular, such that images or texts are vectorized. The data are stored in a compressed format (memmap or hdf5) and loaded by chunks for several case studies with pytorch or scikit-learn. Pytorch is presented for training with minibatches via a generic implementation for study with computer programs. Scikit-learn is presented for processing large datasets via partial fitting, after the small examples. Sixty exercises are proposed at the end of the chapters, with selected solutions, to go beyond the contents.
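The partial-fit workflow mentioned in the abstract can be sketched with scikit-learn's SGDClassifier, which accepts data one chunk at a time. The synthetic two-gaussian stream below is an illustrative stand-in for chunks read from disk, not one of the book's datasets:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# synthetic stream of chunks, standing in for chunks read from disk
def make_chunk(n=500):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 5)) + 2.0 * y[:, None]  # class 1 shifted by +2
    return X, y

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all labels must be declared on the first call
for _ in range(20):
    X, y = make_chunk()
    clf.partial_fit(X, y, classes=classes)

X_test, y_test = make_chunk(2000)
accuracy = clf.score(X_test, y_test)
```

Each call to partial_fit updates the model in place, so memory use depends only on the chunk size, never on the total number of rows.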


Chapters

  1. Introduction

    Polynomial regression
    Error on a train sample
    Error on a test sample

  2. Linear models with numpy and scikit-learn (chapter02_book.ipynb)

    Theory for linear regression
    Theory for logistic regression
    Loglikelihood and loss function
    Analytical expression of the derivatives
    Implementation with numpy
    Implementation with Scikit-Learn

  3. First-order training of linear models (chapter03_book.ipynb)

    Algorithm with one datum and with one minibatch
    Implementation of the algorithms with numpy
    Implementation of the algorithms with pytorch

  4. Neural networks for (deep) glm (chapter04_book.ipynb)

    Presentation of the different loss functions from pytorch
    Generic implementation of the algorithms with pytorch
    Example of a nonlinear decision boundary with a small dataset

  5. Lasso selection for (deep) glm (chapter05_book.ipynb)

    Penalization of the regression for sparse solution
    Implementation with pytorch for a neural network
    Selection of the hyperparameters (grid and bayesian)

  6. Hessian and covariance for (deep) glm (chapter06_book.ipynb)

    Notion of variance of the parameters
    Implementation with statsmodels for linear models
    Implementation with pytorch for a neural network

  7. Second-order training of (deep) glm (chapter07_book.ipynb)

    Expression of the 1st-order update for poisson regression
    Expression of the 2nd-order update for poisson regression
    Implementation of gradient descent for the poisson regression
    Implementation of newton-raphson and natural gradient with numpy
    Implementation of the l-bfgs algorithm with pytorch for deep regressions
    Notion of quality of the estimation for comparison

  8. Autoencoder compared to ipca and t-sne (chapter08_book.ipynb)

    Introduction to the algebra for principal component analysis
    Implementation step by step for principal component analysis
    Implementation with scikit-learn of pca and (non)linear autoencoders
    Implementation of t-sne with python from two modules
    Implementation of random projection for large datasets
    Notion of quality of the visualization for comparison

  9. Solutions to selected exercises (chapter09_book.ipynb)

    Several solutions for large datasets with scikit-learn
    Several solutions for neural networks with pytorch
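Chapters 2 and 3 build up the logistic regression loglikelihood, its gradient, and minibatch training. A minimal from-scratch numpy sketch of that combination; the toy data and hyperparameters are illustrative, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy binary classification data (illustrative)
n, d = 1000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = (X @ w_true + 0.3 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# minibatch gradient descent on the mean negative loglikelihood
w = np.zeros(d)
lr, batch = 0.5, 64
for epoch in range(50):
    perm = rng.permutation(n)
    for start in range(0, n, batch):
        idx = perm[start:start + batch]
        p = sigmoid(X[idx] @ w)
        grad = X[idx].T @ (p - y[idx]) / len(idx)  # gradient of the mean NLL
        w -= lr * grad

accuracy = ((sigmoid(X @ w) > 0.5) == y).mean()
```

The gradient X^T(p - y) is the one derived analytically in chapter 2; chapter 3 then replays the same update with pytorch tensors and autograd.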
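Chapter 5 penalizes the regression to obtain a sparse solution. As a plain-numpy illustration of how an l1 penalty zeroes coefficients, here is proximal gradient descent (ISTA, a technique swapped in for this sketch; the book's own implementation uses pytorch on a neural network) on a toy linear model with only two truly nonzero coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)

# sparse ground truth: only 2 of 10 coefficients are nonzero (illustrative)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[0], w_true[3] = 3.0, -2.0
y = X @ w_true + 0.1 * rng.normal(size=n)

# ISTA: gradient step on the squared loss, then soft-thresholding (l1 prox)
lam = 0.1
lipschitz = np.linalg.norm(X, ord=2) ** 2 / n  # Lipschitz constant of the gradient
lr = 1.0 / lipschitz
w = np.zeros(d)
for _ in range(500):
    w = w - lr * X.T @ (X @ w - y) / n              # gradient step
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is the selection effect the chapter exploits.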
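Chapter 7's second-order update for poisson regression rests on the gradient g = X^T(mu - y) and hessian H = X^T diag(mu) X, with mu = exp(Xw). A small numpy sketch of the newton-raphson iteration on simulated counts; the data sizes, seed, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy poisson regression data (illustrative)
n, d = 2000, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + 1 feature
w_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ w_true))

# newton-raphson: w <- w - H^{-1} g
w = np.zeros(d)
for _ in range(25):
    mu = np.exp(X @ w)                 # poisson mean under current parameters
    grad = X.T @ (mu - y)              # gradient of the negative loglikelihood
    hess = X.T @ (X * mu[:, None])     # hessian X^T diag(mu) X
    w = w - np.linalg.solve(hess, grad)
```

The inverse hessian at convergence also estimates the covariance of the parameters, the notion chapter 6 introduces for comparing estimation quality.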
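Chapter 8's step-by-step principal component analysis can be checked against scikit-learn. The sketch below (toy correlated data and top-2 components, all illustrative) diagonalizes the covariance matrix by hand and compares the projection variances with PCA from scikit-learn:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))  # correlated features

# pca step by step: eigendecomposition of the sample covariance matrix
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
components = eigvecs[:, order[:2]]       # top-2 principal axes
scores = Xc @ components                 # projected coordinates

# same reduction with scikit-learn
scores_sk = PCA(n_components=2).fit_transform(X)
```

The two projections agree up to the sign of each axis, which is the standard indeterminacy of eigenvectors.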





