CoventryResearch/neuromantic

TOC

Use Cases

Audio Recognition

Speech to Text

Data Augmentation

Design

Games

Gesture Recognition

Using wearable sensors (phones, watches etc.)

Apps

Code repositories

Image Recognition

Face Recognition

Food Recognition

Image Captioning

Person Detection

Semantic Segmentation

Interpretability

Programming and ML

Predict defects

Predict performance

Searching code

Writing code

NLP

Chatbots

Crossword question answerers

Database queries

Named entity resolution

Also known as deduplication or record linkage. This is not the same as entity recognition, which picks out names in running text and classifies them.
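
As a rough illustration of deduplication, the sketch below groups name strings that look alike after normalization. It uses only the standard library's `difflib.SequenceMatcher`; the helper names, the sample records, and the `0.85` threshold are assumptions made for this example, and real record-linkage systems usually compare several fields and use blocking to avoid comparing every pair.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily group records that are similar to the first member of a group."""
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([record])
    return groups


# Hypothetical sample data: three spellings of one company plus two others.
records = ["Acme Corp.", "ACME Corp", "Acme  corp", "Globex Inc", "Initech"]
groups = deduplicate(records)
print(groups)
```

Comparing only against the first member of each group keeps the sketch short; it also means the result can depend on the order of the input.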

Reverse dictionaries

Also known as concept finders. They return the name of a concept given its definition or description.
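
A toy version of the idea, assuming a tiny hand-written glossary: the description is matched against each definition by word overlap (Jaccard similarity) and the best-scoring concept name is returned. The glossary entries, the stopword list, and the function names are all made up for this sketch; practical reverse dictionaries typically rely on learned embeddings rather than exact word overlap.

```python
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "into"}

# Hypothetical glossary mapping concept names to short definitions.
GLOSSARY = {
    "overfitting": "a model fits the training data too closely and generalizes poorly",
    "tokenization": "splitting text into words or subword units",
    "embedding": "a dense vector representation of a word or item",
}


def tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation and stopwords removed."""
    words = {word.strip(".,;:!?").lower() for word in text.split()}
    return words - STOPWORDS - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_concept(description: str) -> str:
    """Return the glossary entry whose definition overlaps most with the description."""
    query = tokens(description)
    return max(GLOSSARY, key=lambda name: jaccard(query, tokens(GLOSSARY[name])))


print(find_concept("vector that represents a word"))
```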

Sequence to sequence

Semantic analysis

Spelling

Summarization

Text to Image

Text to Speech

Personality recognition

  • Mining Facebook Data for Predictive Personality Modeling (Dejan Markovikj, Sonja Gievska, Michal Kosinski, David Stillwell)
  • Personality Traits Recognition on Social Network — Facebook (Firoj Alam, Evgeny A. Stepanov, Giuseppe Riccardi)
  • The Relationship Between Dimensions of Love, Personality, and Relationship Length (Gorkan Ahmetoglu, Viren Swami, Tomas Chamorro-Premuzic)

Search

Transfer Learning

Uber

Video recognition

Body recognition

Object detection

Scene Segmentation

Detects when one video (shot/scene/chapter) ends and another begins
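
One simple way to approximate this is to compare the intensity histograms of consecutive frames and flag a boundary when they differ sharply. The sketch below assumes each frame is already available as a flat list of grayscale values in 0..255; the bin count, the `0.5` threshold, and the function names are illustrative assumptions, and this is not the method of any particular tool listed here.

```python
def histogram(frame: list[int], bins: int = 8) -> list[float]:
    """Normalized histogram of grayscale pixel values in the range 0..255."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in counts]


def shot_boundaries(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Indices of frames whose histogram differs sharply from the previous frame."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # Half the L1 distance between two normalized histograms lies in [0, 1].
        distance = sum(abs(p - c) for p, c in zip(previous, current)) / 2
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


# Hypothetical four-pixel frames: three dark frames, then two bright ones.
dark = [10, 20, 30, 25]
bright = [200, 220, 240, 230]
frames = [dark, dark, dark, bright, bright]
boundaries = shot_boundaries(frames)
print(boundaries)
```

A hard threshold like this catches abrupt cuts; gradual transitions such as fades usually need a comparison over a longer window.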

Video captioning

Video classification

Multiple Modalities

Open problems

  • Recycled goods (not solved, no dataset)
  • Safety symbols on cardboard boxes (not solved, no dataset)

Tools

Pros:

  • lets users train their own custom machine learning models from scratch, without having to write a single line of code
  • uses Transfer Learning (the more data and customers, the better the results)
  • is fully integrated with other Google Cloud services (Google Cloud Storage to store data, Cloud ML or Vision API to customize the model, etc.)

Cons:

  • limited to image recognition (as of 2018-Q1)
  • does not allow downloading a trained model

Pros:

  • Detect Faces (finds facial landmarks such as the eyes, nose, and mouth; does not identify a person)
  • Scan barcodes
  • Recognize Text

Cons:

  • Label Detection - Detect entities within the video, such as "dog", "flower" or "car"
  • Shot Change Detection - Detect scene changes within the video
  • Explicit Content Detection - Detect adult content within a video
  • Video Transcription - Automatically transcribes video content in English

Experiments Frameworks

Tools to help you configure, organize, log and reproduce experiments

Jupyter Notebook

Playgrounds

IDEs

Repositories

Models

Decision Trees

Pros:

  • can model nonlinearities
  • are highly interpretable
  • do not require extensive feature preprocessing
  • do not require enormous data sets

Cons:

  • tend to overfit
    • mitigated by limiting tree size or by building an ensemble (a decision forest), for example with boosting
  • unstable/non-deterministic (small changes in the training data, or random tie-breaking, can produce very different trees)
    • mitigated by bootstrap aggregation (bagging), for example a random forest
  • map directly from the raw input to the label
    • neural nets, which can learn intermediate representations, may be a better fit for such data

Hyperparameters:

  • tree depth
  • maximum number of leaf nodes
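
To make the tree depth hyperparameter concrete, here is a minimal pure-Python sketch of a classification tree that stops growing at `max_depth`. It is a simplified illustration, not the algorithm of any specific library: it splits on Gini impurity, handles only numeric features, and does not implement a limit on leaf nodes. The function names and the toy data are assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, max_depth):
    """Grow a tree until no split helps or max_depth is reached."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return (
        feature,
        threshold,
        build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    )


def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree


def depth(tree):
    """Number of splits on the longest root-to-leaf path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[2]), depth(tree[3]))


# Hypothetical one-feature data with a non-linear pattern: a, a, b, b, a, a.
rows = [[1], [2], [3], [4], [5], [6]]
labels = ["a", "a", "b", "b", "a", "a"]

stump = build_tree(rows, labels, max_depth=1)
deeper = build_tree(rows, labels, max_depth=3)
stump_correct = sum(predict(stump, r) == y for r, y in zip(rows, labels))
deeper_correct = sum(predict(deeper, r) == y for r, y in zip(rows, labels))
print(depth(stump), stump_correct, depth(deeper), deeper_correct)
```

On this toy data the depth-1 tree cannot capture the middle band of `b` labels, while the deeper tree fits every training point. On real data, a larger depth also raises the risk of overfitting noted above, which is why depth is usually tuned on held-out data.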

Distillation

Embedding models

Evolutionary Algorithms

Metrics of dataset quality

  • Statistical metrics
    • descriptive statistics: dimensionality, unique subject counts, systematic replicate counts, PDFs and CDFs (probability density and cumulative distribution functions)
    • cohort design
    • power analysis
    • sensitivity analysis
    • multiple testing correction analysis
    • dynamic range sensitivity
  • Numerical analysis metrics
    • number of clusters
    • PCA dimensions
    • MDS space dimensions/distances/curves/surfaces
    • variance between buckets/bags/trees/branches
    • informative/discriminative indices (e.g. how much the top 10 features differ from one another and from the group)
    • feature engineering differentiators
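
A few of the descriptive statistics above can be computed with the standard library alone. The sketch below assumes the dataset is a list of dictionaries with one field identifying the subject; the field names, the sample records, and the choice to report dimensionality as the number of fields per record are assumptions made for this example.

```python
from collections import Counter


def empirical_cdf(values):
    """Sorted (value, fraction of samples <= value) pairs."""
    ordered = sorted(values)
    cdf = {}
    for rank, value in enumerate(ordered, start=1):
        # For repeated values the last (highest) rank wins.
        cdf[value] = rank / len(ordered)
    return sorted(cdf.items())


def describe(records, subject_field):
    """Dimensionality, unique subject count, and number of replicated subjects."""
    replicates = Counter(record[subject_field] for record in records)
    return {
        "dimensionality": len(records[0]),
        "unique_subjects": len(replicates),
        "replicated_subjects": sum(1 for n in replicates.values() if n > 1),
    }


# Hypothetical measurements: subject s1 was measured twice.
records = [
    {"subject": "s1", "dose": 1.0, "response": 0.2},
    {"subject": "s1", "dose": 1.0, "response": 0.4},
    {"subject": "s2", "dose": 2.0, "response": 0.4},
    {"subject": "s3", "dose": 3.0, "response": 0.9},
]
stats = describe(records, "subject")
cdf = empirical_cdf([record["response"] for record in records])
print(stats)
print(cdf)
```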

Neural Networks

Approaches when our model doesn’t work:

  • Fetch more data
  • Add more layers to Neural Network
  • Try some new approach in Neural Network
  • Train longer (increase the number of iterations)
  • Change batch size
  • Try Regularisation
  • Check Bias Variance trade-off to avoid under and overfitting
  • Use more GPUs for faster computation

Back-propagation problems:

  • it requires labeled training data, while almost all data is unlabeled
  • the learning time does not scale well, so it is very slow in networks with multiple hidden layers
  • it can get stuck in poor local optima, so the solutions it finds for deep nets can be far from optimal
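
The local-optima point can be seen even without a neural network. The sketch below runs plain gradient descent on a one-dimensional function with two minima, chosen only for illustration: f(x) = x^4 - 3x^2 + x. The function, learning rate, step count, and starting points are assumptions of this example; the loss surface of a real network is far higher-dimensional.

```python
def f(x):
    """Toy non-convex loss with a local minimum near 1.13 and a global one near -1.30."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Follow the negative gradient from the starting point for a fixed number of steps."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


x_from_right = gradient_descent(2.0)
x_from_left = gradient_descent(-2.0)
print(round(x_from_right, 2), round(f(x_from_right), 2))
print(round(x_from_left, 2), round(f(x_from_left), 2))
```

Starting from the right, descent settles in the shallower minimum and never reaches the lower one on the left, so the result depends on initialization.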

Capsule Networks

Convolutional Neural Networks

Deep Residual Networks

Distributed Neural Networks

Feed-Forward Neural Networks

  • Perceptrons

Gated Recurrent Neural Networks

Generative Adversarial Networks

Long Short-Term Memory Networks

Recurrent Neural Networks

Symmetrically Connected Networks

Guidelines

Deep learning

  • Deep Learning: A Critical Appraisal by Gary Marcus, 2018
    • Deep learning thus far is data hungry
    • Deep learning thus far is shallow and has limited capacity for transfer
    • Deep learning thus far has no natural way to deal with hierarchical structure
    • Deep learning thus far has struggled with open-ended inference
    • Deep learning thus far is not sufficiently transparent
    • Deep learning thus far has not been well integrated with prior knowledge
    • Deep learning thus far cannot inherently distinguish causation from correlation
    • Deep learning presumes a largely stable world, in ways that may be problematic
    • Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted
    • Deep learning thus far is difficult to engineer with
  • Software 2.0 by Andrej Karpathy, 2017

Interview preparation

MOOC

Google oriented courses

Books

NLP

Statistics

Datasets

Audios

Images

Videos

Research Groups

Cartoons

The Browser of a Data Scientist

  • The Browser of a Data Scientist

Jokes

A statistician drowned crossing a river that was only three feet deep on average

About

Latest Data Science Materials
