Papers

If you want to sync papers, run:

python scripts/sync_papers --sync_path {path_name}


Category / Description

  • bold: important
  • tag: keyword
  • links: paper, article, note and code

Background knowledge

  • Gaussian Process
    • Supervised, Regression
    • note
  • Importance Sampling
  • Information Theory: A Tutorial Introduction (2018. 2)

Research Papers

  • Deep Learning (2015) Review
    • nature, note

Adversarial Example

  • Explaining and Harnessing Adversarial Examples (2014. 12)
    • FGSM (Fast Gradient Sign Method), Adversarial Training (see the sketch after this list)
    • arXiv
  • The Limitations of Deep Learning in Adversarial Settings (2015. 11)
    • JSMA (Jacobian-based Saliency Map Approach), Adversarial Training
    • arXiv
  • Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization (2015. 11)
    • Adversarial Training (generated adversarial examples), Proactive Defense
    • arXiv
  • Practical Black-Box Attacks against Machine Learning (2016. 2)
    • Black-Box (No Access to Gradient), Generate Synthetic
    • arXiv
  • Adversarial Patch (2017. 12)
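
A minimal NumPy sketch of the FGSM perturbation referenced above, assuming a binary logistic-regression model so the input gradient is analytic; the weights, input, and epsilon are illustrative, not the paper's setup.

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps=0.1):
    """FGSM: shift x by eps in the sign of the loss gradient w.r.t. the input.
    Assumes a binary logistic-regression model p = sigmoid(w.x + b) with
    cross-entropy loss, so dL/dx = (p - y) * w (analytic gradient)."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Illustrative usage with random weights and a random input.
rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x, y = rng.normal(size=8), 1.0
x_adv = fgsm_perturb(x, y, w, b, eps=0.1)
print(np.abs(x_adv - x).max())  # perturbation magnitude is eps per feature
```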

AI

  • Machine Theory of Mind (2018. 2)
    • ToMnet, Meta-Learning, General Model, Agent
    • arXiv

Cognitive

Computer Vision

  • Network In Network (2013. 12)
  • Fractional Max-Pooling (2014. 12)
    • Max-Pooling, Data Augmentation, Regularization
    • arXiv, note
  • Deep Residual Learning for Image Recognition (2015. 12)
  • Spherical CNNs (2018. 1)
    • Spherical Correlation, 3D Model, Fast Fourier Transform (FFT)
    • arXiv, open_review
  • Taskonomy: Disentangling Task Transfer Learning (2018. 4)
    • Taskonomy, Transfer Learning, Computational modeling of task relations
    • arXiv
  • AutoAugment: Learning Augmentation Policies from Data (2018. 5)
    • Search Algorithm (RL), Sub-Policy
    • arXiv
  • Exploring Randomly Wired Neural Networks for Image Recognition (2019. 4)
    • Randomly wired neural networks, Random Graph Models (ER, BA and WS)
    • arXiv
  • MixMatch: A Holistic Approach to Semi-Supervised Learning (2019. 5)
    • MixMatch, Semi-Supervised, Augmentation -> Label Guessing -> Average -> Sharpening (sharpening sketch after this list)
    • arXiv
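
A small sketch of MixMatch's label-guessing and sharpening step referenced above: average the model's predictions over K augmentations, then sharpen with temperature T. The probabilities below are made-up numbers, not model output.

```python
import numpy as np

def sharpen(p, T=0.5):
    """MixMatch sharpening: raise class probabilities to the power 1/T and
    renormalize; as T -> 0 the guessed label approaches one-hot."""
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum()

# Guess a label for one unlabeled image: average predictions over K=2
# augmentations, then sharpen.
preds = np.array([[0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1]])
guess = sharpen(preds.mean(axis=0), T=0.5)
print(guess)  # more peaked than the plain average
```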

Framework & System

  • Snorkel: Rapid Training Data Creation with Weak Supervision (2017. 11)
  • Training classifiers with natural language explanations (2018. 5)

Model

  • Dropout (2012, 2014)
  • Regularization of Neural Networks using DropConnect (2013)
  • Recurrent Neural Network Regularization (2014. 9)
    • RNN, Dropout to Non-Recurrent Connections
    • arXiv
  • Batch Normalization (2015. 2)
    • Regularizer, Accelerate Training, CNN
    • arXiv, note
  • Training Very Deep Networks (2015. 7)
  • A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)
    • Variational RNN, Dropout - RNN, Bayesian interpretation
    • arXiv
  • Deep Networks with Stochastic Depth (2016. 3)
    • Dropout, Ensemble, Beyond 1000 layers
    • arXiv, note
  • Adaptive Computation Time for Recurrent Neural Networks (2016. 3)
    • ACT, Dynamically, Logic Task
    • arXiv
  • Layer Normalization (2016. 7)
    • Regularizer, Accelerate Training, RNN
    • arXiv, note
  • Recurrent Highway Networks (2016. 7)
  • Using Fast Weights to Attend to the Recent Past (2016. 10)
  • Professor Forcing: A New Algorithm for Training Recurrent Networks (2016. 10)
    • Professor Forcing, RNN, Inference Problem, Training with GAN
    • arXiv, note
  • Equality of Opportunity in Supervised Learning (2016. 10)
  • Categorical Reparameterization with Gumbel-Softmax (2016. 11)
    • Gumbel-Softmax distribution, Reparameterization, Smooth relaxation
    • arXiv, open_review
  • Understanding deep learning requires rethinking generalization (2016. 11)
    • Generalization Error, Role of Regularization
    • arXiv
  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017. 1)
    • MoE Layer, Sparsely-Gated, Capacity, Google Brain
    • arXiv, note
  • A simple neural network module for relational reasoning (2017. 6)
  • On Calibration of Modern Neural Networks (2017. 6)
    • Confidence calibration, Maximum Calibration Error (MCE)
    • arXiv
  • When is a Convolutional Filter Easy To Learn? (2017. 9)
  • mixup: Beyond Empirical Risk Minimization (2017. 10)
    • Data Augmentation, Vicinal Risk Minimization, Generalization (see the sketch after this list)
    • arXiv, open_review
  • Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)
  • MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)
    • MentorNet - StudentNet, Curriculum Learning, Output is Weight
    • arXiv
  • Deep Learning Scaling is Predictable, Empirically (2017. 12)
  • Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)
  • Can recurrent neural networks warp time? (2018. 2)
    • RNN, Learnable Gate, Chrono Initialization
    • open_review
  • Spectral Normalization for Generative Adversarial Networks (2018. 2)
    • GAN, Training Discriminator, Constrain Lipschitz, Power Method
    • open_review
  • On the importance of single directions for generalization (2018. 3)
  • Group Normalization (2018. 3)
    • Group Normalization (GN), Batch (BN), Layer (LN), Instance (IN), Independent Batch Size
    • arXiv
  • Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)
    • Autoregressive, Latent Transformer, Discretization
    • arXiv
  • Delayed Impact of Fair Machine Learning (2018. 3)
  • How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)
    • Smoothing Effect, BatchNorm’s Reparametrization
    • arXiv
  • When Recurrent Models Don't Need To Be Recurrent (2018. 5)
  • Relational inductive biases, deep learning, and graph networks (2018. 6)
    • Survey, Relation, Graph
    • arXiv
  • Universal Transformers (2018. 7)
  • Identifying Generalization Properties in Neural Networks (2018. 9)
  • No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference (2018. 9)
    • Quantization, Store Multiplication Table, Memory/Power Resources
    • arXiv
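
The mixup entry above is easy to illustrate: a virtual example is a convex combination of two training examples and their one-hot labels, with the mixing weight drawn from Beta(alpha, alpha). The vectors and alpha below are illustrative assumptions.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup: form a virtual example as a convex combination of two training
    examples and their one-hot labels, with lambda ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# Illustrative usage with toy 2-d inputs and one-hot labels (not a real dataset).
x_mix, y_mix = mixup(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                     np.array([0.0, 1.0]), np.array([0.0, 1.0]))
print(x_mix, y_mix)
```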

Natural Language Processing

  • Distributed Representations of Words and Phrases and their Compositionality (2013. 10)
    • Word2Vec, CBOW, Skip-gram
    • arXiv
  • GloVe: Global Vectors for Word Representation (2014)
    • Word2Vec, GloVe, Co-Occurrence
    • paper
  • Convolutional Neural Networks for Sentence Classification (2014. 8)
  • Neural Machine Translation by Jointly Learning to Align and Translate (2014. 9)
  • Text Understanding from Scratch (2015. 2)
    • CNN, Character-level
    • arXiv
  • Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (2015. 6)
  • Pointer Networks (2015. 6)
    • Seq2Seq, Attention, Combinatorial
    • arXiv, note
  • Skip-Thought Vectors (2015. 6)
  • A Neural Conversational Model (2015. 6)
    • Seq2Seq, Conversation
    • arXiv
  • Teaching Machines to Read and Comprehend (2015. 6)
  • Effective Approaches to Attention-based Neural Machine Translation (2015. 8)
  • Character-Aware Neural Language Models (2015. 8)
    • CNN, Character-level
    • arXiv
  • Neural Machine Translation of Rare Words with Subword Units (2015. 8)
  • A Diversity-Promoting Objective Function for Neural Conversation Models (2015. 10)
  • Multi-task Sequence to Sequence Learning (2015. 11)
  • Multilingual Language Processing From Bytes (2015. 12)
    • Byte-to-Span, Multilingual, Seq2Seq
    • arXiv, note
  • Strategies for Training Large Vocabulary Neural Language Models (2015. 12)
    • Vocabulary, Softmax, NCE, Self Normalization
    • arXiv, note
  • Incorporating Structural Alignment Biases into an Attentional Neural Translation Model (2016. 1)
    • Seq2Seq, Attention with Structural Biases, Translation
    • arXiv
  • Long Short-Term Memory-Networks for Machine Reading (2016. 1)
    • LSTMN, Intra-Attention, RNN
    • arXiv
  • Recurrent Memory Networks for Language Modeling (2016. 1)
  • Exploring the Limits of Language Modeling (2016. 2)
    • Google Brain, Language Modeling
    • arXiv, note
  • Swivel: Improving Embeddings by Noticing What's Missing (2016. 2)
    • Word2Vec, Swivel, Co-Occurrence
    • arXiv
  • Incorporating Copying Mechanism in Sequence-to-Sequence Learning (2016. 3)
  • Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models (2016. 4)
    • Translation, Hybrid NMT, Word-Char
    • arXiv, note
  • Adversarial Training Methods for Semi-Supervised Text Classification (2016. 5)
    • Regularizer, Adversarial, Virtual Adversarial Training (Semi-Supervised)
    • arXiv, note
  • SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016. 6)
  • Sequence-Level Knowledge Distillation (2016. 6)
  • Attention-over-Attention Neural Networks for Reading Comprehension (2016. 7)
    • Attention, Cloze-style, Reading Comprehension
    • arXiv, note
  • Recurrent Neural Machine Translation (2016. 7)
    • Translation, Attention (RNN)
    • arXiv
  • An Actor-Critic Algorithm for Sequence Prediction (2016. 7)
    • Seq2Seq, Actor-Critic, Objective
    • arXiv, note
  • Pointer Sentinel Mixture Models (2016. 9)
    • Language Modeling, Rare Word, Salesforce
    • arXiv, note
  • Multiplicative LSTM for sequence modelling (2016. 10)
    • mLSTM, Language Modeling, Character-Level
    • arXiv
  • Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (2016. 10)
  • Fully Character-Level Neural Machine Translation without Explicit Segmentation (2016. 10)
    • Translation, CNN, Character-Level
    • arXiv, note
  • Neural Machine Translation in Linear Time (2016. 10)
    • ByteNet, WaveNet + PixelCNN, Translation, Character-Level
    • arXiv, note
  • Bidirectional Attention Flow for Machine Comprehension (2016. 11)
  • Dynamic Coattention Networks For Question Answering (2016. 11)
    • QA, DCN, Coattention Encoder, Machine Comprehension
    • arXiv
  • Dual Learning for Machine Translation (2016. 11)
    • Translation, RL, Dual Learning (Two-agent)
    • arXiv, note
  • Neural Machine Translation with Reconstruction (2016. 11)
    • Translation, Auto-Encoder, Reconstruction
    • arXiv, note
  • Quasi-Recurrent Neural Networks (2016. 11)
    • QRNN, Parallelism, Conv + Pool + RNN
    • arXiv, note
  • A recurrent neural network without chaos (2016. 12)
    • RNN, CFN, Dynamic, Chaos
    • arXiv
  • Comparative Study of CNN and RNN for Natural Language Processing (2017. 2)
    • Systematic Comparison, CNN vs RNN
    • arXiv
  • A Structured Self-attentive Sentence Embedding (2017. 3)
    • Sentence Embedding, Self-Attention, 2-D Matrix
    • arXiv, note
  • Dynamic Word Embeddings for Evolving Semantic Discovery (2017. 3)
  • Learning to Generate Reviews and Discovering Sentiment (2017. 4)
    • Sentiment, Unsupervised, OpenAI
    • arXiv
  • Ask the Right Questions: Active Question Reformulation with Reinforcement Learning (2017. 5)
    • QA, Active Question Answering, RL, Agent (Reformulate, Aggregate)
    • arXiv, open_review
  • Reinforced Mnemonic Reader for Machine Reading Comprehension (2017. 5)
    • QA, Mnemonic (Syntactic, Lexical), RL, Machine Comprehension
    • arXiv
  • Attention Is All You Need (2017. 6)
  • Depthwise Separable Convolutions for Neural Machine Translation (2017. 6)
  • MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension (2017. 7)
    • MEMEN, QA (MC), Embedding (skip-gram), Full-Orientation Matching
    • arXiv
  • On the State of the Art of Evaluation in Neural Language Models (2017. 7)
    • Standard LSTM, Regularisation, Hyperparameter
    • arXiv
  • Text Summarization Techniques: A Brief Survey (2017. 7)
  • Adversarial Examples for Evaluating Reading Comprehension Systems (2017. 7)
    • Concatenative Adversaries (AddSent, AddOneSent), SQuAD
    • arXiv
  • Learned in Translation: Contextualized Word Vectors (2017. 8)
    • Word Embedding, CoVe, Context Vector
    • arXiv
  • Simple and Effective Multi-Paragraph Reading Comprehension (2017. 10)
    • Document-QA, Select Paragraph-Level, Confidence Based, AllenAI
    • arXiv, note
  • Unsupervised Neural Machine Translation (2017. 10)
    • Train in both directions (tandem), Shared Encoder, Denoising Auto-Encoder
    • arXiv, open_review
  • Word Translation Without Parallel Data (2017. 10)
    • Unsupervised, Multilingual Embedding, Parallel Dictionary Induction
    • arXiv, open_review
  • Unsupervised Machine Translation Using Monolingual Corpora Only (2017. 11)
  • Neural Text Generation: A Practical Guide (2017. 11)
  • Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (2017. 11)
    • MoS (Mixture of Softmaxes), Softmax Bottleneck
    • arXiv
  • Neural Speed Reading via Skim-RNN (2017. 11)
  • Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks (2017. 11)
    • SCAN, Compositional, Mix-and-Match
    • arXiv
  • The NarrativeQA Reading Comprehension Challenge (2017. 12)
  • Hierarchical Text Generation and Planning for Strategic Dialogue (2017. 12)
    • End2End Strategic Dialogue, Latent Sentence Representations, Planning + RL
    • arXiv
  • Recent Advances in Recurrent Neural Networks (2018. 1)
    • RNN, Recent Advances, Review
    • arXiv
  • Personalizing Dialogue Agents: I have a dog, do you have pets too? (2018. 1)
    • Chit-chat, Profile Memory, Persona-Chat Dataset, ParlAI
    • arXiv
  • Generating Wikipedia by Summarizing Long Sequences (2018. 1)
    • Multi-Document Summarization, Extractive-Abstractive Stage, T-DMCA, WikiSum, Google Brain
    • arXiv, note, open_review
  • MaskGAN: Better Text Generation via Filling in the______ (2018. 1)
  • Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (2018. 1)
    • Contextual Decomposition (CD), Disambiguate interactions between Gates
    • arXiv, open_review
  • Universal Language Model Fine-tuning for Text Classification (2018. 1)
    • ULMFiT, Pre-trained, Transfer Learning
    • arXiv
  • DeepType: Multilingual Entity Linking by Neural Type System Evolution (2018. 2)
  • Deep contextualized word representations (2018. 2)
    • biLM, ELMo, Word Embedding, Contextualized, AllenAI
    • arXiv, note
  • Ranking Sentences for Extractive Summarization with Reinforcement Learning (2018. 2)
    • Document-Summarization, Cross-Entropy vs RL, Extractive
    • arXiv
  • code2vec: Learning Distributed Representations of Code (2018. 3)
    • code2vec, Code Embedding, Predicting method name
    • arXiv
  • Universal Sentence Encoder (2018. 3)
    • Transformer, Deep Averaging Network (DAN), Transfer
    • arXiv
  • An efficient framework for learning sentence representations (2018. 3)
  • An Analysis of Neural Language Modeling at Multiple Scales (2018. 3)
    • LSTM vs QRNN, Hyperparameter, AWD-QRNN
    • arXiv
  • Analyzing Uncertainty in Neural Machine Translation (2018. 3)
    • Uncertainty, Beam Search Degradation, Copy Mode
    • arXiv
  • An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (2018. 3)
    • Temporal Convolutional Network (TCN), CNN vs RNN
    • arXiv
  • Training Tips for the Transformer Model (2018. 4)
    • Transformer, Hyperparameter, Multiple GPU
    • arXiv
  • QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (2018. 4)
  • SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (2018. 4)
    • Top-K Subject Recognition, Relation Classification
    • arXiv
  • Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer (2018. 4)
    • Sentiment Transfer, Disentangle Attribute, Unsupervised
    • arXiv
  • Parsing Tweets into Universal Dependencies (2018. 4)
    • Universal Dependencies (UD), TWEEBANK v2
    • arXiv
  • Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates (2018. 4)
    • SR, Subword Sampling + Hyperparameter, Segmentation (BPE, Unigram)
    • arXiv
  • Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension (2018. 4)
    • PI-SQuAD, Challenge, Document Encoder, Scalability
    • arXiv
  • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (2018. 4)
  • On the Practical Computational Power of Finite Precision RNNs for Language Recognition (2018. 5)
    • Unbounded counting, IBFP-LSTM
    • arXiv
  • Paper Abstract Writing through Editing Mechanism (2018. 5)
    • Writing-editing Network, Attentive Revision Gate
    • arXiv
  • A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings (2018. 5)
    • Unsupervised initialization scheme, Robust self-learning
    • arXiv
  • Efficient and Robust Question Answering from Minimal Context over Documents (2018. 5)
    • Sentence Selector, Oracle Sentence, Minimal Set of Sentences (SpeedUp)
    • arXiv, note
  • Global-Locally Self-Attentive Dialogue State Tracker (2018. 5)
    • GLAD, WoZ and DSTC2 Dataset
    • arXiv
  • Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information (2018. 5)
    • Dataset, EVPI, ACL 2018 Best Paper
    • arXiv
  • Know What You Don't Know: Unanswerable Questions for SQuAD (2018. 6)
  • The Natural Language Decathlon: Multitask Learning as Question Answering (2018. 6)
    • decaNLP, Multitask Question Answering Network (MQAN), Transfer Learning
    • arXiv
  • GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations (2018. 6)
    • Transfer Learning Framework, Structured Graphical Representations
    • arXiv
  • Improving Language Understanding by Generative Pre-Training (2018. 6)
    • Transformer, Generative Pre-Training, Discriminative Fine-Tuning
    • paper, open_ai_blog
  • Finding Syntax in Human Encephalography with Beam Search (2018. 6)
    • RNNG+beam search, ACL 2018 Best Paper
    • arXiv
  • Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers (2018. 6)
    • Task, Dataset, Weighted-Pooling (WP), ACL 2018 Best Paper
    • arXiv
  • QuAC : Question Answering in Context (2018. 8)
  • CoQA: A Conversational Question Answering Challenge (2018. 8)
    • Abstractive with Extractive Rationale, Challenge, Coreference and Pragmatic Reasoning
    • arXiv, leaderboard
  • Contextual Parameter Generation for Universal Neural Machine Translation (2018. 8)
    • Parameter Generation, Language Embedding, EMNLP 2018
    • arXiv
  • Evaluating Theory of Mind in Question Answering (2018. 8)
    • Dataset, Higher-order Beliefs, EMNLP 2018
    • arXiv
  • Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text (2018. 9)
    • GRAFT-Net, KB+Text Fusion, EMNLP 2018
    • arXiv
  • HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (2018. 9)
    • Dataset, Multi-hop, Sentence-level Supporting Fact, EMNLP 2018
    • arXiv
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018. 10)
    • BERT, Discriminative, Pre-trained, Transfer Learning, NAACL 2019 Best
    • arXiv
  • Trellis Networks for Sequence Modeling (2018. 10)
    • TrellisNet, Structural bridge between TCN and RNN, ICLR 2019
    • arXiv
  • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (2018. 11)
    • CommonsenseQA, Dataset, Multiple-Choice, NAACL 2019 Best
    • arXiv
  • Cross-lingual Language Model Pretraining (2019. 1)
    • XLM, MLM + TLM, Cross-lingual Pre-trained, Low-Resource
    • arXiv
  • Better Language Models and Their Implications (2019. 2)
    • GPT-2, 1.5 Billion Parameters, Zero-Shot
    • paper, blog
  • Parameter-Efficient Transfer Learning for NLP (2019. 2)
    • Adapter tuning, Bottleneck, BERT
    • arXiv
  • To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (2019. 3)
    • Fine-tuning vs Feature, BERT and ELMo, Empirically analyze
    • arXiv
  • Linguistic Knowledge and Transferability of Contextual Representations (2019. 3)
    • Analysis CWRs, LSTM, Transformer, Transferable, NAACL 2019
    • arXiv
  • ERNIE: Enhanced Representation through Knowledge Integration (2019. 4)
    • ERNIE, Masking Strategies, Dialog Language Model, Pre-trained, Transfer Learning
    • arXiv
  • CNM: An Interpretable Complex-valued Network for Matching (2019. 4)
    • CNM, Quantum Physics, Interpretable, NAACL 2019 Best
    • arXiv
  • Unsupervised Recurrent Neural Network Grammars (2019. 4)
    • RNNG, Syntax Tree, Variational Inference
    • arXiv
  • The Curious Case of Neural Text Degeneration (2019. 4)
    • Nucleus Sampling, Decoding Method, Generation (see the sketch after this list)
    • arXiv
  • Unified Language Model Pre-training for Natural Language Understanding and Generation (2019. 5)
    • UniLM, Uni + Bi + S2S, Generation
    • arXiv
  • SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (2019. 5)
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans (2019. 7)
    • SpanBERT, Span Boundary Objective (SBO), Pre-train, Transformer
    • arXiv
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019. 7)
    • RoBERTa, Data-BatchSize, Pre-train, Transformer
    • arXiv
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (2019. 7)
    • ERNIE, Continual Pre-training, Word-Struct-Semantic, Transformer
    • arXiv
  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (2019. 8)
    • StructBERT (ALICE), Language Structure, Pre-train, Transformer
    • arXiv
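
A minimal sketch of nucleus (top-p) sampling from "The Curious Case of Neural Text Degeneration", referenced above: keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample from it. The toy 5-token distribution below is illustrative.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, renormalize, and sample from that set."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]        # token ids sorted by probability
    cdf = np.cumsum(probs[order])
    cutoff = np.searchsorted(cdf, p) + 1   # shortest prefix with mass >= p
    kept = order[:cutoff]
    return rng.choice(kept, p=probs[kept] / probs[kept].sum())

# Illustrative next-token distribution over a 5-token vocabulary.
probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
print(nucleus_sample(probs, p=0.9))  # the 0.05-probability token is never kept
```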

One-Shot/Few-Shot/Meta Learning

  • Matching Networks for One Shot Learning (2016. 6)
  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017. 3)
  • SMASH: One-Shot Model Architecture Search through HyperNetworks (2017. 8)
  • Reptile: a Scalable Metalearning Algorithm (2018. 3)

Optimization

  • Understanding the difficulty of training deep feedforward neural networks (2010)
  • On the difficulty of training Recurrent Neural Networks (2012. 11)
    • Gradient Clipping, RNN
    • arXiv
  • Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (2015. 2)
    • PReLU, Weight Initialization (He)
    • arXiv, note
  • A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (2015. 4)
    • Weight Initialization, RNN, Identity Matrix
    • arXiv
  • Cyclical Learning Rates for Training Neural Networks (2015. 6)
    • CLR, Triangular, ExpRange, Long-term Benefit (see the sketch after this list)
    • arXiv
  • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2016. 9)
    • Generalization, Sharpness of Minima
    • arXiv
  • Neural Optimizer Search with Reinforcement Learning (2017. 9)
    • Neural Optimizer Search (NOS), PowerSign, AddSign
    • arXiv
  • On the Convergence of Adam and Beyond (2018. 2)
  • Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (2018. 4)
    • Adafactor, Adaptive Method, Update Clipping
    • arXiv
  • Revisiting Small Batch Training for Deep Neural Networks (2018. 4)
    • Generalization Performance, Training Stability
    • arXiv
  • Reconciling modern machine learning and the bias-variance trade-off (2018. 12)
    • Double Descent Risk Curve, Highly Complex Models
    • arXiv
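
A small sketch of the triangular schedule from "Cyclical Learning Rates for Training Neural Networks", referenced above; the base_lr, max_lr, and step_size values are illustrative defaults, not the paper's settings.

```python
import math

def triangular_clr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate: bounce linearly between base_lr and
    max_lr with a half-cycle of step_size iterations."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# LR rises over the first half-cycle, then falls back to base_lr.
for s in (0, 1000, 2000, 3000, 4000):
    print(s, round(triangular_clr(s), 5))
```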

Reinforcement Learning

  • Progressive Neural Networks (2016. 6)
  • Neural Architecture Search with Reinforcement Learning (2016. 11)
    • NAS, Google AutoML, Google Brain
    • arXiv
  • Third-Person Imitation Learning (2017. 3)
    • Imitation Learning, Unsupervised (Third-Person), GAN + Domain Confusion
    • arXiv
  • Noisy Networks for Exploration (2017. 6)
    • NoisyNet, Exploration, DeepMind
    • arXiv, note
  • Efficient Neural Architecture Search via Parameter Sharing (2018. 2)
    • ENAS, Google AutoML, Google Brain
    • arXiv
  • Learning by Playing - Solving Sparse Reward Tasks from Scratch (2018. 2)
    • Scratch with minimal prior knowledge, Scheduled Auxiliary Control (SAC-X), DeepMind
    • arXiv, deep_mind
  • Investigating Human Priors for Playing Video Games (2018. 2)
  • World Models (2018. 3)
    • Generative + RL, VAE (V), MDN-RNN (M), Controller (C)
    • arXiv
  • Unsupervised Predictive Memory in a Goal-Directed Agent (2018. 3)

Transfer Learning

  • ...

Unsupervised & Generative

  • Auto-Encoding Variational Bayes (2013. 12)
    • VAE, Reparameterization Trick (see the sketch after this list)
  • Generative Adversarial Networks (2014. 6)
  • Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data (2016. 5)
  • SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient (2016. 9)
  • Structured Inference Networks for Nonlinear State Space Models (2016. 9)
    • Structured Variational Approximation, SVGB
    • arXiv
  • beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework (2016. 11)
  • A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning (2017. 10)
    • Kalman VAE, LGSSM
    • arXiv
  • Self-Attention Generative Adversarial Networks (2018. 5)
    • SAGAN, Attention-Driven, Spectral Normalization
    • arXiv
  • Unsupervised Data Augmentation (2019. 4)
    • UDA, TSA Schedule, Semi-Supervised
    • arXiv
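
A minimal sketch of the reparameterization trick from "Auto-Encoding Variational Bayes", referenced above: sampling z as mu + sigma * eps keeps the sample differentiable with respect to the encoder outputs. The latent dimension and values below are illustrative, not from the paper.

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Reparameterization trick: sample z = mu + sigma * eps with eps ~ N(0, I),
    so the sample is a differentiable function of the encoder outputs."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Illustrative encoder outputs for a 4-dimensional latent (not a real model).
mu = np.zeros(4)
log_var = np.full(4, -1.0)   # sigma = exp(-0.5) ~= 0.61
print(reparameterize(mu, log_var))
```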