To sync papers, run:
python scripts/sync_papers --sync_path {path_name}
- bold: important
- tag: keyword
- paper, article, note and code
- Gaussian Process
Supervised
,Regression
- note
- Importance Sampling
Approximate
- notes
- Information Theory: A Tutorial Introduction (2018. 2)
Shannon's Theory
- arXiv
- Deep Learning (2015)
Review
- nature, note
- Explaining and Harnessing Adversarial Examples (2014. 12)
FGSM (Fast Gradient Sign Method)
,Adversarial Training
- arXiv
- The Limitations of Deep Learning in Adversarial Settings (2015. 11)
JSMA (Jacobian-based Saliency Map Approach)
,Adversarial Training
- arXiv
- Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization (2015. 11)
Adversarial Training (generated adversarial examples)
,Proactive Defense
- arXiv
- Practical Black-Box Attacks against Machine Learning (2016. 2)
Black-Box (No Access to Gradient)
,Generate Synthetic
- arXiv
- Adversarial Patch (2017. 12)
Patch
,White Box
,Black Box
- arXiv, the_morning_paper
- Machine Theory of Mind (2018. 2)
ToMnet
,Meta-Learning
,General Model
,Agent
- arXiv
- Building Machines That Learn and Think Like People (2016. 4)
Human-Like
,Learn
,Think
- arXiv, note, the morning paper
- Network In Network (2013. 12)
- Fractional Max-Pooling (2014. 12)
- Deep Residual Learning for Image Recognition (2015. 12)
- Spherical CNNs (2018. 1)
Spherical Correlation
,3D Model
,Fast Fourier Transform (FFT)
- arXiv, open_review
- Taskonomy: Disentangling Task Transfer Learning (2018. 4)
Taskonomy
,Transfer Learning
,Computational modeling of task relations
- arXiv
- AutoAugment: Learning Augmentation Policies from Data (2018. 5)
Search Algorithm (RL)
,Sub-Policy
- arXiv
- Exploring Randomly Wired Neural Networks for Image Recognition (2019. 4)
Randomly wired neural networks
,Random Graph Models (ER, BA and WS)
- arXiv
- MixMatch: A Holistic Approach to Semi-Supervised Learning (2019. 5)
MixMatch
,Semi-Supervised
,Augmentation -> Label Guessing -> Average -> Sharpening
- arXiv
- Snorkel: Rapid Training Data Creation with Weak Supervision (2017. 11)
Labelling Functions
,Data Programming
- arXiv, the_morning_blog
- Training classifiers with natural language explanations (2018. 5)
Babble Labble
,Data Programming
- arXiv, the_morning_blog
- Dropout (2012, 2014)
Regularizer
,Ensemble
- arXiv (2012), arXiv (2014), note
- Regularization of Neural Networks using DropConnect (2013)
Regularizer
,Ensemble
- paper, note, wanli_summary
- Recurrent Neural Network Regularization (2014. 9)
RNN
,Dropout to Non-Recurrent Connections
- arXiv
- Batch Normalization (2015. 2)
- Training Very Deep Networks (2015. 7)
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)
Variational RNN
,Dropout - RNN
,Bayesian interpretation
- arXiv
- Deep Networks with Stochastic Depth (2016. 3)
- Adaptive Computation Time for Recurrent Neural Networks (2016. 3)
ACT
,Dynamically
,Logic Task
- arXiv
- Layer Normalization (2016. 7)
- Recurrent Highway Networks (2016. 7)
- Using Fast Weights to Attend to the Recent Past (2016. 10)
- Professor Forcing: A New Algorithm for Training Recurrent Networks (2016. 10)
- Equality of Opportunity in Supervised Learning (2016. 10)
Equalized Odds
,Demographic Parity
,Bias
- arXiv, the_morning_paper
- Categorical Reparameterization with Gumbel-Softmax (2016. 11)
Gumbel-Softmax distribution
,Reparameterization
,Smooth relaxation
- arXiv, open_review
- Understanding deep learning requires rethinking generalization (2016. 11)
Generalization Error
,Role of Regularization
- arXiv
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017. 1)
- A simple neural network module for relational reasoning (2017. 6)
- On Calibration of Modern Neural Networks (2017. 6)
Confidence calibration
,Maximum Calibration Error (MCE)
- arXiv
- When is a Convolutional Filter Easy To Learn? (2017. 9)
Conv + ReLU
,Non-Gaussian Case
,Polynomial Time
- arXiv, open_review
- mixup: Beyond Empirical Risk Minimization (2017. 10)
Data Augmentation
,Vicinal Risk Minimization
,Generalization
- arXiv, open_review
- Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)
not learn High Level Semantics
,learn Surface Statistical Regularities
- arXiv, the_morning_paper
- MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)
MentorNet - StudentNet
,Curriculum Learning
,Output is Weight
- arXiv
- Deep Learning Scaling is Predictable, Empirically (2017. 12)
Power-Law Exponents
,Grow Training Sets
- arXiv, the_morning_paper
- Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)
Robustness
,Data Perturbations
,Survey
- arXiv, open_review
- Can recurrent neural networks warp time? (2018. 2)
RNN
,Learnable Gate
,Chrono Initialization
- open_review
- Spectral Normalization for Generative Adversarial Networks (2018. 2)
GAN
,Training Discriminator
,Constrain Lipschitz
,Power Method
- open_review
- On the importance of single directions for generalization (2018. 3)
Importance
,Confusing Neurons
,Selective Neuron
,DeepMind
- arXiv, deepmind_blog
- Group Normalization (2018. 3)
Group Normalization (GN)
,Batch (BN)
,Layer (LN)
,Instance (IN)
,Independent Batch Size
- arXiv
- Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)
Autoregressive
,Latent Transformer
,Discretization
- arXiv
- Delayed Impact of Fair Machine Learning (2018. 3)
Outcome Curve
,Max Profit, Demographic Parity, Equal Opportunity
- arXiv, the_morning_paper, bair_blog
- How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)
Smoothing Effect
,BatchNorm’s Reparametrization
- arXiv
- When Recurrent Models Don't Need To Be Recurrent (2018. 5)
- Relational inductive biases, deep learning, and graph networks (2018. 6)
Survey
,Relation
,Graph
- arXiv
- Universal Transformers (2018. 7)
Transformer
,Weight Sharing
,Adaptive Computation Time (ACT)
- arXiv, google_ai_blog
- Identifying Generalization Properties in Neural Networks (2018. 9)
Generalization
,PAC-Bayes
,Hessian
,Perturbation
- arXiv, salesforce_blog
- No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference (2018. 9)
Quantization
,Store Multiplication Table
,Memory/Power Resources
- arXiv
- Distributed Representations of Words and Phrases and their Compositionality (2013. 10)
Word2Vec
,CBOW
,Skip-gram
- arXiv
- GloVe: Global Vectors for Word Representation (2014)
Word2Vec
,GloVe
,Co-Occurrence
- paper
- Convolutional Neural Networks for Sentence Classification (2014. 8)
- Neural Machine Translation by Jointly Learning to Align and Translate (2014. 9)
- Text Understanding from Scratch (2015. 2)
CNN
,Character-level
- arXiv
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (2015. 6)
- Pointer Networks (2015. 6)
- Skip-Thought Vectors (2015. 6)
- A Neural Conversational Model (2015. 6)
Seq2Seq
,Conversation
- arXiv
- Teaching Machines to Read and Comprehend (2015. 6)
- Effective Approaches to Attention-based Neural Machine Translation (2015. 8)
- Character-Aware Neural Language Models (2015. 8)
CNN
,Character-level
- arXiv
- Neural Machine Translation of Rare Words with Subword Units (2015. 8)
- A Diversity-Promoting Objective Function for Neural Conversation Models (2015. 10)
- Multi-task Sequence to Sequence Learning (2015. 11)
- Multilingual Language Processing From Bytes (2015. 12)
- Strategies for Training Large Vocabulary Neural Language Models (2015. 12)
- Incorporating Structural Alignment Biases into an Attentional Neural Translation Model (2016. 1)
Seq2Seq
,Attention with Structural Biases
,Translation
- arXiv
- Long Short-Term Memory-Networks for Machine Reading (2016. 1)
LSTMN
,Intra-Attention
,RNN
- arXiv
- Recurrent Memory Networks for Language Modeling (2016. 1)
RMN
,Memory Bank
- arXiv
- Exploring the Limits of Language Modeling (2016. 2)
- Swivel: Improving Embeddings by Noticing What's Missing (2016. 2)
Word2Vec
,Swivel
,Co-Occurrence
- arXiv
- Incorporating Copying Mechanism in Sequence-to-Sequence Learning (2016. 3)
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models (2016. 4)
- Adversarial Training Methods for Semi-Supervised Text Classification (2016. 5)
- SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016. 6)
- Sequence-Level Knowledge Distillation (2016. 6)
- Attention-over-Attention Neural Networks for Reading Comprehension (2016. 7)
- Recurrent Neural Machine Translation (2016. 7)
Translation
,Attention (RNN)
- arXiv
- An Actor-Critic Algorithm for Sequence Prediction (2016. 7)
- Pointer Sentinel Mixture Models (2016. 9)
- Multiplicative LSTM for sequence modelling (2016. 10)
mLSTM
,Language Modeling
,Character-Level
- arXiv
- Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (2016. 10)
- Fully Character-Level Neural Machine Translation without Explicit Segmentation (2016. 10)
- Neural Machine Translation in Linear Time (2016. 10)
- Bidirectional Attention Flow for Machine Comprehension (2016. 11)
- Dynamic Coattention Networks For Question Answering (2016. 11)
QA
,DCN
,Coattention Encoder
,Machine Comprehension
- arXiv
- Dual Learning for Machine Translation (2016. 11)
- Neural Machine Translation with Reconstruction (2016. 11)
- Quasi-Recurrent Neural Networks (2016. 11)
- A recurrent neural network without chaos (2016. 12)
RNN
,CFN
,Dynamic
,Chaos
- arXiv
- Comparative Study of CNN and RNN for Natural Language Processing (2017. 2)
Systematic Comparison
,CNN vs RNN
- arXiv
- A Structured Self-attentive Sentence Embedding (2017. 3)
- Dynamic Word Embeddings for Evolving Semantic Discovery (2017. 3)
Word Embedding
,Temporal
,Alignment
- arXiv, the morning paper
- Learning to Generate Reviews and Discovering Sentiment (2017. 4)
Sentiment
,Unsupervised
,OpenAI
- arXiv
- Ask the Right Questions: Active Question Reformulation with Reinforcement Learning (2017. 5)
QA
,Active Question Answering
,RL
,Agent (Reformulate, Aggregate)
- arXiv, open_review
- Reinforced Mnemonic Reader for Machine Reading Comprehension (2017. 5)
QA
,Mnemonic (Syntactic, Lexical)
,RL
,Machine Comprehension
- arXiv
- Attention Is All You Need (2017. 6)
- Depthwise Separable Convolutions for Neural Machine Translation (2017. 6)
SliceNet
,Super-Separable Conv
,Depthwise + Conv 1x1
- arXiv, open_review, note
- MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension (2017. 7)
MEMEN
,QA(MC)
,Embedding(skip-gram)
,Full-Orientation Matching
- arXiv
- On the State of the Art of Evaluation in Neural Language Models (2017. 7)
Standard LSTM
,Regularisation
,Hyperparameter
- arXiv
- Text Summarization Techniques: A Brief Survey (2017. 7)
- Adversarial Examples for Evaluating Reading Comprehension Systems (2017. 7)
Concatenative Adversaries(AddSent, AddOneSent)
,SQuAD
- arXiv
- Learned in Translation: Contextualized Word Vectors (2017. 8)
Word Embedding
,CoVe
,Context Vector
- arXiv
- Simple and Effective Multi-Paragraph Reading Comprehension (2017. 10)
- Unsupervised Neural Machine Translation (2017. 10)
Train in both directions (tandem)
,Shared Encoder
,Denoising Auto-Encoder
- arXiv, open_review
- Word Translation Without Parallel Data (2017. 10)
Unsupervised
,Multilingual Embedding
,Parallel Dictionary Induction
- arXiv, open_review
- Unsupervised Machine Translation Using Monolingual Corpora Only (2017. 11)
Unsupervised
,Adversarial
,Monolingual Corpora
- arXiv, open_review
- Neural Text Generation: A Practical Guide (2017. 11)
- Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (2017. 11)
MoS (Mixture of Softmaxes)
,Softmax Bottleneck
- arXiv
- Neural Speed Reading via Skim-RNN (2017. 11)
Skim-RNN
,Speed Reading
,Big(Read)-Small(Skim)
,Dynamic
- arXiv, open_review
- Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks (2017. 11)
SCAN
,Compositional
,Mix-and-Match
- arXiv
- The NarrativeQA Reading Comprehension Challenge (2017. 12)
- Hierarchical Text Generation and Planning for Strategic Dialogue (2017. 12)
End2End Strategic Dialogue
,Latent Sentence Representations
,Planning + RL
- arXiv
- Recent Advances in Recurrent Neural Networks (2018. 1)
RNN
,Recent Advances
,Review
- arXiv
- Personalizing Dialogue Agents: I have a dog, do you have pets too? (2018. 1)
Chit-chat
,Profile Memory
,Persona-Chat Dataset
,ParlAI
- arXiv
- Generating Wikipedia by Summarizing Long Sequences (2018. 1)
Multi-Document Summarization
,Extractive-Abstractive Stage
,T-DMCA
,WikiSum
,Google Brain
- arXiv, note, open_review
- MaskGAN: Better Text Generation via Filling in the ______ (2018. 1)
MaskGAN
,Neural Text Generation
,RL Approach
- arXiv, open_review, note
- Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (2018. 1)
Contextual Decomposition (CD)
,Disambiguate interactions between Gates
- arXiv, open_review
- Universal Language Model Fine-tuning for Text Classification (2018. 1)
ULMFiT
,Pre-trained
,Transfer Learning
- arXiv
- DeepType: Multilingual Entity Linking by Neural Type System Evolution (2018. 2)
DeepType
,Symbolic Information
,Type System
,OpenAI
- arXiv, openai blog
- Deep contextualized word representations (2018. 2)
- Ranking Sentences for Extractive Summarization with Reinforcement Learning (2018. 2)
Document-Summarization
,Cross-Entropy vs RL
,Extractive
- arXiv
- code2vec: Learning Distributed Representations of Code (2018. 3)
code2vec
,Code Embedding
,Predicting method name
- arXiv
- Universal Sentence Encoder (2018. 3)
Transformer
,Deep Averaging Network (DAN)
,Transfer
- arXiv
- An efficient framework for learning sentence representations (2018. 3)
Sentence Representation
,True Context
,Unsupervised
- arXiv, open_review
- An Analysis of Neural Language Modeling at Multiple Scales (2018. 3)
LSTM vs QRNN
,Hyperparameter
,AWD-QRNN
- arXiv
- Analyzing Uncertainty in Neural Machine Translation (2018. 3)
Uncertainty
,Beam Search Degradation
,Copy Mode
- arXiv
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (2018. 3)
Temporal Convolutional Network (TCN)
,CNN vs RNN
- arXiv
- Training Tips for the Transformer Model (2018. 4)
Transformer
,Hyperparameter
,Multiple GPU
- arXiv
- QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (2018. 4)
QA
,Conv - Self-Attention
,Backtranslation (Data Augmentation)
- arXiv, open_review, note
- SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (2018. 4)
Top-K Subject Recognition
,Relation Classification
- arXiv
- Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer (2018. 4)
Sentiment Transfer
,Disentangle Attribute
,Unsupervised
- arXiv
- Parsing Tweets into Universal Dependencies (2018. 4)
Universal Dependencies (UD)
,TWEEBANK v2
- arXiv
- Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates (2018. 4)
SR
,Subword Sampling + Hyperparameter
,Segmentation (BPE, Unigram)
- arXiv
- Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension (2018. 4)
PI-SQuAD
,Challenge
,Document Encoder
,Scalability
- arXiv
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (2018. 4)
GLUE
,Benchmark
,Understanding
- arXiv, leaderboard
- On the Practical Computational Power of Finite Precision RNNs for Language Recognition (2018. 5)
Unbounded counting
,IBFP-LSTM
- arXiv
- Paper Abstract Writing through Editing Mechanism (2018. 5)
Writing-editing Network
,Attentive Revision Gate
- arXiv
- A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings (2018. 5)
Unsupervised initialization scheme
,Robust self-learning
- arXiv
- Efficient and Robust Question Answering from Minimal Context over Documents (2018. 5)
- Global-Locally Self-Attentive Dialogue State Tracker (2018. 5)
GLAD
,WoZ and DSTC2 Dataset
- arXiv
- Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information (2018. 5)
Dataset
,EVPI
,ACL 2018 Best Paper
- arXiv
- Know What You Don't Know: Unanswerable Questions for SQuAD (2018. 6)
SQuAD 2.0
,Negative Example
,ACL 2018 Best Paper
- arXiv, leaderboard
- The Natural Language Decathlon: Multitask Learning as Question Answering (2018. 6)
decaNLP
,Multitask Question Answering Network (MQAN)
,Transfer Learning
- arXiv
- GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations (2018. 6)
Transfer Learning Framework
,Structured Graphical Representations
- arXiv
- Improving Language Understanding by Generative Pre-Training (2018. 6)
Transformer
,Generative Pre-Training
,Discriminative Fine-Tuning
- paper, open_ai_blog
- Finding Syntax in Human Encephalography with Beam Search (2018. 6)
RNNG+beam search
,ACL 2018 Best Paper
- arXiv
- Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers (2018. 6)
Task
,Dataset
,Weighted-Pooling (WP)
,ACL 2018 Best Paper
- arXiv
- QuAC: Question Answering in Context (2018. 8)
Information-Seeking dialog
,Challenge
,Without Evidence
- arXiv, leaderboard
- CoQA: A Conversational Question Answering Challenge (2018. 8)
Abstractive with Extractive Rationale
,Challenge
,Coreference and Pragmatic Reasoning
- arXiv, leaderboard
- Contextual Parameter Generation for Universal Neural Machine Translation (2018. 8)
Parameter Generation
,Language Embedding
,EMNLP 2018
- arXiv
- Evaluating Theory of Mind in Question Answering (2018. 8)
Dataset
,Higher-order Beliefs
,EMNLP 2018
- arXiv
- Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text (2018. 9)
GRAFT-Net
,KB+Text Fusion
,EMNLP 2018
- arXiv
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (2018. 9)
Dataset
,Multi-hop
,Sentence-level Supporting Fact
,EMNLP 2018
- arXiv
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018. 10)
BERT
,Discriminative
,Pre-trained
,Transfer Learning
,NAACL 2019 Best
- arXiv
- Trellis Networks for Sequence Modeling (2018. 10)
TrellisNet
,Structural bridge between TCN and RNN
,NAACL 2019
- arXiv
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (2018. 11)
CommonsenseQA
,Dataset
,Multiple-Choice
,NAACL 2019 Best
- arXiv
- Cross-lingual Language Model Pretraining (2019. 1)
XLM
,MLM + TLM
,Cross-lingual Pre-trained
,Low-Resource
- arXiv
- Better Language Models and Their Implications (2019. 2)
- Parameter-Efficient Transfer Learning for NLP (2019. 2)
Adapter tuning
,Bottleneck
,BERT
- arXiv
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (2019. 3)
Fine-tuning vs Feature
,BERT and ELMo
,Empirically analyze
- arXiv
- Linguistic Knowledge and Transferability of Contextual Representations (2019. 3)
Analysis of CWRs
,LSTM, Transformer
,Transferable
,NAACL 2019
- arXiv
- ERNIE: Enhanced Representation through Knowledge Integration (2019. 4)
ERNIE
,Masking Strategies
,Dialog Language Model
,Pre-trained
,Transfer Learning
- arXiv
- CNM: An Interpretable Complex-valued Network for Matching (2019. 4)
CNM
,Quantum Physics
,Interpretable
,NAACL 2019 Best
- arXiv
- Unsupervised Recurrent Neural Network Grammars (2019. 4)
RNNG
,Syntax Tree
,Variational Inference
- arXiv
- The Curious Case of Neural Text Degeneration (2019. 4)
Nucleus Sampling
,Decoding Method
,Generation
- arXiv
- Unified Language Model Pre-training for Natural Language Understanding and Generation (2019. 5)
UniLM
,Uni + Bi + S2S
,Generation
- arXiv
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (2019. 5)
SuperGLUE
,Benchmark
,Understanding
- arXiv, leaderboard
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (2019. 7)
SpanBERT
,Span Boundary Objective (SBO)
,Pre-train
,Transformer
- arXiv
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019. 7)
RoBERTa
,Data-BatchSize
,Pre-train
,Transformer
- arXiv
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (2019. 7)
ERNIE
,Continual Pre-training
,Word-Struct-Semantic
,Transformer
- arXiv
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (2019. 8)
StructBERT (ALICE)
,Language Structure
,Pre-train
,Transformer
- arXiv
- Matching Networks for One Shot Learning (2016. 6)
Matching Nets
,Non-Parametric
,DeepMind
- arXiv, the morning paper
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017. 3)
- SMASH: One-Shot Model Architecture Search through HyperNetworks (2017. 8)
SMASH
,HyperNet
,Prior Knowledge
- arXiv, open_review
- Reptile: a Scalable Metalearning Algorithm (2018. 3)
Reptile
,Meta-Learning
,Few-Shot
,OpenAI
- arXiv, openai_blog
- Understanding the difficulty of training deep feedforward neural networks (2010)
- On the difficulty of training Recurrent Neural Networks (2012. 11)
Gradient Clipping
,RNN
- arXiv
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (2015. 2)
- A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (2015. 4)
Weight Initialization
,RNN
,Identity Matrix
- arXiv
- Cyclical Learning Rates for Training Neural Networks (2015. 6)
CLR
,Triangular, ExpRange
,Long-term Benefit
- arXiv
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2016. 9)
Generalization
,Sharpness of Minima
- arXiv
- Neural Optimizer Search with Reinforcement Learning (2017. 9)
Neural Optimizer Search (NOS)
,PowerSign
,AddSign
- arXiv
- On the Convergence of Adam and Beyond (2018. 2)
AMSGrad
,Convex optimization
- open_review
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (2018. 4)
Adafactor
,Adaptive Method
,Update Clipping
- arXiv
- Revisiting Small Batch Training for Deep Neural Networks (2018. 4)
Generalization Performance
,Training Stability
- arXiv
- Reconciling modern machine learning and the bias-variance trade-off (2018. 12)
Double Descent Risk Curve
,Highly Complex Models
- arXiv
- Progressive Neural Networks (2016. 6)
ProgNN
,Incorporate Prior Knowledge
- arXiv, the morning paper
- Neural Architecture Search with Reinforcement Learning (2016. 11)
NAS
,Google AutoML
,Google Brain
- arXiv
- Third-Person Imitation Learning (2017. 3)
Imitation Learning
,Unsupervised (Third-Person)
,GAN + Domain Confusion
- arXiv
- Noisy Networks for Exploration (2017. 6)
- Efficient Neural Architecture Search via Parameter Sharing (2018. 2)
ENAS
,Google AutoML
,Google Brain
- arXiv
- Learning by Playing - Solving Sparse Reward Tasks from Scratch (2018. 2)
- Investigating Human Priors for Playing Video Games (2018. 2)
prior knowledge
,key factor
- open_review
- World Models (2018. 3)
Generative + RL
,VAE (V)
,MDN-RNN (M)
,Controller (C)
- arXiv
- Unsupervised Predictive Memory in a Goal-Directed Agent (2018. 3)
MERLIN
,Memory + RL + Inference
,Partial Observability
- arXiv, google_ai_blog
- ...
- Auto-Encoding Variational Bayes (2013. 12)
- Generative Adversarial Networks (2014. 6)
- Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data (2016. 5)
DVBF
,Variational Inference
,SGVB
- arXiv, open_review
- SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient (2016. 9)
- Structured Inference Networks for Nonlinear State Space Models (2016. 9)
Structured Variational Approximation
,SGVB
- arXiv
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework (2016. 11)
Beta-VAE
,Disentangled
- open_review
- A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning (2017. 10)
Kalman VAE
,LGSSM
- arXiv
- Self-Attention Generative Adversarial Networks (2018. 5)
SAGAN
,Attention-Driven
,Spectral Normalization
- arXiv
- Unsupervised Data Augmentation (2019. 4)
UDA
,TSA Schedule
,Semi-Supervised
- arXiv