- LeCun, Yann A., et al. "Efficient BackProp." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 9–48.
- Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems (MCSS) 2.4 (1989): 303–314.
- Hornik, Kurt. "Approximation capabilities of multilayer feedforward networks." Neural Networks 4.2 (1991): 251–257.
- Sonoda, Sho, and Noboru Murata. "Neural network with unbounded activation functions is universal approximator." Applied and Computational Harmonic Analysis (2015).
- Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5.2 (1994): 157–166.
- Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Deep Sparse Rectifier Neural Networks." International Conference on Artificial Intelligence and Statistics. 2011.
- Goodfellow, Ian, et al. "Maxout Networks." Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.
- Springenberg, Jost Tobias, and Martin Riedmiller. "Improving deep neural networks with probabilistic maxout units." arXiv preprint arXiv:1312.6116 (2013).
- Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." Proceedings of the 2nd International Conference on Learning Representations. (2014).
- Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and accurate deep network learning by exponential linear units (ELUs)." arXiv preprint arXiv:1511.07289 (2015).
- Gulcehre, Caglar, et al. "Noisy Activation Functions." Proceedings of the 33rd International Conference on Machine Learning. 2016.
- Klambauer, Günter, et al. "Self-Normalizing Neural Networks." arXiv preprint arXiv:1706.02515 (2017).
- Zhang, Jian, Ioannis Mitliagkas, and Christopher Ré. "YellowFin and the Art of Momentum Tuning." arXiv preprint arXiv:1706.03471 (2017).
- Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." Proceedings of the 30th International Conference on Machine Learning. 2013.
- Chawla, Nitesh V., et al. "SMOTE: synthetic minority over-sampling technique." Journal of Artificial Intelligence Research 16 (2002): 321–357.
- Nguyen, Hien M., Eric W. Cooper, and Katsuari Kamei. "Borderline over-sampling for imbalanced data classification." International Journal of Knowledge Engineering and Soft Data Paradigms 3.1 (2011): 4–21.