**Publications & Preprints**

**Continuous-Time Meta-Learning with Forward Mode Differentiation.** T. Deleu, D. Kanaa, L. Feng, G. Kerg, Y. Bengio, G. Lajoie, P.-L. Bacon. Accepted at the *Tenth International Conference on Learning Representations (ICLR), 2022.* (openreview)

**Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization.** S. Jastrzebski, D. Arpit, O. Astrand, G. Kerg, H. Wang, C. Xiong, R. Socher, K. Cho and K. Geras. Accepted at the *Thirty-eighth International Conference on Machine Learning (ICML), 2021.* (arxiv)

**Network-level computational advantages of single-neuron adaptation.** V. Geadah, G. Lajoie, G. Kerg, S. Horoi and G. Wolf. Accepted at *Computational and Systems Neuroscience (COSYNE), 2021.*

**Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization.** S. Jastrzebski, D. Arpit, O. Astrand, G. Kerg, H. Wang, C. Xiong, R. Socher, K. Cho and K. Geras. Accepted at the *Proceedings of the NeurIPS 2020 Workshop on Optimization in Machine Learning (OPT), 2020.* (paper)

**Untangling trade-offs between recurrence and self-attention in artificial neural networks.** G. Kerg\*, B. Kanuparthi\*, A. Goyal, K. Goyette, Y. Bengio and G. Lajoie. Accepted at *Advances in Neural Information Processing Systems (NeurIPS), 2020.* (arxiv)

**Guarantees for stable signal and gradient propagation in self-attentive recurrent networks.** G. Kerg, B. Kanuparthi, A. Goyal, K. Goyette, Y. Bengio and G. Lajoie. Accepted at *DeepMath 2020 (Conference on the Mathematical Theory of Deep Neural Networks), 2020.*

**Learning Long-term Dependencies Using Cognitive Inductive Biases in Self-attention RNNs.** G. Kerg\*, B. Kanuparthi\*, A. Goyal, K. Goyette, Y. Bengio and G. Lajoie. Accepted at the *ICML 2020 Inductive biases, invariances and generalization in RL workshop, 2020.* Also accepted at the *Montreal AI Symposium (MAIS), 2020.* (paper)

**Advantages of biologically-inspired adaptive neural activation in RNNs during learning.** V. Geadah, G. Kerg, S. Horoi, G. Wolf and G. Lajoie. (arxiv)

**Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics.** G. Kerg\*, K. Goyette\*, M.P. Touzel, G. Gidel, E. Vorontsov, Y. Bengio and G. Lajoie. Accepted at *Advances in Neural Information Processing Systems (NeurIPS), 2019.* (arxiv)

**h-detach: Modifying the LSTM gradient towards better optimization.** B. Kanuparthi\*, D. Arpit\*, G. Kerg, R. Ke, I. Mitliagkas and Y. Bengio. Accepted at the *Seventh International Conference on Learning Representations (ICLR), 2019.* (openreview, arxiv)

**Safe Screening for Support Vector Machines.** J. Zimmert, C. Schröder de Witt, G. Kerg and M. Kloft. Accepted at the *Proceedings of the NIPS 2015 Workshop on Optimization in Machine Learning (OPT), 2015.* (paper)

**On Neretin's group of tree spheromorphisms.** Master's thesis for the MSc in pure mathematics, *Université Libre de Bruxelles*, 2013. (thesis)

**Expansion in groups.** Essay for Part III of the Math Tripos (equivalent to a Master's thesis), *University of Cambridge*, 2012. (thesis)