Multi-lingual agents through multi-headed neural networks


  • Jonathan D. Thomas University of Bristol
  • Raúl Santos-Rodríguez University of Bristol
  • Mihai Anca University of Bristol
  • Robert Piechocki University of Bristol



Keywords: Cooperative AI, Multi-Agent Reinforcement Learning, Emergent Communication, Continual Learning


This paper considers cooperative Multi-Agent Reinforcement Learning, focusing on emergent communication in settings where multiple pairs of independent learners interact at varying frequencies. In such settings, multiple distinct and mutually incompatible languages can emerge. When an agent encounters a speaker of an alternative language, it must adapt before the pair can converse efficiently. This adaptation produces a new language while the previous one is forgotten, an instance of the Catastrophic Forgetting problem, which can be mitigated by enabling agents to learn and maintain multiple languages. Taking inspiration from the Continual Learning literature, we equip agents with multi-headed neural networks that allow them to be multi-lingual. Our method is empirically validated within a referential MNIST-based communication game and is shown to maintain multiple languages where existing approaches cannot.
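The multi-headed architecture described above can be sketched as a shared trunk with one output head per partner language: adaptation to a new partner is confined to a fresh head, so previously learnt languages remain intact. The following PyTorch sketch is illustrative only; the class and method names (`MultiHeadSpeaker`, `add_head`) and all dimensions are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultiHeadSpeaker(nn.Module):
    """Shared encoder with one output head per partner language.

    A minimal sketch: the trunk maps an observation to a latent code,
    and each head decodes that code into message logits for one
    partner's language. Training only the active head localises
    adaptation and limits catastrophic forgetting of other languages.
    """

    def __init__(self, obs_dim, hidden_dim, vocab_size):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU()
        )
        self.heads = nn.ModuleList()  # one head per language encountered
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size

    def add_head(self):
        """Spawn a fresh head when a new partner (language) is met."""
        self.heads.append(nn.Linear(self.hidden_dim, self.vocab_size))
        return len(self.heads) - 1  # index used to select this head later

    def forward(self, obs, head_idx):
        # Shared encoding, language-specific decoding.
        return self.heads[head_idx](self.encoder(obs))

speaker = MultiHeadSpeaker(obs_dim=784, hidden_dim=64, vocab_size=10)
h0 = speaker.add_head()  # language for the first partner
h1 = speaker.add_head()  # a second, independent language
logits = speaker.forward(torch.zeros(1, 784), h1)
print(logits.shape)  # torch.Size([1, 10])
```

When adapting to partner `h1`, one would optimise only `speaker.heads[h1]` (and optionally the encoder at a reduced rate), leaving the head for `h0` untouched.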

