The Darwinian Mind of the Machine: Rethinking LLM Training as Evolution

Marc van Oostendorp; Roberta D'Alessandro

doi:10.7557/12.8313

Authors

Marc van Oostendorp, Radboud University Nijmegen, https://orcid.org/0009-0000-8830-6976
Roberta D'Alessandro, Utrecht University, https://orcid.org/0000-0002-0165-5901

DOI:

https://doi.org/10.7557/12.8313

Keywords:

LLMs, language acquisition, grammar, evolution, poverty of the stimulus, evolutionary competence

Abstract

This essay challenges the prevailing metaphor of “learning” used to describe Large Language Model (LLM) training, proposing instead that these systems represent a form of hyper-accelerated, data-driven evolution. Through analysis of Daniel Dennett’s hierarchy of evolutionary competence and examination of the poverty of the stimulus problem, we argue that LLMs are Darwinian creatures evolved at computational speeds in environments of pure text. This framework explains their linguistic capabilities through convergent evolution rather than learning, resolves paradoxes about their competence without understanding, and has implications for how we understand the relevance of these models to generative grammar.

References

Attah, Nuhu Osman. 2025. Do language models lack communicative intentions? Synthese 205 187. https://doi.org/10.1007/s11229-025-05022-6. DOI: https://doi.org/10.1007/s11229-025-05022-6

Baker, Mark C. 2008. The macroparameter in a microparametric world. Linguistic Analysis 34 1–2: 1–46. https://doi.org/10.1075/la.132.16bak. DOI: https://doi.org/10.1075/la.132.16bak

Battaglia, Peter W., Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, H. Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint 1806.01261. https://doi.org/10.48550/arXiv.1806.01261.

Bowdon, Chris. 2025. How many parameters does GPT-5 have? Available at https://www.r-bloggers.com/2025/08/how-many-parameters-does-gpt-5-have/, accessed 2025.

Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901. Curran Associates.

Browning, Jacob. 2025. Intentionality all-stars redux: Do language models know what they are talking about? In Communicating with AI: Philosophical Perspectives, edited by Herman Cappelen and Rachel Sterken. Oxford University Press, Oxford. Preprint available at https://philarchive.org/rec/BROIAR-4.

Chomsky, Noam. 1959. Review of B. F. Skinner’s Verbal Behavior. Language 35: 26–58. DOI: https://doi.org/10.2307/411334

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, Ma. DOI: https://doi.org/10.21236/AD0616323

Chomsky, Noam. 1981. Lectures on Government and Binding: The Pisa Lectures. Foris, Dordrecht. https://doi.org/10.1515/9783110884166. DOI: https://doi.org/10.1515/9783110884166

Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin, and Use. Praeger, New York.

Chomsky, Noam. 1995. The Minimalist Program. MIT Press, Cambridge, Ma. https://doi.org/10.7551/mitpress/9780262527347.001.0001. DOI: https://doi.org/10.7551/mitpress/9780262527347.001.0001

Chomsky, Noam. 2001. Derivation by phase. In Ken Hale: A Life in Language, edited by Michael Kenstowicz, no. 36 in Current Studies in Linguistics, pp. 1–52. MIT Press, Cambridge, Ma. https://doi.org/10.7551/mitpress/4056.003.0004. DOI: https://doi.org/10.7551/mitpress/4056.003.0004

Chomsky, Noam. 2005. Three factors in language design. Linguistic Inquiry 36 1: 1–22. https://doi.org/10.1162/0024389052993655. DOI: https://doi.org/10.1162/0024389052993655

Chomsky, Noam and Robert C. Berwick. 2017. Why Only Us: Language and Evolution. MIT Press, Cambridge, Ma. https://doi.org/10.7551/mitpress/9780262034241.001.0001. DOI: https://doi.org/10.7551/mitpress/9780262034241.001.0001

Dennett, Daniel C. 2017. From Bacteria to Bach and Back: The Evolution of Minds. W. W. Norton & Company, New York.

Diester, Ilka, Miklíos Bartos, Jörg Bödecker, Andreas Kortylewski, Christian Leibold, Johannes Letzkus, Mohamed M. Nour, Matthias M. Schönauer, Alexander Straw, Andreas Vlachos, and Thomas Brox. 2024. Internal world models in humans, animals, and AI. Neuron 112 16: 2661–2824. https://doi.org/10.1016/j.neuron.2024.06.019. DOI: https://doi.org/10.1016/j.neuron.2024.07.021

Giorgi, Alessandra and Giuseppe Longobardi. 1991. The Syntax of Noun Phrases: Configuration, Parameters and Empty Categories. No. 57 in Cambridge Studies in Linguistics. Cambridge University Press, Cambridge.

Gregory, Richard L. 1963. Distortion of visual space as inappropriate constancy scaling. Nature 199: 678–680. https://doi.org/10.1038/199678a0. DOI: https://doi.org/10.1038/199678a0

Gregory, Richard L. 1970. The Intelligent Eye. Weidenfeld & Nicolson, London.

Hao, Shibo, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, and Zhiting Hu. 2023. Reasoning with language model is planning with world model. arXiv preprint 2305.14992. https://doi.org/10.48550/arXiv.2305.14992.

Hu, Jennifer, Kyle Mahowald, Gary Lupyan, Anna Ivanova, and Roger Levy. 2024. Language models align with human judgments on key grammatical constructions. Proceedings of the National Academy of Sciences 121 36. https://doi.org/10.1073/pnas.2400917121. DOI: https://doi.org/10.1073/pnas.2400917121

Intuition Lab. 2024. Mechanistic interpretability: Understanding AI and LLMs. Available at https://intuitionlabs.ai/articles/mechanistic-interpretability-ai-llms, accessed 2025.

Jackendoff, Ray. 1977. X-bar Syntax: A Study of Phrase Structure. Linguistic Inquiry Monographs. MIT Press, Cambridge, Ma.

Jin, Charles and Martin Rinard. 2024. Emergent representations of program semantics in language models trained on programs. arXiv preprint 2305.11169. https://doi.org/10.48550/arXiv.2305.11169.

Kayne, Richard S. 1994. The Antisymmetry of Syntax. No. 25 in Linguistic Inquiry Monographs. MIT Press, Cambridge, Ma.

McCoy, R. Thomas, Robert Frank, and Tal Linzen. 2020. Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks. Transactions of the Association for Computational Linguistics 8: 125–140. https://doi.org/10.1162/tacl_a_00304. DOI: https://doi.org/10.1162/tacl_a_00304

Moro, Andrea. 2016. Impossible Languages. MIT Press, Cambridge, Ma. https://doi.org/10.7551/mitpress/9780262034890.001.0001. DOI: https://doi.org/10.7551/mitpress/9780262034890.001.0001

Moro, Andrea. 2025. Linguistics in a battlefield. A short note on syntax and the “Newtonian style of research”. Available at https://ling.auf.net/lingbuzz/008827.

Mulders, Iris and Eddy Ruys. 2024. ChatGPT as an informant. Nota Bene 1 2: 242–260. https://doi.org/10.1075/nb.00015.mul. DOI: https://doi.org/10.1075/nb.00015.mul

Müller, Stefan. 2025. Large language models: The best linguistic theory, a wrong linguistic theory, or no theory at all? Journal of the Linguistic Society of Germany 44 1. https://doi.org/10.18148/zs/2025-2001.

Murphy, Elliot, Evelina Leivada, Vittoria Dentella, Fritz Gunther, and Gary Marcus. 2025. Fundamental principles of linguistic structure are not represented by o3. arXiv preprint 2502.10934. https://doi.org/10.48550/arXiv.2502.10934. DOI: https://doi.org/10.5964/bioling.19021

Nalpas, Maud. 2024. LLM sizes. Available at https://web.dev/articles/llm-sizes, accessed 2025.

Ngaihlian, Dorothy. 2025. Machine learning algorithms: Simulating intentionality in artificial intelligence. https://doi.org/10.2139/ssrn.5271061. DOI: https://doi.org/10.2139/ssrn.5271061

Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744. Curran Associates. DOI: https://doi.org/10.52202/068431-2011

Piantadosi, Steven T. 2024. Modern language models refute Chomsky’s approach to language. In From Fieldwork to Linguistic Theory: A Tribute to Dan Everett, edited by Edward Gibson and Moshe Poliak, no. 15 in Empirically Oriented Theoretical Morphology and Syntax, pp. 353–414. Language Science Press, Berlin. https://doi.org/10.5281/zenodo.12665933.

Piantadosi, Steven T. and Yuan Yang. 2022. Reply to Murphy and Leivada: Program induction can learn language. Proceedings of the National Academy of Sciences 119 23: e2202925119. https://doi.org/10.1073/pnas.2202925119. DOI: https://doi.org/10.1073/pnas.2202925119

Popper, Karl. 1963. Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge, London.

Popper, Karl. 1972. Objective Knowledge: An Evolutionary Approach. Oxford University Press, Oxford.

Qiu, Zhuang, Xufeng Duan, and Zhenguang G. Cai. 2024. Evaluating grammatical well-formedness in large language models: A comparative study with human judgments. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pp. 189–198. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.cmcl-1.16. DOI: https://doi.org/10.18653/v1/2024.cmcl-1.16

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Tech. rep., OpenAI.

Skinner, B. F. 1938. The Behavior of Organisms: An Experimental Analysis. Appleton-Century-Crofts, New York.

Skinner, B. F. 1953. Science and Human Behavior. Macmillan, New York.

Stowell, Timothy. 1981. Origins of Phrase Structure. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Ma.

Tak, Ala N., Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, and Jonathan Gratch. 2025. Mechanistic interpretability of emotion inference in large language models. In Findings of the Association for Computational Linguistics: ACL 2025, pp. 13090–13120. Association for Computational Linguistics, Vienna. https://doi.org/10.18653/v1/2025.findings-acl.679. DOI: https://doi.org/10.18653/v1/2025.findings-acl.679

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, vol. 30. Curran Associates.

Yildirim, Ilker and L.A. Paul. 2024. From task structures to world models: what do LLMs know? Trends in Cognitive Sciences 28 5: 404–415. https://doi.org/10.1016/j.tics.2024.02.008. DOI: https://doi.org/10.1016/j.tics.2024.02.008
