The Bigger the Better?
The Size of Language Models and the Dispute over Alternative Architectures
Keywords: Database, Digital Infrastructures, Language Models, Machine Learning, Hallucination
This article examines a controversy over the ‘better’ architecture for conversational AI, one that initially unfolds around the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more scalable training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models, which are known for reproducing societal biases and for so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, an architecture intended to minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it set out to circumvent.
Copyright (c) 2023 Susanne Förster
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyrights are held by the individual authors of articles.
The journal is free of charge for readers.
APRJA does not charge authors Article Processing Charges (APCs).