The Bigger the Better?

The Size of Language Models and the Dispute over Alternative Architectures

Authors

  • Susanne Förster

DOI:

https://doi.org/10.7146/aprja.v12i1.140444

Keywords:

Database, Digital Infrastructures, Language Models, Machine Learning, Hallucination

Abstract

This article examines a controversy over the ‘better’ architecture for conversational AI, one that initially unfolds around the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of ever larger, ever more scalable training datasets. I therefore first describe the technical structure of large language models and then address the problems of these models, which are known for reproducing societal biases and so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, which should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it seeks to circumvent.

Published

2023-09-07

Issue

Vol. 12 No. 1 (2023)
Section

Articles