The Bigger the Better?

The Size of Language Models and the Dispute over Alternative Architectures

Authors

  • Susanne Förster, University of Siegen

DOI:

https://doi.org/10.7146/aprja.v12i1.140444

Keywords:

Database, Digital Infrastructures, Language Models, Machine Learning, Hallucination

Abstract

This article examines a controversy over the ‘better’ architecture for conversational AI, a dispute that initially unfolds around the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more scalable training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models, which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, which are meant to minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it sought to circumvent.

Author Biography

Susanne Förster, University of Siegen

Susanne Förster is a PhD candidate and research associate in the project “Agentic Media: Formations of Semi-Autonomy” at the University of Siegen. Her work deals with imaginaries and infrastructures of conversational artificial agents. Previously, she coordinated exhibitions at Haus der Kulturen der Welt (HKW), Berlin.

Published

2023-09-07

Section

Articles