Distinguished Lecture Series: Sepp Hochreiter


Memory Concepts for Large Language Models

Time: February 7, 2024, 11:30 a.m. (CET)
Location: Universitätstraße 32.101, Campus Vaihingen, University of Stuttgart. The talk will take place in person.

The Stuttgart ELLIS Unit is pleased to announce the upcoming ELLIS Distinguished Lecture Series talk by Sepp Hochreiter (Johannes Kepler Universität Linz).

No reservation required; we look forward to seeing you there!

For any questions, or to request a meeting with Prof. Hochreiter, please contact the new coordinator of the Stuttgart ELLIS Unit, Katrin Fauß, at ellis-office@uni-stuttgart.de.

Title: Memory Concepts for Large Language Models

Abstract:

Currently, the most successful Deep Learning architecture for large language models is the transformer. The attention mechanism of the transformer is equivalent to modern Hopfield networks and is therefore an associative memory. However, this associative memory has disadvantages: quadratic complexity in the sequence length when mutually associating sequence elements, a restriction to pairwise associations, limitations in modifying the memory, and insufficient abstraction capabilities. Its memory grows with the context. In contrast, recurrent neural networks (RNNs) such as LSTMs have linear complexity, associate sequence elements with a representation of all previous elements, can directly modify memory content, and have high abstraction capabilities. Their memory is of fixed size, independent of the context. However, RNNs cannot store sequence elements that were rare in the training data, since RNNs have to learn to store. Transformers can store rare or even new sequence elements, which, besides their high degree of parallelization, is one of the main reasons they have outperformed RNNs in language modelling. I think that future successful Deep Learning architectures should comprise both of these memories: attention for implementing episodic memories, and RNNs for implementing short-term memories and abstraction.
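The equivalence the abstract mentions, that transformer attention corresponds to a retrieval step in a modern Hopfield network, can be illustrated in a few lines of code. The following is a minimal NumPy sketch, not taken from the talk; the variable names and toy dimensions are illustrative only. It shows that one modern-Hopfield retrieval update over stored patterns X is the same computation as single-query softmax attention with keys and values equal to X.

  import numpy as np

  def softmax(x, axis=-1):
      # numerically stable softmax
      x = x - x.max(axis=axis, keepdims=True)
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  # Toy memory: N stored patterns (sequence elements) of dimension d.
  N, d = 8, 16
  rng = np.random.default_rng(0)
  X = rng.normal(size=(N, d))    # stored patterns = keys and values
  xi = rng.normal(size=(d,))     # state / query pattern to be retrieved
  beta = 1.0 / np.sqrt(d)        # inverse temperature, matches attention scaling

  # One modern-Hopfield retrieval step: xi_new = X^T softmax(beta * X xi)
  hopfield_step = X.T @ softmax(beta * (X @ xi))

  # Single-query softmax attention with query xi, keys = values = X
  attention_out = softmax(beta * (xi @ X.T)) @ X

  print(np.allclose(hopfield_step, attention_out))  # True: identical results

The quadratic cost referred to in the abstract arises when each of the N sequence elements in turn plays the role of the query against all N stored patterns, whereas an RNN such as an LSTM processes the same sequence with a fixed-size state and constant cost per element.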

Bio:

Sepp Hochreiter heads the Institute for Machine Learning, the LIT AI Lab, and the Audi.JKU Deep Learning Center at the Johannes Kepler Universität in Linz. He conducts research in machine learning and is a pioneer of the booming research field of deep learning, which is currently revolutionizing artificial intelligence. Prof. Hochreiter became known for the discovery and development of "Long Short-Term Memory" (LSTM) in his diploma thesis in 1991, which was later published in 1997. More recently, LSTM developed into a leading method for speech and text processing, where it set new records. Since 2012, LSTM has been used in Google's Android speech recognizer, since 2015 in Google's Voice Transcription, since 2016 in Google's Allo, and also since 2016 in Apple's iOS 10. Prof. Hochreiter is currently advancing the theoretical foundations of deep learning by analyzing the gradient flow through neural networks, developing the concept of "self-normalizing networks", and improving "generative adversarial networks" (GANs) as well as reinforcement learning algorithms. Current projects include deep learning for drug development, for text and speech analysis, for image processing, and in particular for autonomous driving.


Sepp Hochreiter [Picture: Barbara Klaczak]

Further events can be found on the websites of the institutes and the faculty.

 
