Google’s new neural-network LLM architecture separates memory components to control the exploding costs of memory and compute


A new neural-network architecture developed by researchers at Google might solve one of the great challenges for large language models (LLMs): extending their memory at inference time without exploding the costs of memory and compute. Called Titans, the architecture enables models to find and store, during inference, the small bits of information that matter in long sequences. 

Titans combines traditional LLM attention blocks with “neural memory” layers that enable models to handle both short- and long-term memory tasks efficiently. According to the researchers, LLMs that use neural long-term memory can scale to millions of tokens and outperform both classic LLMs and alternatives such as Mamba while having far fewer parameters. 

Attention layers and linear models

The classic transformer architecture used in LLMs employs the self-attention mechanism to compute the relations between tokens. This is an effective technique that can learn complex and granular patterns in token sequences. However, as the sequence length grows, the computing and memory costs of calculating and storing attention increase quadratically.
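For intuition, here is a minimal, self-contained sketch (not from the paper) of where the quadratic cost comes from: the attention score matrix holds one entry per pair of tokens, so its size, and the work needed to fill it, grows with the square of the sequence length.

```python
import torch

n, d = 4096, 64                      # sequence length, head dimension
q, k, v = (torch.randn(n, d) for _ in range(3))

scores = q @ k.T / d ** 0.5          # (n, n) score matrix: memory and compute grow as n^2
weights = scores.softmax(dim=-1)
out = weights @ v                    # (n, d) output
print(scores.shape)                  # torch.Size([4096, 4096])
```

Doubling the sequence length quadruples the number of scores, which is why long contexts become expensive so quickly.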

More recent proposals involve alternative architectures that have linear complexity and can scale without exploding memory and computation costs. However, the Google researchers argue that linear models do not show competitive performance compared to classic transformers, as they compress their contextual data and tend to miss important details.

The ideal architecture, they suggest, should have different memory components that can be coordinated to use existing knowledge, memorize new facts, and learn abstractions from their context. 

“We argue that in an effective learning paradigm, similar to [the] human brain, there are distinct yet interconnected modules, each of which is responsible for a component crucial to the learning process,” the researchers write.

Neural long-term memory

“Memory is a confederation of systems — e.g., short-term, working, and long-term memory — each serving a different function with different neural structures, and each capable of operating independently,” the researchers write.

To fill the gap in current language models, the researchers propose a “neural long-term memory” module that can learn new information at inference time without the inefficiencies of the full attention mechanism. Instead of storing information during training, the neural memory module learns a function that can memorize new facts during inference and dynamically adapt the memorization process based on the data it encounters. This addresses a generalization problem that other neural-network architectures suffer from: memory that is fixed at training time cannot adapt to data the model only sees at inference.

To decide which bits of information are worth storing, the neural memory module uses the concept of “surprise.” The more a sequence of tokens differs from the kind of information stored in the model’s weights and existing memory, the more surprising it is, and the more worth memorizing. This enables the module to make efficient use of its limited memory and store only the pieces of data that add useful information to what the model already knows.
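The paper grounds this idea in a gradient-based signal. As a rough, hedged sketch (assuming, per the paper’s framing, that the memory is a small network trained online on an associative key-value objective), surprise can be read as the size of the gradient a new token induces on that objective; the module and projection names below are illustrative, not taken from any released code.

```python
import torch
import torch.nn as nn

d = 64
memory = nn.Sequential(nn.Linear(d, d), nn.SiLU(), nn.Linear(d, d))  # the long-term memory network
key_proj, value_proj = nn.Linear(d, d), nn.Linear(d, d)              # hypothetical projections

def surprise(x_t: torch.Tensor) -> float:
    """Gradient norm of the associative loss ||M(k_t) - v_t||^2 for a single token."""
    k_t, v_t = key_proj(x_t), value_proj(x_t)
    loss = (memory(k_t) - v_t).pow(2).sum()
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    return torch.cat([g.flatten() for g in grads]).norm().item()

print(surprise(torch.randn(d)))  # larger values suggest the token is more worth memorizing
```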

To handle very long sequences of data, the neural memory module has an adaptive forgetting mechanism that allows it to remove information that is no longer needed, which helps manage the memory’s limited capacity.
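Continuing the sketch above, a single test-time memory update could then combine a momentum-smoothed surprise term with a decay gate that erases stale entries. The fixed scalars below stand in for gates that, as the paper describes, would be learned and input-dependent in the actual model.

```python
import torch

@torch.no_grad()
def memory_step(params, grads, momentum, alpha=0.01, eta=0.9, theta=0.1):
    """One illustrative online update of the memory weights for a single token."""
    for p, g, s in zip(params, grads, momentum):
        s.mul_(eta).add_(g, alpha=-theta)  # momentum-smoothed surprise: S = eta*S - theta*grad
        p.mul_(1.0 - alpha).add_(s)        # forgetting gate: M = (1 - alpha)*M + S
```

Here, params would be the memory network’s weights, grads the surprise gradients from the previous sketch, and momentum a matching list of running surprise buffers.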

The memory module can be complementary to the attention mechanism of current transformer models, which the researchers describe as “short-term memory modules, attending to the current context window size. On the other hand, our neural memory with the ability to continuously learn from data and store it in its weights can play the role of a long-term memory.”

Titan architecture

Example of Titan architecture (source: arXiv)

The researchers describe Titans as a family of models that combine existing transformer blocks with neural memory modules. The model has three key components: the “core” module, which acts as the short-term memory and uses the classic attention mechanism to attend to the current segment of input tokens the model is processing; a “long-term memory” module, which uses the neural memory architecture to store information beyond the current context; and a “persistent memory” module, the learnable parameters that remain fixed after training and store time-independent knowledge.

The researchers propose different ways to connect the three components. But in general, the main advantage of this architecture is enabling the attention and memory modules to complement each other. For example, the attention layers can use the historical and current context to determine which parts of the current context window should be stored in the long-term memory. Meanwhile, long-term memory provides historical knowledge that is not present in the current attention context.
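As a rough sketch of how those pieces could fit together (loosely modeled on the paper’s “memory as context” idea; the class, layer sizes, and method names are assumptions for illustration, not the released architecture), the attention core can attend over its own segment plus tokens read from the persistent and long-term memories:

```python
import torch
import torch.nn as nn

class TitansBlockSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_persistent=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)   # "core" short-term memory
        self.persistent = nn.Parameter(torch.randn(1, n_persistent, d_model))   # fixed after training
        self.long_term = nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(),
                                       nn.Linear(d_model, d_model))             # updated at inference

    def forward(self, segment):                        # segment: (batch, seg_len, d_model)
        retrieved = self.long_term(segment)            # read historical context from long-term memory
        persistent = self.persistent.expand(segment.size(0), -1, -1)
        context = torch.cat([persistent, retrieved, segment], dim=1)
        out, _ = self.attn(segment, context, context)  # attend over segment plus both memories
        return out

block = TitansBlockSketch()
print(block(torch.randn(2, 128, 256)).shape)  # torch.Size([2, 128, 256])
```

In the full model, the attention output would also drive the surprise-based update of the long-term memory sketched earlier, closing the loop the researchers describe.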

The researchers ran small-scale tests on Titan models, ranging from 170 million to 760 million parameters, on a diverse range of tasks, including language modeling and long-sequence language tasks. They compared the performance of Titans against various transformer-based models, linear models such as Mamba, and hybrid models such as Samba. 

Titans (red line) outperforms other models, including GPT-4, on long-sequence tasks in both few-shot and fine-tuned settings (source: arXiv)

Titans demonstrated strong performance in language modeling compared to other models, outperforming both transformers and linear models of similar size.

The performance difference is especially pronounced on long-sequence tasks, such as “needle in a haystack,” where the model must retrieve bits of information from a very long sequence, and BABILong, where the model must reason across facts distributed in very long documents. In these tasks, Titans outperformed models with orders of magnitude more parameters, including GPT-4 and GPT-4o-mini, as well as a Llama-3 model enhanced with retrieval-augmented generation (RAG).

Moreover, the researchers were able to extend the context window of Titans up to 2 million tokens while keeping memory costs modest.

The models still need to be tested at larger sizes, but the results in the paper suggest that the researchers have not yet hit the ceiling of Titans’ potential.

What does it mean for enterprise applications?

With Google at the forefront of long-context models, we can expect this technique to find its way into both its proprietary and open models, such as Gemini and Gemma.

With LLMs supporting longer context windows, there is growing potential for applications where you squeeze new knowledge into the prompt instead of relying on techniques such as RAG. Developing and iterating on prompt-based applications is much faster than building complex RAG pipelines. Meanwhile, architectures such as Titans can help reduce inference costs for very long sequences, making it possible for companies to deploy LLM applications for more use cases.

Google plans to release the PyTorch and JAX code for training and evaluating Titans models.


