Physics of Language Models: Architecture Design and the Magic of Canon Layers
by nkko on 5/4/2025, 4:25:07 PM with 1 comment
by darknoon on 5/15/2025, 12:19:34 AM
Does anyone know why they mix in the 3 previous tokens? They could just as easily have used 5 or 2, right?
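For anyone following along, here is a rough sketch of what "mixing in the previous tokens" could look like. It is not the paper's implementation: the function name `mix_previous_tokens`, the scalar (rather than per-channel) weights, and the residual add are all assumptions made for illustration. The point is only that the window size is the length of `weights`, so trying 2 or 5 is a one-line change; the sketch says nothing about which choice works best.

```python
def mix_previous_tokens(hidden, weights):
    """Causal mix of each token with the tokens just before it.

    hidden  : list of per-token vectors (lists of floats), one per position
    weights : list of scalars (hypothetical, normally learned);
              weights[0] applies to the current token and
              weights[k] to the token k positions back
    """
    dim = len(hidden[0])
    out = []
    for t in range(len(hidden)):
        mixed = list(hidden[t])  # residual: start from the token itself (assumption)
        for k, w in enumerate(weights):
            if t - k < 0:
                break  # nothing before the start of the sequence
            for d in range(dim):
                mixed[d] += w * hidden[t - k][d]
        out.append(mixed)
    return out


# Toy sequence of five 1-dimensional "token vectors".
tokens = [[1.0], [2.0], [3.0], [4.0], [5.0]]

# Window covering the 3 previous tokens (current-token weight set to 0 here).
three_back = mix_previous_tokens(tokens, [0.0, 1.0, 1.0, 1.0])

# Same layer with only the 2 previous tokens: just a shorter weight list.
two_back = mix_previous_tokens(tokens, [0.0, 1.0, 1.0])
```

The loop never reads positions after `t`, so the mix stays causal regardless of how many previous tokens are included.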