Isn't this because the longer you train, the more neurons 'die' (no longer utilized, since the gradient through them is flat on the dataset), so you effectively end up with a smaller model as training goes on?
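To make the "dead neuron" idea concrete: a ReLU unit whose pre-activation is negative for every sample outputs zero on the whole dataset, so its gradient is zero everywhere and training can never revive it. Here's a minimal numpy sketch (toy layer, hypothetical weights; the negative bias shift is just there to force some units dead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: 100 inputs -> 50 ReLU units (weights are made up for illustration)
W = rng.normal(size=(100, 50))
b = rng.normal(size=50)
b[:10] -= 60.0  # push 10 biases far negative so those units are "dead" on this data

X = rng.normal(size=(1000, 100))  # a batch standing in for the whole dataset
act = np.maximum(X @ W + b, 0.0)  # ReLU activations

# A unit is "dead" if it outputs 0 on every sample: the ReLU gradient
# is then 0 everywhere, so gradient descent never updates it again.
dead = (act == 0).all(axis=0)
print(f"dead units: {int(dead.sum())} / {act.shape[1]}")
```

Counting dead units like this over a validation set is a cheap way to check whether the effective width of a network actually shrinks during training.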
Do you change the regularisation?