Reproducing the deep double descent paper

  • Is this not because the longer you train, the more neurons 'die' (no longer utilized because the gradient is flat on the dataset), so you effectively get a smaller model as training goes on?
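The dead-neuron idea can be made concrete with a toy sketch. This is a hypothetical example (a single random ReLU layer, not anything from the paper): a unit is "dead" if it never activates on any input in the dataset, so its output and gradient are zero everywhere and it contributes nothing to the effective model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: one ReLU layer with random weights over a small dataset.
X = rng.normal(size=(256, 10))     # 256 inputs, 10 features
W = rng.normal(size=(10, 32))      # layer with 32 neurons
b = rng.normal(size=32)

# Force a few neurons dead with large negative biases,
# mimicking units pushed into the flat region during training.
b[:3] = -100.0

acts = np.maximum(0.0, X @ W + b)  # ReLU activations over the whole dataset

# A neuron is "dead" if it never activates on any input:
# its ReLU output is zero everywhere, so its gradient is too.
dead = np.all(acts == 0.0, axis=0)
print(f"dead neurons: {dead.sum()} / {dead.size}")
```

Counting such units across checkpoints would be one way to test whether the effective capacity actually shrinks over training.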

  • Do you change the regularisation?