Have you tried with different sizes, like:
a = torch.randn(10, 20, 30)
a = torch.randn(20, 40, 60)
a = torch.randn(30, 60, 90)
...
Is the "4µs" a constant difference, or is it proportional to the size of the matrix?
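One way to check this is to time both variants over a range of sizes and look at how the gap changes. Here is a rough sketch, not a definitive benchmark. The helper `timing_gap` and the `demo` function are hypothetical names I made up, and the two operations in `demo` (`transpose` and `permute`) are only placeholders for whatever two variants you are actually comparing.

```python
import timeit


def timing_gap(op_a, op_b, make_input, sizes, number=1000, repeat=5):
    """Return a list of (size, seconds_a, seconds_b, gap), all per-call times."""
    rows = []
    for size in sizes:
        x = make_input(size)
        # Take the best of `repeat` runs and divide by `number` for a per-call time.
        t_a = min(timeit.repeat(lambda: op_a(x), number=number, repeat=repeat)) / number
        t_b = min(timeit.repeat(lambda: op_b(x), number=number, repeat=repeat)) / number
        rows.append((size, t_a, t_b, t_b - t_a))
    return rows


def demo():
    # torch is imported here so the helper above has no torch dependency.
    import torch

    sizes = [(10, 20, 30), (20, 40, 60), (30, 60, 90)]
    rows = timing_gap(
        lambda a: a.transpose(0, 1),   # placeholder for the first variant
        lambda a: a.permute(1, 0, 2),  # placeholder for the second variant
        lambda s: torch.randn(*s),
        sizes,
    )
    for size, t_a, t_b, gap in rows:
        print(f"{size}: a={t_a * 1e6:.2f}us  b={t_b * 1e6:.2f}us  gap={gap * 1e6:.2f}us")
```

Calling `demo()` prints one line per size. If the gap stays roughly flat as the size grows, it is most likely fixed per-call overhead. If it grows with the number of elements, it is proportional to the data. Note that this sketch assumes CPU tensors. On CUDA the kernels run asynchronously, so you would need `torch.cuda.synchronize()` around the timed call or a tool such as `torch.utils.benchmark.Timer`.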