Ask HN: Why is PyTorch einsum significantly slower than transpose?

  • Have you tried different sizes, for example:

      a = torch.randn(10, 20, 30)
      a = torch.randn(20, 40, 60)
      a = torch.randn(30, 60, 90)
      ...
    
    Is the "4µs" a constant difference, or is it proportional to the size of the tensor?
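
    A rough way to check this is sketched below. I don't know the exact calls you timed, so the einsum pattern "ijk->kji" and the matching transpose(0, 2) are my assumptions, and time_us and compare are just helper names I made up. It times both calls at each of the sizes above:

```python
import timeit

import torch


def time_us(fn, number=10_000):
    """Best-of-5 mean time per call, in microseconds."""
    return min(timeit.repeat(fn, number=number, repeat=5)) / number * 1e6


def compare(shape, number=10_000):
    """Time transpose and einsum on a random CPU tensor of the given shape.

    Assumption: the two operations being compared are a swap of the first
    and last axes; substitute the calls from the original measurement.
    """
    a = torch.randn(*shape)
    t_transpose = time_us(lambda: a.transpose(0, 2), number)
    t_einsum = time_us(lambda: torch.einsum("ijk->kji", a), number)
    return t_transpose, t_einsum


if __name__ == "__main__":
    for shape in [(10, 20, 30), (20, 40, 60), (30, 60, 90)]:
        t_t, t_e = compare(shape)
        print(f"{shape}: transpose {t_t:.2f} µs, "
              f"einsum {t_e:.2f} µs, diff {t_e - t_t:.2f} µs")
```

    If the "diff" column stays roughly flat as the tensor grows, the gap is more likely fixed per-call overhead than work that scales with the data.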