The Philosophy of Linear Algebra (2021)

  • > While nowadays most people learn linear algebra through the perspective of vectors and transformations, the matrix and determinant are much older than the ideas of vectors in a multi-dimensional space.

    Determinants, yes. Matrices, no. Matrices were originally invented by Arthur Cayley as a formalism for describing linear transformations.

    Mathematicians making historical claims ought to cite their sources. Mathematicians should also define what their terms mean, otherwise it's not clear what they're saying. I'm defining matrices as a certain number system which supports operations called "plus" and "times" (for the pedants, a preadditive category).

    [edit]

    Secondary sources: according to [1], matrices were developed in parallel by Sylvester and Cayley. Cayley worked out the notion of a linear transformation first (1845), Sylvester introduced the term "matrix" while studying determinants (1850), and Cayley then combined the two ideas into the modern matrix algebra (1858). So the claim that matrices are older than the perspective of vectors and transformations doesn't hold up.

    [1] - https://www.quora.com/Why-and-who-invented-matrices

  • I agree with prior comments about the article (not great). But I love LA, and will indulge a bit.

    For me, the most amusing feature of LA is how quickly results and tools escalate from trivial and obvious to surprising and hard. LA is often introduced as slick notation to express systems of linear equations. Gaussian elimination and matrix inverses (when they exist) are natural evolutions of this, but not what I consider surprising.

    Matrix multiplication is an elementary next step, but it is not at all clear why one would multiply matrices. My first real mathematical surprise (I was in junior high at the time) was a result of matrix multiplication: there is a matrix A of real values that satisfies A @ A = -I. (The matrix is [[0,1],[-1,0]].) I was aware of complex numbers, but until that moment, I didn't know you could get "minus identity" without first artificially introducing the idea of "square root of -1". But this shows one can construct a "square root of -1" without introducing anything like that.
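
    A minimal NumPy sketch of that surprise (the variable names are mine), for anyone who wants to see it happen:

      import numpy as np

      A = np.array([[0, 1],
                    [-1, 0]])    # geometrically: a quarter-turn rotation
      print(A @ A)               # [[-1  0], [ 0 -1]], i.e. -I
      print(np.array_equal(A @ A, -np.eye(2, dtype=int)))   # True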

    When one gets to determinants, characteristic polynomials, eigenstructure, fixed subspaces, projection operators, SVD, orthogonal transformations, and so on, the work really begins. That determinants are multiplicative is really wonderful, but not at all obvious. The SVD in full generality was only established in 1936, and modern algorithms to compute it emerged in the 1950s [1].

    Finally, check out the LDU decomposition of the Walsh matrix [2]; a quick numeric sketch follows the references.

    [1] https://en.wikipedia.org/wiki/Singular_value_decomposition#H...

    [2] https://en.wikipedia.org/wiki/Walsh_matrix
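
    Following up on [2], a hedged numeric sketch to inspect those factors (I build the Walsh matrix from scipy's natural-ordered Hadamard matrix by sorting rows by sign changes, which may not match the exact convention on the wiki page):

      import numpy as np
      from scipy.linalg import hadamard, lu

      H = hadamard(8)                                    # natural-ordered Hadamard matrix
      sequency = np.sum(np.diff(H, axis=1) != 0, axis=1) # sign changes per row
      W = H[np.argsort(sequency)]                        # sequency (Walsh) ordering
      P, L, U = lu(W)                                    # W = P @ L @ U, L unit lower-triangular
      D = np.diag(np.diag(U))                            # pull the diagonal out of U ...
      U_unit = np.linalg.inv(D) @ U                      # ... to get the L, D, U factors
      print(np.round(L, 2))
      print(np.round(D, 2))
      print(np.allclose(P @ L @ D @ U_unit, W))          # True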

  • Articles like this help scale mathematical knowledge to run on a greater number of minds, so thank you to the OP! Practical skill and competence are necessary conditions for understanding maths, and they only come from practicing and doing, but we also need a bigger funnel of people with intuition and curiosity, so that more people move through the pipeline from curiosity to competence to mastery to innovation. I hope as a culture we start to encourage more treatments of difficult topics like this one.

  • A philosopher's tools are paper and pencil. A mathematician's tools are paper, pencil, and wastebasket. There is no "philosophy" of linear algebra. A vector space is an abelian group with a scalar multiplication that satisfies distributive laws. Unlike most other mathematical objects, vector spaces can be classified by a single quantity: their dimension. Every vector space is isomorphic to a space of functions F^S = {v: S -> F} (the finitely supported ones, in the infinite-dimensional case), where S is a set and F is the underlying field. Its dimension is the cardinality of S. A linear transformation is a function preserving the linear structure. A matrix is a representation of a linear transformation using a particular basis. Matrix multiplication is composition of linear transformations. When an aficionado said "There are two great painters in the world, you and Velazquez," Whistler replied, "Why drag Velazquez into this?" Why drag philosophy into this? Just buckle down and learn the math, as everyone before you did as a matter of course.
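
    A minimal NumPy check of the last technical claim (matrix multiplication is composition of linear transformations), with made-up matrices:

      import numpy as np

      A = np.array([[1.0, 2.0], [3.0, 4.0]])
      B = np.array([[0.0, 1.0], [1.0, 1.0]])
      f = lambda x: A @ x                        # linear map represented by A
      g = lambda x: B @ x                        # linear map represented by B
      x = np.array([5.0, 7.0])
      print(np.allclose(f(g(x)), (A @ B) @ x))   # True: composing the maps = multiplying the matrices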

  • Linear algebra isn't about matrices. And I heavily disagree with the last paragraph.

    >Computers certainly helped accelerate the development of linear algebra and vice-versa.

    Computers have helped with matrix analysis, given the floating-point constraints inherent in computers. But linear algebra has been in a pretty solid place as a theory since Halmos published "Finite-Dimensional Vector Spaces" in 1942.

  • As an alternative to Linear Algebra, there's also Clifford Algebra (also known as Geometric Algebra). It makes rotations much more intuitive, among other things. The only drawback I found is that the current Python libraries seem to struggle with very high dimensionality (which is common in ML).

  • These articles remind me that one should look in the right places for the right kind of knowledge.

    That article is not at all an exposition of what the Philosophy of Linear Algebra is.

    >While nowadays most people learn linear algebra through the perspective of vectors and transformations, the matrix and determinant are much older than the ideas of vectors in a multi-dimensional space.

    That is not true. Anyway, what is the article about? According to the title, it's about linear algebra. Then the basic objects are vectors, and the interesting objects of discussion are maps. Some maps can be represented by matrices. That's it.

    Yet the article begins by describing matrices. Is it like those fiction books that randomize the order of the chapters to make them more interesting?

    Or is the article supposed to be about matrices in general, not about linear algebra? (Niche) uses of matrices outside of linear algebra do exist.

    >An upper triangular matrix conveys a whole lot more than just the fact that the numbers form a triangle; it tells you that the matrix is describing a nonsingular system that is solvable.

    That is not true in general.

    If the matrix is an upper triangular matrix AND all the coefficients in the diagonal are non-zero, THEN it's regular (so invertible, so non-singular).

    But philosophically, WHY does it do that?

    The answer is that you are basically inverting the matrix, but halfway through you notice that the upper triangular form already lets you get the result by plain back-substitution, so you just stop there instead (it's a lot less work). It can be shown that the manipulations that would take you from the triangular matrix the rest of the way to the inverse all preserve invertibility (they change the determinant by at most a nonzero factor), so you're in the clear.
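
    A minimal back-substitution sketch (the numbers are a made-up example) showing exactly where the nonzero-diagonal condition enters: every step divides by a diagonal entry, so a zero on the diagonal is precisely what breaks solvability.

      import numpy as np

      def back_substitute(U, b):
          """Solve U x = b for an upper-triangular U with nonzero diagonal."""
          n = len(b)
          x = np.zeros(n)
          for i in range(n - 1, -1, -1):
              # dividing by U[i, i] is only allowed because the diagonal is nonzero
              x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
          return x

      U = np.array([[2.0, 1.0, -1.0],
                    [0.0, 3.0,  2.0],
                    [0.0, 0.0,  4.0]])
      b = np.array([1.0, 2.0, 8.0])
      print(np.allclose(U @ back_substitute(U, b), b))   # True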

    That is slightly interesting from a philosophical standpoint.

    >The diagonal matrix describes a system with elements that only operate on themselves,

    It just means the elements don't mix the different axes of the multiplicand in the result. Coordinate along axis one in the result = coordinate along axis one multiplied by diagonal element one. And so on, for two, three.
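
    In code (a toy example), a diagonal matrix is just independent per-axis scaling:

      import numpy as np

      D = np.diag([2.0, 3.0, 4.0])
      v = np.array([1.0, 1.0, 1.0])
      print(D @ v)               # [2. 3. 4.]
      print(np.diag(D) * v)      # the same thing, done elementwise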

    >symmetric matrix tells the story of a system that is bi-directional

    Argh. It just means that you can pick a full set of eigenvectors that are all orthogonal to each other, and that the eigenvalues are all real. It's nice to have these properties (for example, because it's intuitive to have orthogonal coordinate axes), so such matrices are often used for calculations.

    Also, eigenvectors and eigenvalues are interesting because they give you a better basis for your calculation (literally).

    If you choose your coordinate axes wrong, they will tell you the "better" ones (the ones intrinsic to the problem--so it's much easier to calculate things there).
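
    A small numeric illustration of both points (the matrix is a made-up example); numpy's eigh is the routine intended for symmetric/Hermitian matrices:

      import numpy as np

      S = np.array([[2.0, 1.0],
                    [1.0, 2.0]])               # symmetric
      eigenvalues, Q = np.linalg.eigh(S)
      print(eigenvalues)                                       # [1. 3.], all real
      print(np.allclose(Q.T @ Q, np.eye(2)))                   # True: eigenvectors orthonormal
      print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, S))    # True: S rebuilt in its "better" basis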

    To summarize, to save others the work: that article is not about the philosophy of linear algebra. It's just showing a small example with matrices and not explaining any WHYs in the process. Where it does explain things, it's wrong multiple times.

    ----

    Maybe a small real Philosophy of Linear Algebra here:

    The real Philosophy of Linear Algebra is that there are a lot of systems that are linear, so it's nice to have a complete mathematical theory for systems of linear equations. That's it. It's not magical, it's not general enough (cough, non-linear systems), and (looked at right) it's simple.

    Linear algebra's objects of discussion are vectors and maps.

    What vectors are, everyone learned in school. Suffice it to say vectors are any objects which can be added together and multiplied by a scalar, forming a commutative group under addition, with scalar multiplication distributing over addition in the usual way (like for real numbers). Notably, there's no division of vectors! This is not a complete definition, but you get the idea.

    (It is also useful, but not necessary, to introduce a so-called "inner" product, which takes two vectors and gives back a scalar. Nevermind!)

    In any case, you can then have equations with vectors in them. Like 5 v⃗ = w⃗, where v⃗, w⃗ are vectors.

    This works like any other system of equations: there are not enough equations for those two unknowns v⃗ and w⃗.

    So what about this system of equations: { 5 v⃗ = w⃗, 5 v⃗ = w⃗ } ? That's silly you say?

    Ok then, what about { 5 v⃗ = w⃗, 10 v⃗ = 2 w⃗ } ? Still doesn't work?

    So what's the rule here? Turns out it has to do with those upper triangular matrices.

    A system that has no interesting solution (that is, none other than 0⃗):

       5 v⃗ = w⃗
       5 v⃗ = 2 w⃗
    
       w⃗ = 2 w⃗
       0⃗ = 2 w⃗ - w⃗
       0⃗ = w⃗
    
    Booring.

    Anyway, let's introduce functions on vectors, so-called "maps" (here, f and g are maps).

       f(v⃗) := 5 v⃗
       g(v⃗) := (5/2) v⃗
    
       f(v⃗) = w⃗
       g(v⃗) = w⃗
    
    THAT is the basic philosophy of linear algebra.

    Now for some more practical stuff:

    It's annoying to have those abstract v⃗ and w⃗ around. In a lot of cases, you are only interested in coordinates with respect to some basis (the basis is a set of axes that you choose--usually your axes of measurement, that is, the axes your measurement devices measure along). It can be shown that the number of basis vectors you need for that (to represent any abstract vector!) is equal to the dimension of the vector space, usually 3.

    Let the basis be {b⃗_1, b⃗_2, b⃗_3}. These basis vectors obviously must be independent, so b⃗_1 := b⃗_2 is out because that's cheating. b⃗_1 := 2 b⃗_2 ? Out. b⃗_1 := 2 b⃗_2 + 3 b⃗_3 ? Stop it already.

    Now you can find numbers q, r, s such that v⃗ = q b⃗_1 + r b⃗_2 + s b⃗_3.
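
    In coordinates, finding q, r, s is itself just solving a linear system; a minimal sketch with a made-up basis:

      import numpy as np

      b1 = np.array([1.0, 0.0, 0.0])
      b2 = np.array([1.0, 1.0, 0.0])
      b3 = np.array([1.0, 1.0, 1.0])
      B = np.column_stack([b1, b2, b3])           # basis vectors as columns

      v = np.array([2.0, 3.0, 4.0])               # some vector, in standard coordinates
      q, r, s = np.linalg.solve(B, v)             # its coordinates in the basis {b1, b2, b3}
      print(np.allclose(q*b1 + r*b2 + s*b3, v))   # True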

    If you want to be more conservative with your letter consumption, it's better to call q and r and s: v^1, v^2, v^3 instead (it's just a notation, and the ^1 and ^2 and ^3 are part of the name for now).

    So

       v⃗ = v^1 b⃗_1 + v^2 b⃗_2 + v^3 b⃗_3.
       w⃗ = w^1 b⃗_1 + w^2 b⃗_2 + w^3 b⃗_3.
    
    Or in non-annoying notation

       v⃗ = ∑_i v^i b⃗_i
       w⃗ = ∑_i w^i b⃗_i
    
    Or in even less annoying notation, let the sum be implied when an index repeats in a term

       v⃗ = v^i b⃗_i
       w⃗ = w^i b⃗_i
    
    Ahhh. Much better.
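
    Incidentally, this repeated-index convention is more or less what numpy's einsum spells out explicitly; a tiny sketch with made-up numbers:

      import numpy as np

      v_coords = np.array([1.0, 2.0, 3.0])      # the v^i
      basis = np.array([[1.0, 0.0, 0.0],        # b_1
                        [1.0, 1.0, 0.0],        # b_2
                        [1.0, 1.0, 1.0]])       # b_3 (as rows)
      v = np.einsum('i,ij->j', v_coords, basis) # "sum over the repeated index i"
      print(np.allclose(v, v_coords @ basis))   # True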

    If f is a LINEAR map, then you can do this (by the definition of linearity, which you have to look up):

       f(∑_i v^i b⃗_i) = ∑_i v^i f(b⃗_i)
    
    So in order to specify the linear map f completely, you only need {_i f(b⃗_i) }, i.e. how f acts on each individual basis vector. That's it.

    So for f and g in non-annoying notation:

      f(v^i b⃗_i) = v^i f(b⃗_i)
      g(v^i b⃗_i) = v^i g(b⃗_i)
    
    Also,

      f(v⃗) = w⃗ = w^i b⃗_i
      g(v⃗) = w⃗ = w^i b⃗_i
    
    So

      f(v^i b⃗_i) = v^i f(b⃗_i) = w^i b⃗_i
      g(v^i b⃗_i) = v^i g(b⃗_i) = w^i b⃗_i
    
    So

      v^i f(b⃗_i) = w^i b⃗_i
      v^i g(b⃗_i) = w^i b⃗_i
    
    And NOW we can go to matrices.

    Those b⃗_i are annoying, and the f(b⃗_i) are even more annoying.

    So let's get rid of them.

    Let c⃗_i := f(b⃗_i). Let d⃗_i := g(b⃗_i).

    Now we'd like all those abstract basis vectors to disappear. But the c⃗_i and b⃗_i are different bases (if the c⃗_i are a basis at all--neeevermind for now :P), and we'd like everything expressed only in terms of the b⃗_i. So let c⃗_i = c^j_i b⃗_j.

       v^i c⃗_i = w^i b⃗_i
       v^i c^j_i b⃗_j = w^i b⃗_i
       v^i c^j_i b⃗_j = w^i δ^j_i b⃗_j
    
       c^j_i v^i = w^j
    
       c := (_i (_j c^j_i)) That's the first matrix!
       v := (_i v^i)        That's a column "vector"--a matrix
       w := (_i w^i)        That's a column "vector"--a matrix
    
    Note: δ^j_i := {1 if i=j, otherwise 0}

       c v = w
    
    Likewise for the g, with the d⃗_i killed and replaced by the b⃗_i.
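
    A numeric sketch of exactly this construction, reusing f(v⃗) := 5 v⃗ from above and a made-up basis (with f this simple, c comes out as 5 times the identity, but the recipe is the general one):

      import numpy as np

      def f(x):                                   # the linear map from above
          return 5.0 * x

      b1 = np.array([1.0, 0.0, 0.0])
      b2 = np.array([1.0, 1.0, 0.0])
      b3 = np.array([1.0, 1.0, 1.0])
      B = np.column_stack([b1, b2, b3])

      # column i of c holds the coordinates of f(b_i) in the basis {b1, b2, b3}
      c = np.column_stack([np.linalg.solve(B, f(b)) for b in (b1, b2, b3)])

      v_coords = np.array([1.0, 2.0, 3.0])        # made-up coordinates v^i
      w_coords = c @ v_coords                     # the matrix acting on coordinates
      print(np.allclose(B @ w_coords, f(B @ v_coords)))   # True: c v = w really is f(v⃗) = w⃗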

    And now we are at the philosophically completely uninteresting result. Good job.

  • > Interestingly enough, linear algebra only exposed its deepest secrets to us in the 20th century, despite having existed for many centuries.

    I think this becomes less surprising when phrased as: the deepest knowledge about linear algebra that we have so far is that gained most recently—at which point it sounds almost tautological. There's no reason to think that we've plumbed all the mysteries of linear algebra, no matter how much more we know (especially about the computational side of it) than we used to.

  • This reminds me of the discussions in the book "Linear Algebra Done Right" by Sheldon Axler. He has a beautiful series of videos on YouTube that can serve as a companion to the book.

  • linear algebra seems easy, because the majority of problems are nonlinear and as such intractable, except when you approximate them with linear problems

  • Okay but somebody please explain to me the reason for prefix operator column-major notation.

    I digress…

    >> linear algebra doesn’t exist simply because humanity managed to concoct a system that coincidentally finds usage in many seemingly unrelated subjects

    I’m not a stranger to this field, but this article doesn’t make this case at all, and I’m not sure it can be made. Linear algebra is fundamentally a set of notation rules that people built upon and found ways to represent their problems with. Its ubiquity in many fields is no more surprising than the observation that many different genres of literature are written in English, and somehow nobody is confused enough to think that flight instruction manuals are the domain of English rather than the range of it. There is no practical limit to the number of academic papers one can write which awkwardly contort a simple intuitive computation into an indecipherable representation in linear algebra. If you want the concept of linear algebra to consume all linearity that exists in the world, then fine, whatever, but then the notation sucks. You may call a continuous Fourier transform a linear operation, but try writing down the values of that matrix.

    No; there’s nothing fundamentally tying the inner workings of the world to the mechanics of linear algebra, but there has been considerable effort to adapt some narrowly chosen set of mechanics to represent the world, and some subset of those things works out well, so that’s what we focus on. But this is just path dependency. A different set of rules for writing linear relationships could have worked out better, or worse, but it would have had to get there regardless. I expect that the current system will disappear soon as the world shifts to talking about computational tensors, and more flexible rules about how to apply operations. Who says you can’t multiply a row by a row? That’s stupid; we can and should do better. Is the determinant magical, or is it just an arbitrary series of shift-product-sums which is far more applicable in the general sense? Do we really need a whole matrix to describe a permutation?

    Linearity may be considered well-described when there is no longer any need for a separate course to teach the notation and tricks behind it. When I was in grad school, one of my favorite books was Gene Golub’s “Matrix Computations”. It really is an impressive book from an incredible mind, like a book on chess positions written by Bobby Fischer. Regardless, it was hard for me to go through it without wondering why such a thing exists. Much as with chess, the human mind can become enamored with any simple set of rules that can be extended to sufficient complexity. Faced with a real problem, sometimes it helps me to get to a solution, and sometimes it distracts me from the solution. I can never be sure which is more common.

  • I should make a linear algebra YouTube video. It would probably make me $1 million given how much interest there seems to be in the subject

    The subject is poorly taught imho. It's used for almost everything, and yet most instruction focuses on the matrix and vector operations without tying them to other things.