Understanding Memory Management, Part 1: C

  • This isn't proper usage of realloc:

        lines = realloc(lines, (num_lines + 1) * sizeof(char *));
    
    If realloc cannot service the request, it returns NULL and the assignment overwrites "lines" with NULL, but the memory that "lines" referred to is still allocated and needs to be either freed or used.

    The proper way to call it would be:

        tmp = realloc(lines, (num_lines + 1) * sizeof(char *));
    
        if (tmp == NULL) {
            free(lines);
            lines = NULL;
            // ... possibly exit the program (without a memory leak)
        } else {
            lines = tmp;
        }
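
    A minimal sketch of the same pattern wrapped in a helper. The name `append_line` and its signature are hypothetical, not something from the article. Unlike the snippet above, it leaves the old array untouched on failure, so the caller can decide whether to free it or keep using it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grows *lines by one slot and stores `line` in it.
 * Returns 0 on success, -1 on failure. On failure *lines and *num_lines
 * are unchanged, so the caller still owns the old block and can free it. */
int append_line(char ***lines, size_t *num_lines, char *line) {
    assert(lines != NULL && num_lines != NULL);

    char **tmp = realloc(*lines, (*num_lines + 1) * sizeof(char *));
    if (tmp == NULL) {
        return -1;            /* old block is still valid, nothing leaked */
    }
    tmp[*num_lines] = line;
    *lines = tmp;
    *num_lines += 1;
    return 0;
}
```

    Starting from `char **lines = NULL; size_t num_lines = 0;` works, because realloc on a NULL pointer behaves like malloc.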

  • Thanks for such a detailed article.

    In my spare time, working with C as a hobby, I am usually in "vertical mode": just getting things done end-to-end as fast as possible, without checking at every step that there are no memory errors. That is different from how I would work (carefully) at work. I am just trying to get the algorithm working in my hobby time, so I do not actually worry about memory management when writing C and I let the operating system handle freeing memory.

    And since I wrote everything in Python or JavaScript initially, I am usually porting from Python to C.

    If I were using Rust, it would force me to be careful in the same way, due to the borrow checker.

    I am curious: we have reference counting and we have Profile guided optimisation.

    Could reference counting be compiled into a debug/profiled build and used to detect the points in time at which things get freed (there is a happens-before relation between scopes being dropped and the reference-counting code running), in order to work out where to insert frees? (We would write timing metadata from the RC build that encapsulates the happens-before relationships.)

    Then we could recompile with a happens-before relation file that records where things can safely be freed.

    EDIT: Any discussion about those stack diagrams and alignment should include a link to this Wikipedia page:

    https://en.wikipedia.org/wiki/Data_structure_alignment

  • The example strdup implementation:

      char *strdup(const char *str) { 
        size_t len = strlen(str);
        char *retval = malloc(len);
        if (!retval) {
          return NULL; 
        }
        strcpy(retval, str);
        return retval;
      }
    
    Has a very common defect. The malloc call does not reserve enough space for the NUL byte required for successful use of strcpy, thus introducing heap corruption.

    Also, assuming a NULL pointer is bitwise equal to 0 is not portable.
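
    For comparison, a sketch of a version that reserves room for the terminator. It is named `my_strdup` (a hypothetical name) to avoid clashing with the POSIX function, and it compares against NULL explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name, chosen to avoid clashing with the POSIX strdup. */
char *my_strdup(const char *str) {
    assert(str != NULL);

    size_t len = strlen(str) + 1;   /* +1 for the terminating NUL byte */
    char *retval = malloc(len);
    if (retval == NULL) {
        return NULL;
    }
    memcpy(retval, str, len);       /* copies the NUL as well */
    return retval;
}
```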

  • Memory arenas should be taught to all programmers and become the default method of memory management.
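
    For readers who have not met the idea, here is a rough sketch of a bump-pointer arena. The `Arena` type and the `arena_*` functions are hypothetical names for illustration; real arenas usually add growth, per-allocation alignment control, and reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Used only for its size: rounding to a multiple of it keeps allocations
 * aligned for the common scalar types. */
typedef union { long long ll; long double ld; void *p; } ArenaAlign;

/* Hypothetical bump-pointer arena: one malloc up front, one free at the end. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

int arena_init(Arena *a, size_t capacity) {
    assert(a != NULL);
    a->base = malloc(capacity);
    a->capacity = (a->base != NULL) ? capacity : 0;
    a->used = 0;
    return (a->base != NULL) ? 0 : -1;
}

/* Returns NULL when the request does not fit in the remaining space. */
void *arena_alloc(Arena *a, size_t size) {
    const size_t align = sizeof(ArenaAlign);
    size_t rounded = (size + align - 1) / align * align;
    if (rounded < size || rounded > a->capacity - a->used) {
        return NULL;              /* overflow or arena full */
    }
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Frees every allocation made from the arena in one call. */
void arena_release(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

    The appeal is that individual objects never need their own free: everything allocated from the arena goes away together in `arena_release`.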

  • This post caused me to create an account. This C code is not good. Writing C is absolutely harder than Python, but you're making it so much harder than it has to be. Your program is buggy as heck, has very finicky cleanup code, and so on.

    Here's a much easier way to write the program:

    1. Dump whole file into buffer as one string

    2. Find newlines in buffer, replace with NULs. This also lets you find each line and save them in another buffer

    3. Sort the buffer of all the lines you found

    4. qsort the buffer

    5. Print everything

    6. Free both buffers

    Or, as a C program: https://godbolt.org/z/38nq1MorM
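
    Independently of the linked program (which is not reproduced here), the splitting and sorting steps might look roughly like the sketch below. It assumes the file has already been read into a NUL-terminated buffer; `split_and_sort` and `compare_lines` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: the elements are char *, so we get pointers to them. */
static int compare_lines(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Hypothetical helper. `buf` holds the whole file and is NUL-terminated;
 * it is modified in place ('\n' becomes '\0'). Returns a malloc'd, sorted
 * array of pointers into `buf`, or NULL on allocation failure.
 * The caller frees the returned array and, separately, `buf`. */
char **split_and_sort(char *buf, size_t *count) {
    assert(buf != NULL && count != NULL);

    /* Upper bound: one line per newline, plus a final unterminated line. */
    size_t max_lines = 1;
    for (const char *p = buf; *p != '\0'; p++) {
        if (*p == '\n') max_lines++;
    }

    char **lines = malloc(max_lines * sizeof *lines);
    if (lines == NULL) return NULL;

    size_t n = 0;
    char *start = buf;
    for (char *p = buf; *p != '\0'; p++) {
        if (*p == '\n') {
            *p = '\0';            /* terminate the current line */
            lines[n++] = start;
            start = p + 1;
        }
    }
    if (*start != '\0') lines[n++] = start;   /* last line without '\n' */

    qsort(lines, n, sizeof *lines, compare_lines);
    *count = n;
    return lines;
}
```

    Cleanup is then two calls to free, one for the file buffer and one for the pointer array.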

  • I am no C programmer, but doesn't the first pseudocode make no sense (and the others after it, since they reuse it)?

      address = X
      length = *X
      address = address + 1
      while length > 0 {
        address = address + 1
        print *address
      }
    
    1) length is never updated, so the while is an infinite loop (if length is not 0)

    2) the first character is never output: address 0 (assuming X=0 at the start) holds the length, but the pointer is then incremented twice, so the first print *address prints the character at address 2?

    If I am mistaken, I'd be happy if someone explained why it makes sense.
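
    One reading of what the pseudocode seems to intend, written as C. This is a sketch under the assumption that the first cell holds the length and the characters follow it; `print_length_prefixed` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed layout: x[0] is the length, x[1..length] are the characters.
 * Returns the number of characters printed. */
size_t print_length_prefixed(const unsigned char *x) {
    assert(x != NULL);

    const unsigned char *address = x;
    size_t length = *address;     /* length = *X */
    size_t printed = 0;

    address = address + 1;        /* step past the length cell, once */
    while (length > 0) {
        putchar(*address);        /* print before advancing */
        address = address + 1;
        length = length - 1;      /* the update missing from the quoted loop */
        printed++;
    }
    return printed;
}
```

    With the decrement added and the print moved before the increment, both points raised above go away.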

  • Too bad the first program in the article leaks its file descriptor.

    Memory is but one resource you need to manage. File descriptors are the first oft-overlooked resource in a long list of neglected finite resources.
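
    As an illustration of the point (not the article's program), a sketch of a function that releases its file on every path after a successful open. `count_newlines` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts '\n' bytes in the file at `path`.
 * Returns -1 if the file cannot be opened or a read error occurs.
 * The stream is closed on every path after a successful fopen. */
long count_newlines(const char *path) {
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;              /* nothing was opened, nothing to close */
    }

    long count = 0;
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') count++;
    }
    if (ferror(fp)) {
        count = -1;             /* report the error, but still close below */
    }

    fclose(fp);                 /* releases the underlying file descriptor */
    return count;
}
```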

  • > If we just concatenate the values in memory, how do we know where one line ends and the next begins? For instance, maybe the first two names are "jim" and "bob" or maybe it's one person named "jimbob", or even two people named "jimbo" and "b".

    Don't we have a newline character? I thought we can read newline as `0xA` in Rust?

  • Using abort() every time malloc and kin fail isn't really satisfying anything except the idea that the program should crash before showing incorrect results.

    While the document itself is pretty good otherwise, this philosophical failing is a problem. It should give examples of COPING with memory exhaustion, instead of just imploding every time. It should also mention using "ulimit -Sd 6000" or something to lower the limit to force the problems to happen (that one happens to work well with vi).

    Memory management is mature when programs that should stay running - notably user programs, system daemons, things where simply restarting will lose precious user data or other important internal data - HANDLE exhaustion, clean up any partially allocated objects, then either inform the user or keep writing data out to files (or something) and freeing memory until allocation starts working again. E.g. Vi informs the user without crashing, like it should.

    This general philosophy is one that I've seen degrade enormously over recent years, and a trend we should actively fight against. And this trend has been greatly exacerbated by memory overcommit.
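
    A small sketch of what "clean up any partially allocated objects" can look like in practice. Everything here is hypothetical: `make_table` takes the allocator as a parameter only so that failure can be simulated, and `failing_alloc` is a test double that reports exhaustion after two successful calls.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

/* Frees the first `rows` rows and then the table itself. */
void free_table(int **table, size_t rows) {
    if (table == NULL) return;
    for (size_t i = 0; i < rows; i++) free(table[i]);
    free(table);
}

/* Hypothetical example: allocates rows x cols ints using `alloc`
 * (pass malloc in normal use). On any failure it frees what was already
 * allocated and returns NULL, so the caller can report the problem and
 * keep running instead of aborting. */
int **make_table(size_t rows, size_t cols, alloc_fn alloc) {
    assert(alloc != NULL && rows > 0 && cols > 0);

    int **table = alloc(rows * sizeof *table);
    if (table == NULL) return NULL;

    for (size_t i = 0; i < rows; i++) {
        table[i] = alloc(cols * sizeof **table);
        if (table[i] == NULL) {
            free_table(table, i);   /* undo the partial allocation */
            return NULL;
        }
    }
    return table;
}

/* Test double: succeeds twice via malloc, then simulates exhaustion. */
int calls_left = 2;
void *failing_alloc(size_t size) {
    if (calls_left == 0) return NULL;
    calls_left--;
    return malloc(size);
}
```

    A caller that gets NULL back can tell the user, save what it has, or retry later, which is the behaviour the comment is asking for.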

  • This is a fantastic post. I really feel like these concepts should be introduced to programmers much earlier on in their education and this article does a great job of presenting the info in an approachable manner.

  • This was a great (re)introduction to the fundamentals. Worthy of a bookmark.

  • Just no.

        address = X
        length = *X
        address = address + 1
        while length > 0 {
            address = address + 1
            print *address
        }

  • Great post for intermediate programmers, who started programming in Python, and who should now learn what's under the hood to get to the next level of their education. Sometimes (perhaps most of the time), we should ignore the nitty-gritty details, but the moment comes where you need to know the "how": either because you need more performance, sort out an issue, or do something that requires low-level action.

    There are few sources like this post targeting that intermediate group of people: you get lots of beginner YouTube clips and Web tutorials and on HN you get discussions about borrow checking in Rust versus garbage collection in Go, how to generate the best code for it and who has the best Rope implementation; but little to educate yourself from the beginner level to the level where you can begin to grasp what the second group are talking about, so thanks for this educational piece that fills a gap.

  • Thanks for sharing. These are core concepts for better understanding coding.

  • The comics at the beginning hahaha :D

  • Avoid the C standard library allocator as much as you can; go directly to the mmap system call with your own allocator if you know you won't run on a CPU without an MMU.

    If you write a library, let the user code install its own allocator.
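
    A rough sketch of the library half of that advice. All `mylib_*` names are hypothetical, and the counting allocator stands in for whatever the application would really install (an mmap-backed allocator, an arena, and so on).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library API: the application may replace the allocator. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} mylib_allocator;

static mylib_allocator current = { malloc, free };   /* default */

void mylib_set_allocator(mylib_allocator a) {
    assert(a.alloc != NULL && a.release != NULL);
    current = a;
}

/* Every allocation inside the library goes through `current`. */
char *mylib_copy_string(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = current.alloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

void mylib_free(void *p) {
    current.release(p);
}

/* Example application-side allocator that counts live allocations. */
int live_allocations = 0;

void *counting_alloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_allocations++;
    return p;
}

void counting_free(void *p) {
    if (p != NULL) live_allocations--;
    free(p);
}
```

    The application would call `mylib_set_allocator` once at startup, before the library has handed out any memory, so that every block is released by the allocator that created it.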