Friendly local C programmer and compiler writer here to remind you that C definitely is a low level language for those who understand it and use it professionally. If you're looking for a low level language, then C (and its relatives) are your best bet.
If you're new to the language and want to understand how to use it like a pro, then ignore this post - it will only confuse you and reduce your ability to use C effectively.
I disagree with the author's point that CPU instruction sets should expose more of the CPU's implementation. This has been tried in the past and failed to work long-term. One example of this is branch delay slots from some RISC processors (such as MIPS and SuperH) designed in the late 80s and early 90s. For those unfamiliar with the concept, it basically means that the instruction after a branch instruction gets run regardless of whether the branch was taken or not. This was a short-term benefit, as it meant the job of avoiding pipeline stalls after a branch was left to the programmer, so the processor could be simpler and cheaper than designs without them. However, as time went on, processor designs evolved with more complex pipelines, so the single instruction wasn't enough to cover the branch delay. Instead, it became a legacy issue that future processors had to deal with for compatibility reasons, and it made their branch prediction and pipeline logic more complex.
I feel low-to-high level is a spectrum, not a binary. C is arguably in the lowest third of languages, exposing you to a lot of machine primitives like memory and thread management. It may not be as low level as assembly, but it is arguably lower level than Java or Go, and definitely nowhere near the Pythons and JS of this world.
I think this statement at the end of the article - 'There is a common myth in software development that parallel programming is hard.' - is misleading. Granted, the author points out specific situations where it is not hard, but if the claim is about parallel programming in general, then it is hard. Not a common myth.
Is parallel programming hard? Without any further details or specifics, yes it is. It is far harder to conceptualize code instructions executing simultaneously than one at a time in sequential order.
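As a tiny illustration (my own sketch, not from the article): two threads incrementing a shared counter without synchronization. The increment interleaves as separate load/add/store steps, so compile with -pthread and the printed total is usually well short of 2,000,000.

    /* Sketch: an unsynchronized counter shared by two threads.
       The increment is a load, an add, and a store, and the threads
       interleave freely, so updates get lost. (This is a data race,
       i.e. undefined behavior - which is exactly the point.) */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;

    static void *bump(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld (expected 2000000)\n", counter);
        return 0;
    }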
This article is correct that your computer is not a fast PDP-11, but wrong that this has anything to do with C. E.g., "another core part of the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades."
This has nothing to do with C. The hardware insists on this abstraction. And it's a good job too, otherwise your programs would stop working when moved to a machine with a different cache.
This is now five years old, and while obviously the premise is more correct than ever (computers don't look much like a PDP-11 architecturally), the conclusion ("imagining a non-C processor") seems less strong. We are seeing (and were seeing, even in 2018) a strong separation between linear and highly-parallel code, most obviously in the rise of Python for machine learning and scientific computing. It is still very convenient, when performance isn't paramount, to write in a single-threaded style and to a flat memory model. When performance is important, it's then appropriate to switch to a language better suited to parallel programming -- one of the computational-graph languages in something like PyTorch, some other set of primitives on top of CUDA, or even something more experimental like Futhark. Performance-critical code has always had its domain-specific languages, and they seem to be becoming more common, not less, and the hardware is being built to match -- in the CPU+GPU combination common to desktop PCs, in vector extensions to x86 (which have their own primitives making, essentially, a DSL of their own), or in things like the M1, which bolts a GPU to a CPU to give both high-speed access to the same system memory.
In other words, perhaps what's really out of date is not C, but the concept of a general-purpose language which is equally well-suited to any type of task.
If the sophistication of modern CPUs makes C no longer a "low level" language, then the same applies to assembly language... things like out-of-order execution and register renaming apply there too.
I guess the sophistication of compilers in recent decades adds to the argument, since even the assembly (object code) the C compiler generates isn't going to be as expected, due to hoisting things out of loops, common subexpression elimination, etc.
Still, I think the notion of C being a "low level" language is a useful label... if not, we need to retire this designation altogether.
I feel like the article advances on two different lines of argument that are difficult to reconcile. The first is that C is not a low-level language, and gives examples like struct padding and signed overflow being undefined behavior. That part makes sense to me, and the argument seems constructive: it seems to propose language features for a hypothetical "real low-level" language.
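The struct-padding example, at least, is easy to demonstrate (a minimal sketch; the exact sizes and offsets are implementation-defined):

    /* Struct padding: the compiler may insert unnamed bytes so that
       members are aligned, so sizeof != sum of the member sizes. */
    #include <stdio.h>
    #include <stddef.h>

    struct padded {
        char c;   /* 1 byte... */
        int  i;   /* ...typically preceded by 3 bytes of padding */
    };

    int main(void)
    {
        printf("sizeof = %zu\n", sizeof(struct padded));           /* often 8, not 5 */
        printf("offsetof(i) = %zu\n", offsetof(struct padded, i)); /* often 4 */
        return 0;
    }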
The second argument is that, because of the dominance of C, CPU designers have had to bend over backwards to create something that runs C naturally. Here there are examples like register renaming, flat memory, caching, etc. This argument also makes sense to me, but in the context of the first argument, and the title of the article, I'm not sure how it relates. Taken at face value, this seems to imply that it isn't even possible to create a low-level language on modern hardware, and even machine code is "high-level". This seems to argue that we would have to create a new generation of hardware that exposes much more complexity to the instruction set architecture, and only then could we design a low-level language to take advantage of that.
I think both of these arguments have merit, but it's a little disconcerting to put both of them in the same article, and to make the title "C is not a Low-Level Language". I suppose the first argument could go here, and the second argument could have been done in a follow-up article entitled "Machine code is not a Low-Level Language Either".
Reminds me of VLIW. As per Wikipedia, from the Itanium page:
> One VLIW instruction word can contain several independent instructions, which can be executed in parallel without having to evaluate them for independence. A compiler must attempt to find valid combinations of instructions that can be executed at the same time, effectively performing the instruction scheduling that conventional superscalar processors must do in hardware at runtime.
If your CPU exposes the single-stream parallelism at the interface, you can handle it at compile time or even decide it with inline assembler.
I wonder whether it hasn't caught on strictly due to the business dynamics of the industry, or whether there are technical reasons this isn't really a good strategy.
My sense is that this is really a communication issue (when is it not?)
On a relative scale, C is very low level compared to how we program today if you think about levels of abstraction.
If "low level" means "runs on the CPU almost literally as written," then no, it's not.
This is one of the most interesting programming articles I've read in a while. And it's well written and easy to read! Don't stop at the (inflammatory?) title.
* We all agree that C gives you a lot of control to write efficient sequential code
* Modern processors aren't merely sequential processors
* Optimizing C code for a modern processor is hard because C is over-specified - in order to allow humans to manually optimize their programs (given the C memory model etc.), it's hard for compilers to make assumptions about which optimizations they can make
It doesn't seem like this is a fundamental problem, though, and C could provide symbols that denote "use a less strict model here" (or even a compiler flag, although I bet incremental is the way to go)
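C actually already has one such symbol: the restrict qualifier, which relaxes the aliasing assumptions the compiler must otherwise make (a sketch; whether the loop actually vectorizes depends on the compiler and flags):

    /* restrict promises the compiler that dst and src never overlap,
       so it doesn't have to re-load through memory on every iteration
       and is free to vectorize the loop. */
    void scale_add(float *restrict dst, const float *restrict src,
                   float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] += k * src[i];
    }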
This is a great article (worth reading if interested in performance/parallel computing) but the complications it gets into are mostly in the CPU architecture/hardware to which compilers add additional complexity. Even without the compiler optimizations there's still branch prediction and associated parallel execution of serial machine code.
To anyone debating whether C is a low-level language: note that this discussion is happening at a much lower level, so 'low' has a lower-than-common meaning.
I think the title that the authors decided to give this article is unnecessarily provocative in a distracting way. I'm pretty sure there is a technical definition of low level language they are referencing that excludes C, and pretty much only includes assembly as a low level language. Ok, fine, whatever.
Their bigger point seems to be that C is no longer very mechanically sympathetic to huge modern cores, because the abstraction pretends there's only one instruction in flight at a time. Is anyone aware of a language that fits the hardware better? Maybe Intel needs to release a "CUDA of CPUs" type language.
> On a modern high-end core, the register rename engine is one of the largest consumers of die area and power.
Another red herring. Register rename isn't the result of some PDP fetishizing. It is a direct result of using more hardware resources than are exposed in the architectural model. Even if it were a stack machine or a dataflow graph architecture, register renaming is what you do when you have more dynamic names for storage than static names in the ISA.
> Consider another core part of the C abstract machine's memory model: flat memory.
The C abstract machine only has a flat memory model within a given malloc allocation (and within each local or static object). Relational pointer comparison between different allocations is UB (see e.g. https://stackoverflow.com/a/34973704).
So C is perfectly fine with a non-flat memory model as long as each object is confined within a flat memory region (by virtue of being allowed to alias it as a char array). You can imagine a C runtime library that provides functions to obtain pointers to different types of memory that don't share a flat address space.
The only restriction is that pointers must carry enough information to compare unequal if they point to different objects. Of course, you might be able to construct a virtual flat memory model from the bit representation of void* or char*, but thatâs not quite the same as imposing an actual flat memory model.
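A minimal sketch of the distinction (the commented-out line is the UB case):

    #include <stdlib.h>

    int main(void)
    {
        char *a = malloc(16);
        char *b = malloc(16);
        if (!a || !b) return 1;

        int eq = (a == b);      /* well-defined: pointers to different
                                   objects must compare unequal */
        /* int lt = (a < b); */ /* undefined: no relational order is
                                   guaranteed across allocations */
        (void)eq;
        free(a);
        free(b);
        return 0;
    }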
"Low-level" is not a perfectly well-defined technical term, and does mean (slightly) different things to different people.
I feel that the article explains well enough how the author defines "low-level" for the sake of this article - and the definition being used seems just as fine as any other. And sticking with this specific definition, the conclusions of the article do seem to check out. (But I'm no expert on the subject matter, so I might be wrong about that.)
I feel that the "value" of the article lies in challenging certain conceptions about C.
To me, it doesn't really matter if the article is (completely) right or not - the somewhat indignant response I see happening to the title of the article, and the discussion I see about what "low-level" actually means, seems to prove that some dogmatic beliefs about C are pretty deep-seated.
I feel it's always worthwhile to question such dogmatic beliefs.
> The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11.
No, Spectre is the direct result of processors speculatively executing code without respecting the conditions that guard that code. Put bluntly, processors hallucinate conditions in code. It has nothing to do with the particular computational model, but would happen in any system that speculates on conditions.
And not just one branch, but a whole series of them. In fact, the processor is usually running with a whole buffer full of instructions that are executing in parallel, having been loaded into the reorder engine using nothing more than (normally highly accurate) statistical predictions.
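For reference, the published Spectre v1 pattern is just an ordinary bounds check (a sketch of the gadget shape only, not a working exploit; the array names follow the original paper):

    unsigned char array1[16];
    unsigned char array2[256 * 512];
    unsigned int  array1_size = 16;

    void victim(unsigned int x)
    {
        /* Architecturally the branch is respected; speculatively, a
           mistrained predictor can run the body with x out of bounds,
           leaving a secret-dependent line in the cache. */
        if (x < array1_size)
            (void)array2[array1[x] * 512];
    }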
Apparently there is plenty of disagreement about what "low level" means. Historically assembler was considered low level, and languages like C with named variables and functions and nested expressions were considered high level. I have also seen C described as mid-level to indicate it is higher level than assembler but "closer to the metal" than say Java. And apparently it is now called low-level by some - wonder what assembler is then?
In any case, at this point, low level and high level are only meaningful relative to other languages.
The article is questioning how "close to the metal" C actually is, but some of the arguments also apply to assembler, which is not that close to the metal either these days.
To make an article about how C maps to the processor and fail to make any distinction between application programming and embedded programming seems strange to me. After all, C is by far the most common language for programs running on micro-controllers, and it actually does map well to many micro-controller architectures in use today.
I'm clearly not the target audience for this article, but I still feel like the author would be well advised to put a little note at the top that says "we're talking about CISC and high-end microprocessors rather than microcontrollers."
I'm also not seeing suggestions for languages that do map well to modern microprocessors.
I've been programming since the mid 80s, started with the C64. People have been arguing over whether C is low-level or not since at least then.
Why do so many smart people waste their friggin' time on such nonsense?
Computation is only a small part of computing, addressed by languages such as OpenCL and by no means simple - observe the constant Game Ready driver releases from Nvidia to support each new major game. C is still pretty good at many other parts of low level computing, such as managing the state of hardware or the allocation of system memory to different tasks. Such tasks are not well suited to parallelism, as they must maintain a globally consistent state.
It is perhaps true that CPUs and compilers should execute C code mostly as it is, with only local optimizations to spare the programmer from having to decide whether x + x, x * 2, or x << 1 is faster, for example. This would improve system security and reliability while freeing up time to work on great compute languages for vectorizable computations.
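(For what it's worth, that particular decision is already local and cheap - with optimization on, a mainstream compiler will typically emit identical code for all three spellings:)

    unsigned twice_add(unsigned x)   { return x + x;  }
    unsigned twice_mul(unsigned x)   { return x * 2;  }
    unsigned twice_shift(unsigned x) { return x << 1; }
    /* e.g. on x86-64, commonly a single lea/add/shl - the same
       instruction for all three, chosen by the compiler. */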
But, at the end of the day, CPU makers and compiler writers are humans motivated by both career success and less tangible bragging rights. So OF COURSE they will chase benchmarks at the expense of everything else, even when benchmarks have little to do with real-life performance in the average case. I have a 13-year-old 17-inch MacBook Pro I use for some favorite old games. When I fire it up, I don't see any difference in my computing experience vs a 2023 laptop. So whatever advances in CPU/compiler design have been made since do not seem to help with the tasks I am actually interested in.
Assembly is not the lowest level language you can work in. I've programmed in raw binary opcodes before; that is the lowest level. (Though there is an argument that microcode is even lower level - I disagree, but still acknowledge the argument is valid.) Often a single assembly language instruction can map to one of more than 30 different opcodes, as registers are often encoded in the opcode. Of course, at this level you have to have your CPU instruction manual, as they are all different.
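To make that concrete, here is raw-opcode programming from C (my own sketch: hand-encoded x86-64 bytes for "mov eax, edi; ret", i.e. the identity function under the SysV ABI, run from an executable page; assumes POSIX mmap, and W^X-hardened systems may refuse the mapping):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        unsigned char code[] = { 0x89, 0xF8,   /* mov eax, edi */
                                 0xC3 };       /* ret          */

        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) return 1;
        memcpy(page, code, sizeof code);

        int (*identity)(int) = (int (*)(int))page;  /* non-ISO cast, POSIX-ok */
        printf("identity(42) = %d\n", identity(42));
        munmap(page, 4096);
        return 0;
    }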
Other than assembly, which barely qualifies as a language, what programming language is lower than C?
This article begins with victim-blaming the software engineers in full-throated support of the hardware engineers. If, and I do mean if, anything should be exalted, it is the fact that software engineers have been coping with C as a stable-but-difficult programming language specifically for the benefit of the hardware engineers' desire to have a stable target. The fact that the specification is ambiguous at all is so that hardware manufacturers can port a reasonably small, expressive, and powerful language to their hardware. And, no, making a new language that targets the platform for the ease of hardware development and exploitation of system-specific benefits is not the answer. In fact, it's the literal reason why C is still as popular as it is.
Nobody wants to learn your programming language, write thousands-to-millions of dollars worth of software, just to have it become obsolete two days after the new-hotness processor comes out. Been there, done that.
Alternatively, perhaps, we can place the blame on hardware manufacturers who were looking to cut corners for improved performance and produced insecure machines because they lied to us non-expert hardware users about how fast their systems could go and what we were getting for our money.
Yes, C is a set of abstractions like any other language (even assembly) which attempt to mimic a machine of far less complexity.
Unfortunately it's also the wrong set of abstractions for the contemporary era.
That said, if you're working in low-level embedded microcontroller world, C's memory model and program structure does in fact look a lot more like those systems.
I've been working on making games for the Playdate (https://play.date) over the past few weeks, using their C SDK. It's my first time using C in a decade, since I first learned it in college, and I'm having a surprisingly great time with it. Sure, there are tons of weird quirks that take some getting used to - but there's a lot that I've been surprised to find I missed about it! It's fun to write code that does what you tell it to do, without having to worry about object ownership or any higher-level concerns like that - you just manage the memory yourself, so you know where everything is, and write functions that operate on it straightforwardly. If it's been a while since you've touched C, I highly recommend giving it a try for a small game project.
"The abstract machine C assumes no longer resembles modern architectures" implies that it might be nice to have a language that maps more directly to what is really happening under the hood. I agree. It would be nice to take the guesswork out of "How should I write this so that the compiled code has fewer cache misses?"
Maybe there is a sweet-spot level of abstraction that allows for more fine-grained control of the modern machine, in the sense that compiled code more or less reflects written code, but not so fine-grained as to be unwieldy or non-portable.
Vectorized code that is native to the language could be done with either map functions or Python / NumPy / PyTorch style slicing, which is fairly intuitive. For multithreading, OTOH, I'm not sure there is an easy answer.
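The cache-miss question at least has a known answer in data layout. A sketch of the classic array-of-structs vs struct-of-arrays trade-off (the names here are mine):

    #define N 100000

    struct particle_aos  { float x, y, z, mass; };              /* array of structs */
    struct particles_soa { float x[N], y[N], z[N], mass[N]; };  /* struct of arrays */

    float total_mass_aos(const struct particle_aos *p)
    {
        float sum = 0.0f;
        for (int i = 0; i < N; i++)
            sum += p[i].mass;    /* strided: 4 useful bytes per 16 fetched */
        return sum;
    }

    float total_mass_soa(const struct particles_soa *p)
    {
        float sum = 0.0f;
        for (int i = 0; i < N; i++)
            sum += p->mass[i];   /* contiguous: the whole cache line is useful,
                                    and the loop is easy to vectorize */
        return sum;
    }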
Of course it isn't, but what's the alternative?
When compared to assembler, I'd agree.
I grew up with 6502 and 68k. To me, back in the early 90s, C (Mac MPW C to be precise) was an abstract assembler. The code-gen was perfectly readable.
Compared to the likes of Python, it most certainly is low-level. These types of language allow developers to rapidly get something going and not just because of the libraries.
I'd find it very hard to justify a business position where C has any other role than binding and breaking out into something more abstract. Be that Go or C++, for example.
An argument I used to hear was "performance" from C. I'm not entirely convinced, as in a higher-level language your algorithm may well be better, since you can deal with the abstraction.
But... people make money coding C.
Even before that, this is ultimately about that fact that an ISA for a general-purpose computer can be seen as a way to abstract away parallelism. Even in your favorite assembly language, the effects are largely supposed to happen one after another.
That abstraction is leaky, but the alternative is VLIW machines - even in that case, you probably end up using a compiler so that you don't have to worry about parallelism. Reasoning about parallel things is hard, that's why we spend so much time trying to avoid it ¯\_(ツ)_/¯
If, like me until yesterday, you have never heard of the PDP-11 before (I should probably be banned from HN for this), it is really something worth learning about. There is an awesome project for a PDP-11 front panel replica running an emulator on a Raspberry Pi (the whole thing is called PiDP-11, haha). Here is more information:
I am neither a compiler writer nor an OS guru, just an old C programmer. In this article it looks like the entire CPU instruction set was designed to emulate the PDP-11 to ensure C compatibility. So my naive question: what is stopping a microprocessor manufacturer from having two instruction sets - one that is compatible, and one that allows us to fully utilise the modern CPU but with a different programming paradigm? Is that too expensive or hard to do? I genuinely don't know.
If you accept the premise of the article, you also need to accept that assembly is not a low level language, and that it is impossible to program any CPU currently for sale in a low level language.
The abstraction CPUs give you is more or less a fast PDP-11 with some vector registers bolted on.
The implementation internally is not.
If we ever wonder why more people don't get into low level programming, this article and the responses are an excellent case study. We're allowed to make what we know accessible to newcomers, and many of us should tone down our arrogance when we have really deep technical conversations.
The paper talks about how C is designed for a PDP architecture and that's the problem. Is there any language that is not that way and can handle parallelism and all the things mentioned in the paper?
Yes, I do see Erlang mentioned but I don't think it was considered a solution.
Interesting take, but I think it goes out of its way to prove the definition of low-level wrong, while missing that the definition it gives, and claims is wrong, is itself very flexible.
What counts as irrelevant? To a data scientist, TypeScript is low-level. You're required to think about structure and compile stuff!
To a web developer, C# and Java are low-level because you need to think about the execution platform.
To an IT developer, C and C++ are low level because you need to think about memory.
To a game developer assembly is low level because you need to think about everything.
To electronics engineers everything is high level. To accountants VBA in Excel is low level. To a product manager a word document with any sort of technical words is too low level.
If you need to optimize your software to the point where some CPU-specific instructions are required, C is too high level, because it's hiding stuff that is not irrelevant.
With the same argument you could even argue that the x86 ISA is a high-level language, since under the hood it's decomposed to micro-ops which are scheduled on a superscalar infrastructure and run out of order.
I am really surprised that such a bad take has gotten so much airtime, almost as much as that such a gifted developer came up with it.
The only way that the title is true is one that is not mentioned in the article: when C became popular, anything that was not assembly was a "high level language". Heck, even some Macro assemblers were considered high level, IIRC.
The factors that are mentioned in the article fall roughly into two categories:
1. The machine now works differently.
This may be true, but it does so almost entirely invisibly, and the exact same arguments given in the article apply in the same way not just to assembly language, but even to raw machine language.
I have a hard time seeing how machine level is not low level. But I guess opinions can differ. What seems inarguable is that machine language is the lowest level available. And if the lowest available level does not qualify as "low" in your taxonomy, then maybe you need to rethink your taxonomy.
2. C compilers do crazy shit now
This is also true, but it is true exactly because C is a low level language. As a low-level language, it ties execution semantics to the hardware, resulting in lots of undefined (and implementation defined) behavior that makes a lot of optimisations that some people really, really want to do (but which are far less useful than they claim) really really hard.
So C compiler engineers have defined a new language, C', which has semantics that are much more amenable to optimisation. Nowadays they try to infer that language C' from the C source code and then optimize that program. And they manhandle the C standard, which is intentionally somewhat loose, in order to make the C'' language - the one that looks like C but maps to C' - the official C language.
Since they were moderately successful, it can now be argued that C has morphed or been turned into a language that is no longer low level. However, the shenanigans that were and continue to be necessary to accomplish this make it pretty obvious that it is not the case that this "is" C.
Because, once again, those shenanigans were only necessary because C is a low level language that isn't really suited to these kinds of optimisations. Oh, and there are of course the rationale documents for the original ANSI C standard, which explicitly state that C should be suitable as a "portable assembly language".
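A one-line example of the C-vs-C' gap (a sketch; the actual behavior depends on the compiler and flags):

    /* Because signed overflow is undefined, an optimizing compiler may
       fold this entire function to "return 1" - even though on the
       actual machine, INT_MAX + 1 would wrap to a negative value. */
    int always_true(int x)
    {
        return x + 1 > x;
    }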
But then again we already established that assembly is no longer a low level language...so whatever.
>take a property described by a multidimensional value
>project it into a single dimension
>split it in the middle, thus inventing two useless artificial categories ("low level", "high level")
>get a bunch of high-functioning hackernews 0.1xers to argue endlessly about said useless categories
>submit weekly articles "thing X is NOT in my imaginary category Y!!!"
>profit
Arguing whether or not C is a low level language is about as useful as arguing whether dog-headed men have souls
Next up: IO is not a Monad, x86 machine code is not a low level language, RISC-V is not actually RISC, GPL is not actually open source and so on
Wasn't it the idea of RISC to have a simpler CPU and push the optimization responsibility towards the programmer and the compiler?
I once read that C is the new assembly, because all CPUs have a C compiler.
I then decided to make a language that compiles to C; it's just about adding strings, lists and tuples. I've almost finished the parser, and the "translator" will take more time (I encourage anybody to try lexy as a parser combinator). Basically it will use a lot of the C semantics and even give C compiler errors, so it will save me a lot of work.
Of course I am very scared that I will run into awful problems, but that will be fun anyways.
I just write English these days and have my LLM compile it to Python, so...
On the flip side, maybe CPUs are trying to be too general purpose.
(2018)
PDP-11 is a fast machine?
2017
I don't think it was ever claimed that C was a low level language. In fact I have always heard it as the canonical reference for an example of a high level language. I will admit that in this day and age C feels like a low level language.
Lower level is something that maps more directly to machine operation (assembly, maybe Forth).
Higher level is something that has its own semantics of operation and needs to be converted into machine operation; the more conversion, the higher the level.
C is low level for at least one reason: manual memory management. Especially with modern hardware, memory management is at the center of programming. For example, Rust prides itself on being memory safe without a garbage collector; memory management is more or less the entire reason for Rust to exist. Why is C fast? Memory. Why is C unsafe? Mostly memory. One of the big reasons parallel computing is hard? Concurrent memory access. Functional programming is often surrounded by plenty of mathematical concepts, but a good part of it is to pretend that objects are immutable when, behind the scenes, the compiler works with mutable memory.
In C, every call to the allocator is explicit - that is, if you are using an allocator at all. Compare to oldschool C++, with new/delete and raw pointers, where you may call the allocator explicitly, but still, a lot happens in destructors, automatically. In modern C++, with smart pointers, it is essentially like a garbage-collected language in the sense that allocation and deallocation all happen automatically.
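A sketch of what that explicitness looks like in practice (the function name is mine):

    #include <stdlib.h>
    #include <string.h>

    char *duplicate(const char *s)
    {
        size_t n = strlen(s) + 1;
        char *copy = malloc(n);   /* the one and only allocation, visible
                                     right here in the source */
        if (copy)
            memcpy(copy, s, n);
        return copy;              /* the caller owns it and must free() it;
                                     nothing happens implicitly on scope exit */
    }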