The suggested transformations look nice for the trivial, tiny examples used, but would really hurt readability of code in general.
Code is not read linearly like a book; it is 'scanned' and reviewed back and forth, and the compactness of the code is important for readability.
Also, on today's (sadly) standard computers, especially laptops, vertical space is very restricted by standard LCD dimensions. If you spread a screenful of semantically linked code over two screens, you suddenly can't grasp it all at once without scrolling back and forth, and that is a real loss. Newlines and empty lines can and must be used to group things into "paragraphs", but the OP's suggestions waste far too many lines.
Unfortunately if you focus on beauty you sometimes break pragmatism. The JavaScript example in particular is not merely a typographical convention, but a way to avoid common errors.
var i=1
, j=2
, k=3
You can remove any of the comma-prefixed lines there (even the last one) and not introduce an error. You can add another similarly prefixed line anywhere in the list and not introduce an error. It's obvious if a comma is missing (which is good, because you don't have a compiler to let you know). With trailing commas, it can be difficult to spot a missing comma, or a declaration list that ends in a comma instead of a semicolon (, vs ;), both of which will break the execution of your script.
So please, do not change your code to make it look better without understanding why it's like that in the first place.
The classic example: changing the following to Allman/GNU-style braces would break it in JavaScript:
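// works, returns {a:1,b:2}
return {
  a: 1,
  b: 2
}
// semicolon inserted after return, so the braces below become a block:
// with one property this returns undefined; with two, as here, it is a syntax error
return
{
  a: 1,
  b: 2
}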
When I first started programming in C, I did the same thing with pointers (i.e., `char* str;` instead of `char *str;`). Unfortunately, this creates the wrong impression that
char* str1, str2;
creates two pointer-to-char variables, whereas actually str1 is a char pointer and str2 is simply a char. Indeed, I was confused on this point myself when I was a newbie, which led to great confusion later on.
The clearest way I've ever found to think about C declarations (ironically, I think I read this in some article maligning C's syntax in favor of Go's) is that each declaration is of the format
[type] [expressions--one for each new variable--that equate to type];
Thus, the way I think of declaring a char pointer is char [de-referencing the variable (which is a char pointer) to arrive at the char];
char *str;
(Obviously, not everyone is going to agree that that is simple, but it works for me and my brain.)
Anyway, the point is that while typography is great, it can be just as harmful as helpful if you communicate the wrong impression to the reader.
And to be fair, it really looks like the author is not at home in C: besides his misconception about pointer declaration, he didn't bat an eye at the old-style argument-declaration syntax that has been obsolete since ANSI C.
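The `char* str1, str2;` point above can be checked with the compiler itself. Here is a minimal sketch, assuming a C11 compiler for `_Generic`; the macro and function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate to 1 when the argument has exactly that type, else 0 (C11). */
#define IS_CHAR_POINTER(x) _Generic((x), char *: 1, default: 0)
#define IS_PLAIN_CHAR(x)   _Generic((x), char: 1, default: 0)

/* Returns 1 if str2 in `char* str1, str2;` is a pointer, 0 otherwise. */
int second_name_is_pointer(void)
{
    char* str1 = NULL, str2 = 'x';   /* the spacing suggests two pointers */

    assert(IS_CHAR_POINTER(str1));   /* str1 really is a char * */
    assert(IS_PLAIN_CHAR(str2));     /* str2 is only a char */

    return IS_CHAR_POINTER(str2);
}

/* Same check with the star attached to each name it applies to. */
int both_names_are_pointers(void)
{
    char *str1 = NULL, *str2 = NULL;

    return IS_CHAR_POINTER(str1) && IS_CHAR_POINTER(str2);
}
```

Called from any main(), second_name_is_pointer() should come back 0 and both_names_are_pointers() should come back 1, which is the whole argument for writing the star next to the name.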
Also, I hope I never write a for loop that looks so massively bloated--a matter of opinion I guess.
I positively hate that for loop rewrite. Having the semicolon so expressly indented looks awful, especially on its own line.
I really feel that the for loop should be converted to a while loop - that would probably make the intent clearer.
On a somewhat related subject, one thing I've always idly wondered about Lisp and typography is how much the curvy parens, together with the shallow indentation, make it hard to line stuff up vertically.
A random elisp example:
(while (> count 0)
  (re-search-backward regexp bound)
  (when (and (> (point) (point-min))
             (save-excursion (backward-char) (looking-at "/[/*]")))
    (forward-char))
  (setq parse (parse-partial-sexp saved-point (point)))
  (cond ((nth 3 parse)
         (re-search-backward
          (concat "\\([^\\]\\|^\\)" (string (nth 3 parse)))
          (save-excursion (beginning-of-line) (point)) t))
        ((nth 7 parse)
         (goto-char (nth 8 parse)))
        ((or (nth 4 parse)
             (and (eq (char-before) ?/) (eq (char-after) ?*)))
         (re-search-backward "/\\*"))
        (t
         (setq count (1- count))))))
See where the ends of the parentheses point? Not straight up or down, but diagonally.
Just a random thought... maybe it's just me.
While I like the idea of talking about typography in code, I don't think it's likely that we're as badly off as the post suggests. In fact, pretty much all of those examples make it harder for me to read.
>We've also re-set the data type such that there is no space between char and * - the data type of both of these variables is "pointer to char"
No. No no no. That is how you end up with people who can't parse `char* a, b;`. What you're doing is declaring that `* to` and `* from` are `char`s. The common reading that the post suggests is wrong and detrimental to reading C properly.
I'm having trouble deciding whether this is a serious post or a troll. The C "improvements" are particularly hideous.
Reading Code Complete (and listening to Crockford's talks) has really opened my mind to writing clearer code constructs. For example, the for-loop's job in the first example should be to track indexes. There shouldn't be code that "does stuff" between parens. Instead of superficially breaking the for-loop into several lines and wasting time on aligning semicolons, it could be re-written as a while loop with clarity in mind. Like this:
while (*from != 0) {
    *to = *from;
    from++;
    to++;
}
*to = 0; /* the loop stops before the terminator, so copy it here */
To me this looks much saner (unless I'm doing something wrong, I'm a bit rusty on pointers). But I notice that a lot of "C-hackers" try to cram as much into one line as they can, often including every possible pointer incrementation and assignment. At least here, the variables are properly named. A lot of C code uses one-letter variables and reading those isn't a lot of fun, e.g. while((*t++ = *f++) != 0 )
(Note, I don't really know if this is correct.)
This thread shows why we need tools that let us structure code the way "WE" like it, but that output said code in a default format.
Basically, IDE formatting should be separate from the actual code formatting, allowing people to format code how they see fit. Some are going to like to see the code the "standards" way, and that's fine; many of us will not.
@tmoertel's point is a prime example of this, as the only relationship I personally see in the aligned code is that they are all variables. His perceived parallelism was not present in my personal view of that same code snippet. For me, white-space denotes parallelism more strongly than alignment.
var x = shape.left(),
y = shape.right(),
<--**
numSides = shape.sides();
meh, I give up trying to format this code block. :P
* This denotes parallels between variables to me. Not the alignment.
The only parallel that is drawn for me is that all the items are variables (this would also be reinforced by the color I give variables). I group things with white-space; so in the context of the function using the variables as I've laid them out above, x and y are related and most likely linked, whereas numSides is needed but not necessarily associated with x and y inside the function.
So ya, this stuff is very subjective and as @tmoertel said "subtle and tricky".
O.
If you also consider what commit diffs will look like if you have to insert or delete a variable (particularly the last one), then comma-first creates less cruft in your history.
I think looking at musical typesetting is even closer to the problem of typesetting source code. lilypond.org does explain some ideas about musical typesetting. For example when and when not to align notes, etc.
The OP does seem to favour a table-like grid layout for source code, yet I rather feel that source code describes hierarchical trees (that is why we like the indenting). Arranging things in a tabular way is not inherently beneficial. Sometimes you encounter things like this:
int               x = get_width();
const long double y = 1.5 * get_width();
But what use is the aligning? Type, identifier and value of `x` are far apart, so it is easy to slip onto the wrong line here.
Now most programming languages have a big problem with indentation and layout because their syntax is weird: C's habit of putting types in front, etc. That is probably why the GNU C style guide proposes something like this:
int
strcpy( ....
Type and identifier on separate lines. This style is not widely adopted, and that probably shows another aspect of typography: what is typographically correct depends on what is common.
The C example is a great demonstration that typography is not an objective practice. That someone could take the original strcpy() and, with the express goal of improving its appearance, produce something so unpleasant to read ...
When I was younger I did a lot more interior alignment between lines, like with the list of variable declarations. Over the years I've found that it adds a lot of busywork to the editing process. Everyone talks about optimizing for reading, but not optimizing for editing, which in some cases is really what you need to be optimizing for. But even laying that aside, the readability improvement of such spacing was debatable. Sometimes it's making visible a repetition (presumably one that couldn't be done with an actual loop), but a lot of times it feels more like a novelist who decided that the main verbs of their sentences should be vertically aligned. Agreed, it's making something visible -- but is that really what the average reader cares about? Ultimately the typography should not be a distraction from discerning the sense of a piece of code.
The elephant in the room is that many languages have adopted or have been influenced by the C language's miserable practices of ending a statement with a semicolon, and using the equals sign as an assignment operator. Both of these practices break with conventional usages to no discernible advantage.
Another impediment to readability is the insistence upon representing code by using hideous, low contrast colors on a dark background, especially when the code snippets are mixed with the conventional representation of black text on a white background.
My favorite quote about C's syntax was written by Erik Naggum: "If you care to know my opinion, I think semicolon-and-braces-oriented syntaxes suck and that it is a very, very bad idea to use them at all. It is far easier to write a parser for a syntax with the Lisp nature in any language than it is to write a parser for that stupid semiconcoction. Whoever decided to use the semicolon to _end_ something should just be taken out and have his colon semified. (At least COBOL and SQL managed to use a period.)"
Given that we can't get developers to agree on where to place { I highly doubt we'd ever be able to settle on more esoteric formatting issues.
I'm also not sure that I want developers spending much of their brain space or time prettying up the code beyond what is currently considered well-formatted code. While it might be nice, there are probably better things they could be working on.
A comma may not be so important in a language between humans, like here --->, but in a programming language (a human-human AND human-computer medium) a comma is often an important separator. Missing one often breaks correctness, so putting it first on the line, based on its importance, is OK for me. No syntax checker or compiler is needed to see that it's OK...
Would like to know whether the author feels his rewrite is more successful than the original. The following takes me longer to read:
for (
    ;
    (*to = *from) != 0;
    ++from, ++to
)
    ;
Whereas the idiomatic version seems simpler: for (; (*to = *from) != 0; ++from, ++to);
I think more important than a particular typographic style is consistency. Most of us move between large codebases that are formatted differently and I find that I can quickly adapt to a different style as long as it's consistent.
Also, I think color is really useful and something that's not used as much in traditional typography.
It's better to just use an autoformatter like clang-format. The minuscule time you'll save reading the code with clever formatting is going to be wasted in arguments with fellow contributors about how clever the formatting is and whether it's appropriate.
I've always thought source code should use different fonts, and perhaps even non-monospace fonts in some cases (perhaps for strings, comments).
Why are we forced to stick with a single fixed-width font and color, limited use of italics, and no use of boldface?
I come from a photography and design background, and I've tended to naturally write code much like the author is suggesting.
However, the for loop is mystifying to me, but I don't fully understand the actual code there. Would anyone care to explain it?
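To explain the for loop asked about above: it is the classic C string copy, with all the work done in the loop header. Below is a sketch of the same loop spelled out step by step. The function names are made up, and `to` is assumed to point at a buffer big enough for the string plus its terminator:

```c
#include <assert.h>
#include <string.h>

/* The loop as discussed: the header does everything, the body is empty. */
void copy_compact(char *to, const char *from)
{
    for (; (*to = *from) != 0; ++from, ++to)
        ;
}

/* The same steps, one per line. */
void copy_spelled_out(char *to, const char *from)
{
    /* init clause: empty, both pointers are already set up */
    while (1) {
        *to = *from;      /* condition, first half: copy one character */
        if (*to == 0)     /* condition, second half: was it the '\0'? */
            break;        /* yes: done, and the terminator is copied */
        ++from;           /* step clause: advance both pointers */
        ++to;
    }
}

/* Returns 1 if both versions produce a correctly terminated copy of text. */
int copies_agree(const char *text)
{
    char a[64], b[64];

    if (strlen(text) >= sizeof a)
        return 0;         /* would not fit in the demo buffers */
    copy_compact(a, text);
    copy_spelled_out(b, text);
    return strcmp(a, text) == 0 && strcmp(b, text) == 0;
}
```

So the loop copies characters up to and including the terminating '\0' and then stops; the lone `;` at the end is just the empty loop body.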
We could definitely do a better job with source code typography, but if the goal is to make the code more readable, I think the points discussed here are setting the bar pretty low for what we could achieve instead of today's norms.
For example, even fairly basic textbooks make extensive use of footnotes, endnotes, marginal notes, sidebars, illustrations, tabular material, bibliographies, and other similar tools. What these all have in common is that they remove valuable supporting information from the main text, but keep it available (with varying degrees of immediacy) for readers who want to refer to it, and provide conventions to help the reader find it. In some cases, they also present that material in a structured way that is more effective than plain text.
With modern interactive IDEs, we have the ability to use not only all the same tricks that traditional book typesetting does but also many more because ours can be dynamic, interactive, graphical, or any combination of the above. We can add supporting material on any of four sides around a code listing, or even overlay information on the listing itself, and we can change the information we show in those places automatically with context and/or at the reader's request based on what they want to do next. And yet, except for debugging and for navigating around large projects, we tend to make very little use of these tools. We still mostly try to present code as a single static column of monospaced material with the occasional extra vertical space and a bit of horizontal alignment to emphasize structure, and maybe a left sidebar with line numbers and a couple of icons for bookmarks or breakpoints, or perhaps a bit of trivial dynamic highlighting during things like find/replace work.
What if instead we tried to make the main code area focus on the core logic and data, and move anything else out of the way? There are many opportunities to do that. Type annotations? Supporting data. Comments? Supporting data. Long list of module.names.before.useful.identifier? Probably supporting data as long as it's unambiguous, probably something you want to draw attention to if it's not. Even keywords like "var" in the example code snippets don't help a human reader much. And that's just with the kind of conventions and languages we use today, without even considering the endless possibilities of languages and tools designed with alternative presentation in mind from the start.
None of this even needs to get in the way of the tried and tested practice of storing code in plain text files. It could all be dealt with in your editor/IDE using exactly the same kinds of techniques as the standardised-formatting tools other posters have mentioned here, and saving a file could record the supporting data in a standardised text format that is friendly to version control systems, code review tools, automated diffs, and so on.
In short, if we're going to address readability and presentation in programming, let's think outside the box a bit. We have the most powerful information sharing and presentation tools in human history at our disposal, and a mountain of data about usability and interface design. We can do more than worrying about how many spaces we should have between tab stops.
Personally, I've always found code with a "table structure" prettier but less readable in practice. It's also fussy and time-consuming to maintain. I favor information density and flow.
"occasional performance concerns require putting readability in the backseat, but this is rare"
occasional? Seriously? rare? Seriously?
"Source code should be written to be understood by people." Nope, source code should be written to be executed. If people can understand it easily, it's a plus, not a baseline.
Considering that the keyboard is the primary way we write source code, I find it difficult enough to keep my fingers up to speed with my thoughts. If on top of that I have to press tab to align each of the statements in my 'for's, I'll be much worse off.
There's a big ol' elephant in the middle of the room that this post is not addressing.
The irrational love programmers have for horrible monospace fonts.
My eyes aren't great, but I had trouble reading the code.
Typography is a subtle and tricky thing. What seems "better" may actually provide misleading cues.
For example, consider the alignment in the following, improved snippet from the blog post:
It may be "better" typographically, but it also suggests a false parallelism.
The eye can't help but interpret closely packed things as groups. So the subliminal cue presented by the formatting above is that there is a parallel assignment from the group of expressions on the right to the group of variables on the left. That is, at some level the eye can't help but see the code above as
But the evaluation and assignment are not parallel! They are sequential. The difference may not matter in this example, but it's easy to imagine this kind of formatting applied to examples where it would.
The less-formatted original version below actually represents the reality more faithfully because its shape does not suggest parallelism:
It looks more like a sequence, which it is.
Indeed, typography is a subtle and tricky thing.
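A case where the sequential order does matter can be sketched in C (the struct and function names here are invented for illustration). The two assignments line up so neatly that they read like a parallel swap, but they run top to bottom:

```c
#include <assert.h>

struct pair { int left, right; };

/* Reads like (left, right) = (right, left), but it is not a swap. */
struct pair swap_as_it_looks(struct pair p)
{
    p.left  = p.right;
    p.right = p.left;    /* p.left was already overwritten above */
    return p;
}

/* What a real swap needs once the order of execution is respected. */
struct pair swap_for_real(struct pair p)
{
    int old_left = p.left;

    p.left  = p.right;
    p.right = old_left;
    return p;
}

/* Small self-check showing the difference on the pair (1, 2). */
void check_swaps(void)
{
    struct pair p = { 1, 2 };
    struct pair looks = swap_as_it_looks(p);
    struct pair real  = swap_for_real(p);

    assert(looks.left == 2 && looks.right == 2);  /* the 1 is lost */
    assert(real.left == 2 && real.right == 1);    /* actually swapped */
}
```

The tidy alignment in swap_as_it_looks is exactly the kind of cue that hides the bug.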