Reminds me of the boom in voice assistants, when we were told it was the interface of the future.
I’d ask my Google Home what the temperature was going to be today. It would tell me. I’d ask what the temperature was yesterday. It would tell me it didn’t understand the question.
ChatGPT etc. obviously aren't quite that bad, but the core problem remains discoverability. It isn't at all clear what these interfaces can and cannot do; all you can do is keep trying different things until you get a result.
While I agree with the title, I find it very lackluster that it entirely focuses on some specific AI interfaces.
First of all, ChatGPT implies by its name that it's designed to chat. It's a sophisticated chatbot, but a chatbot nonetheless; you are not going to have a chat by filling in a form. Then the examples given for Stable Diffusion are not natural language. In that case there can probably be a better interface than a single textbox, but the issue is the textbox, not that it expects natural language (it doesn't).
Other types of interface for AI stuff do exist. Copilot is also an LLM, but it takes surrounding code as input, not a natural language prompt. Plenty (most) of models take whatever format is convenient for the task and output an adapted format (image/video/text classification, feature detection...).
On the other hand, there are some interfaces that force natural language processing where it is one of the most unnatural and ineffective options, and no AI whatsoever is involved. Anyone who has tried to book a train ticket in France in the past couple of years knows what I mean[0]. Having to spell out an itinerary and date information is very, very confusing and error-prone.
I'd like to take issue with the characterization of copilot as not needing textual prompting.
To get it to work, you need to use comments. The more the better. You can put huge amounts of context and information in them. This involves writing and description. It's exactly the same as what the author is arguing against.
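In practice that means writing something like the following (the function name and details are my own hypothetical sketch) and letting Copilot fill in the body from the comment:

    from datetime import datetime, timezone

    # Prompt-as-comment for Copilot: parse an ISO-8601 timestamp string and return it
    # formatted like "Monday, 05 June 2023, 14:05" in UTC; return None if parsing fails.
    def format_timestamp_for_display(iso_string):
        try:
            parsed = datetime.fromisoformat(iso_string)
        except ValueError:
            return None
        return parsed.astimezone(timezone.utc).strftime("%A, %d %B %Y, %H:%M")

The comment is doing the same job as a ChatGPT prompt: it carries the description and context that the model completes against.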
It can only be good when the interface already knows a massive amount of context. For example, in a future use case the LLM knows all your email and you just ask it to draw a timeline of a thread, and so on.
If you are asking it to do stuff from scratch, like most of us do on ChatGPT, it's quite a pain.
"There is nothing, I repeat nothing, more intimidating than an empty text box staring you in the face."
Talk about a hyperbolic opening line.
Is it really that intimidating to have an empty text box in WhatsApp or your favorite SMS app? No, because you expect an appropriate response to come from the other side, pretty much regardless of what your input is.
As a frequent user of ChatGPT, I've come to expect the same in there. And it works great, without me having to study any "prompt engineering". In fact, as it gets updated, I get frustrated less and less often — unlike my experience using Bard, which can be better for a few tasks but often returns opaque errors that do feel frustrating. The solution here is clearly for the model to improve, and one doesn't even need a leap of faith — just look at what OpenAI is already delivering!
Talking to a competent LLM is nothing like talking to bash or dos. I also get frustrated when I sometimes have to ask for the same thing in a slightly different way... but that's still almost always faster than searching for the right button or submenu in most creation-oriented software. Whoever is waiting for Word or Google Docs to add a "write this in business-formal email tone" dropdown menu to the UI clearly hasn't grokked the true shift we're about to go through in computing.
Incidentally, I am often using ChatGPT to help me do more advanced / rarely used tasks in software from Avid Pro Tools to Adobe Premiere. And I can't remember a single time when doing this was slower or more frustrating than reaching out to either Google or the software's own "help" section.
Of course we'll have more input options. It makes tons of sense for things like image or video generation. I bet the models will also soon be outputting more and more "interactive elements" that will aid in refining results. But I have a feeling the opening text box (or, better yet, the open ears of a friendly audio assistant) is here to stay.
Some odd Python "class" syntax in there.
    ## Python class called Car
    def Car(make, model, year):
        """This function creates a new car object"""
        car = {"make": make, "model": model, "year": year}
        return car
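For comparison, the idiomatic version would presumably be an actual class, e.g. a dataclass (my own sketch, not from the article):

    from dataclasses import dataclass

    @dataclass
    class Car:
        """A simple car record with make, model and year."""
        make: str
        model: str
        year: int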
Very relevant to my post from January, “Natural language is the lazy user interface”.
This article isn't about natural language or AI, IMHO.
Typing into a keyboard is not natural. Adding buttons makes it even less so. We don't have AI avatars today. But we will within my lifetime. I'll be able to converse with most any historical figure in VR. We will see each other's facial expressions and hand gestures. Let's not go backwards and add buttons. Put your efforts into going forwards.
Natural language is the ultimate programming language. We use it to program us humans all the time.
I agree with the general intention of the post.
> Natural language is an unnatural interface
However, I think the title is misleading. Maybe I am too much of a technical person and take interface too broadly. But natural language is probably the most natural interface as humans learn this really early and use it every day.
Also, the word "natural" may not be suitable for this discussion. Maybe the author has this in mind already but I think we should rather talk about intuitive, efficient, motivating, … interfaces.
In general, I guess HCI is not about natural but rather useful.
We have been using a natural language interface where I work for 5+ years (pre-LLM), and honestly, if applied correctly, it can be very sticky and effective.
For example: your application may have multiple capabilities serving multiple user types.
Rather than smacking an empty text box on the front page, you can try embedding the box into a specific capability and developing parsers focused on the particular domain of that capability. This limits the scope and thus the chances of not delivering on the user's intention.
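A minimal sketch of the idea (the capability and field names are hypothetical): the text box on a booking page only feeds a narrow, booking-specific parser rather than one catch-all prompt:

    import re

    def parse_booking_query(text):
        """Narrow parser: only understands 'from X to Y on YYYY-MM-DD'-style requests."""
        match = re.search(r"from (\w+) to (\w+) on (\d{4}-\d{2}-\d{2})", text, re.IGNORECASE)
        if match is None:
            return None  # out of scope for this capability; ask the user to rephrase
        origin, destination, date = match.groups()
        return {"origin": origin, "destination": destination, "date": date}

    # parse_booking_query("a ticket from Paris to Lyon on 2023-07-14")
    # -> {"origin": "Paris", "destination": "Lyon", "date": "2023-07-14"}

Because the parser is scoped to one domain, a failed parse is a clear signal to fall back to guided UI instead of guessing at the user's intention.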
Most of the arguments I hear from my mates-who-code for adding deterministic old-school things like buttons and "UI" on top of LLMs are "sometimes it's faster and more specific"... OK, sure, but add voice-to-text and give the LLM, in a voice version, to a grandma. She can just use it.
And if you, the programmer, need specifics ("I want to do root cause analysis of these 20 incidents, break them up into mobile and web app, then draw common threads in each type of RCA"), then write/say a better prompt that details exactly what you want. It might take 2 or 3 goes, but you will get there. Adding metadata tabs like some vector-DB franken-SharePoint seems like a step back, at least from a Rich Sutton worldview. Let the LLM work it out, and if it fails, improve it for many use cases, not just one.
IMHO (tl;dr: strong disagree)
Strong disagree. These are straw-man arguments about the limitations of models. Senior executives, politicians, and people in high leadership positions can see their wildest dreams realized using little more than natural language and discussions with the world's leading experts. Satya Nadella and Joe Biden don't enjoy any special abstractions, and yet they are able to get everything they want using natural language.
The problem is that most people don't know how to express what they want (and often they don't know what they want). But it's a bit myopic to stick to language interfaces; the LLMs are already integrated with images, and over time they will be integrated with entire GUIs.
I'm really confused by "Anecdotally, most people use LLMs for ~4 basic natural language tasks" and "Most LLM applications use some combination of these four". I'm not sure about the `ELI5` use case, and it feels like this is only true for a very limited subset of the use cases people currently have for LLMs.
For conversational FAQ-type use cases like the ones described by OP, perhaps a few basic prompts suffice (although anything requiring the agent to have "agency" in its replies would necessarily require prompt engineering) - but what about all the other ways that people can use LLMs to analyze, transform and generate data?
As opposed to what?
I don't know about you, but for me on a personal level ChatGPT is a SERIOUS paradigm shift. I work on OSX, I use an app (MacGPT) that is always a keyboard shortcut away, and more often than not the responses are several times better than what Google will give me.
I know there are a lot of areas with high friction, but those will go away eventually and compared to what I was doing before, the friction is much much less.
Not to mention that it is a tool _I didn't have before_, and for me in some cases - especially those tasks that are unavoidable, that you generally have to start from zero on, and that are a burden - my productivity has increased 3x.
I'll take an unnatural interface any day if that's the end result.
I think on the input side natural language is a pretty reasonable interface. For me the output side is much more problematic. Natural language makes search output so uniform that not only can you not tell whether something's real; even when it is, you can't tell whether it came from the Encyclopedia Britannica or the YouTube comment section.
Taking away the ability to discern sources yourself turns me off using these systems anywhere that is relevant.
Also, of course, not outputting structured data greatly limits how well AI systems can interface with other traditional automated systems. Natural language is not a great format for processing or analysis of any kind.
I don't particularly agree with the title but agree that there's a present awkwardness in the way that users are expected to derive correct insights from LLMs. In the same way that tokenization exists to overcome a would-be shortcoming of present computing resources and architectures, but may eventually become unnecessary: a more streamlined interface would be helpful in tiding us over this hump of awkwardness even if it too eventually becomes unnecessary.
The GUIfication of LLMs. Yuck. I’d better join the local model camp before bad ideas like this catch on and fully nerf this tech.
I started reading and found the text not engaging (perhaps it's just me not being in the right mood), but while I was deciding whether to stop reading, Substack showed a popup. What the hell. Enough enshittification. I dropped the thing and came here to complain instead. (shrugs)
Yes, this is also what's frustrating about over-the-phone voice interactions, where you have to describe your problem in a few spoken words.
Many companies laid off UX designers for the new AI trend.
> unnatural interface
A simple example might be the problem of "pick a color". Even the best natural-language interface is going to suck about as much as if you're trying to ask another human to do it for you, even if that assistant is capable of displaying 1-5 color swatches in their replies.
Instead of just seeing the entire palette and choosing, you need to say "I want a gold color", "lighter than that" and "darker" and "less like urine" etc.
> People fundamentally don’t know how to prompt. There are no better examples than Stable Diffusion prompts.
You know, this reminds me of the Good Old Days of internet search engines, when a little expertise in choosing terms/operators was very powerful, before the advanced case was cannibalized to help the average case.