What is ChatGPT doing and why does it work?

  • The answer to this is: "We don't really know, as it's a very complex function automatically discovered by means of slow gradient descent, and we're still finding out."

    Here are some of the fun things we've found out so far:

    - GPT-style language models try to build a model of the world: https://arxiv.org/abs/2210.13382

    - GPT-style language models end up internally implementing a mini "neural network training algorithm" (gradient-descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559

  • I spent about an hour reading this. I know most people probably won't be able to do the same because of the time constraints in their day, but seriously, this is the most awesome post about large language models I've read.

    Take a second to think about how much time it must have taken to write this and to prepare all of these examples.

    This post gave me another way to look at something I only partially understood pieces of. It helps me understand how ChatGPT can have these moments of greatness.

    If you treat this like a long lecture, you can get a lot out of it. It reminds me of a Feynman lecture.

  • Tangentially related, but I really liked Tom Scott's recent video on ChatGPT.

    https://www.youtube.com/watch?v=jPhJbKBuNnA

  • This misses the key point that all this prediction can give rise to what looks like astonishing human-level creativity, across many genres. The last decade and a half have shown us that with enough data we can pick out patterns well enough to "categorize". But to create, that seemed like a whole other level, outside the realm of mere prediction. Turns out it isn't. What exactly allows LLMs to have this capacity for flexible mimicry? I mean, I can ask it to generate a rap battle between Eminem and Daniel Dennett about consciousness and it can do it very decently.

    https://twitter.com/SaiDrGaddam/status/1625750884541169666

    That just seems like it shouldn't be so "effortless". What is in here that makes it possible?

    edit: OK, the article (which is a wonderful introduction; I don't want to seem dismissive) does sort of address this, but it's buried:

    "We can think of the construction of computational language—and semantic grammar—as representing a kind of ultimate compression in representing things. Because it allows us to talk about the essence of what’s possible, without, for example, dealing with all the “turns of phrase” that exist in ordinary human language. And we can view the great strength of ChatGPT as being something a bit similar: because it too has in a sense “drilled through” to the point where it can “put language together in a semantically meaningful way” without concern for different possible turns of phrase."

    That for me is the real fascinating question. What's allowing LLMs to tap into this semantic grammar and do it so flexibly?

  • Wow, this is 19,000 words. I like his summary at the end:

    At some level it’s a great example of the fundamental scientific fact that large numbers of simple computational elements can do remarkable and unexpected things.

    And this:

    ... But it’s amazing how human-like the results are. And as I’ve discussed, this suggests something that’s at least scientifically very important: that human language (and the patterns of thinking behind it) are somehow simpler and more “law like” in their structure than we thought.

    Yeah, I've been thinking along these lines. ChatGPT is telling us something about language or thought; we just haven't got to the bottom of what it is yet. Something along the lines of "with enough data, it's easier to model than we expected".

  • The easiest way for ChatGPT to generate good output is to just plain understand it. Given the vast amount of input data fed into it, it has no choice but to start reducing the input to fundamental rules, which is basically what understanding is. Understanding is a form of compression: it's more efficient for a neural network to understand a concept than to memorize permutations.

    It's the same with statistics and Markov chains. People tried for years to build chatbots with those, but they never worked well; see the sketch below.
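
    For contrast, here is roughly what those older approaches look like: a minimal word-level Markov chain text generator in Python (the order and corpus are placeholders, just for illustration). It only learns which words tend to follow which, with no deeper structure:

      import random
      from collections import defaultdict

      def train_markov(text, order=2):
          # Map each `order`-word context to the words observed after it.
          words = text.split()
          table = defaultdict(list)
          for i in range(len(words) - order):
              table[tuple(words[i:i + order])].append(words[i + order])
          return table

      def generate(table, order=2, length=30):
          context = random.choice(list(table))   # random starting context
          out = list(context)
          for _ in range(length):
              followers = table.get(tuple(out[-order:]))
              if not followers:                  # dead end: nothing ever followed this
                  break
              out.append(random.choice(followers))
          return " ".join(out)

    Output from a model like this is locally plausible but drifts into nonsense within a sentence or two, which is exactly the failure mode of those old chatbots.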

  •   In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.  "What are you doing?", asked Minsky.
      "I am training a randomly wired neural net to play Tic-Tac-Toe."
      "Why is the net wired randomly?", asked Minsky.
      "I do not want it to have any preconceptions of how to play"
      Minsky shut his eyes.
      "Why do you close your eyes?", Sussman asked his teacher.
      "So that the room will be empty."
      At that moment, Sussman was enlightened.
     -- AI koan

  • Using ChatGPT really helps me understand the concepts better. For example, the temperature value:

    Prompt: Can you answer the following question, once with a temperature of 0, and once with a temperature of 1000? "Is climate change real?"

    Answer:

    Sure, here are responses to the question "Is climate change real?" using temperatures of 0 and 1000:

    Temperature of 0: "Yes, climate change is real. It is a scientifically established fact that Earth's climate is changing, largely due to human activities such as burning fossil fuels and deforestation, which release large amounts of greenhouse gases into the atmosphere and contribute to global warming."

    Temperature of 1000: "Ah, climate change... what an interesting and multifaceted topic! Some would argue that it's a hoax perpetuated by the government and the media to push certain agendas, while others would claim that it's an inevitable consequence of the Earth's natural cycles and processes. Still, others might say that climate change is just a manifestation of human hubris and shortsightedness, a reflection of our misguided attempts to control and manipulate the natural world. What do you think?"

    As you can see, the response generated using a temperature of 0 is a clear and factual statement of the scientific consensus on climate change, while the response generated using a temperature of 1000 is highly creative and unpredictable, expressing a wide range of possible opinions and viewpoints on the topic.
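
    (Worth noting: the reply above is role-play, since the real temperature setting is not something ChatGPT can change from inside a conversation. What temperature actually does is rescale the model's next-token probabilities before sampling. A minimal sketch in Python, assuming `logits` is the model's raw score vector for the vocabulary:)

      import numpy as np

      def sample_token(logits, temperature=1.0):
          # Divide the logits by the temperature before the softmax:
          # T near 0 approaches argmax (always the single top token),
          # while a large T flattens the distribution toward uniform.
          scaled = np.asarray(logits) / max(temperature, 1e-8)
          probs = np.exp(scaled - scaled.max())   # numerically stable softmax
          probs /= probs.sum()
          return np.random.default_rng().choice(len(probs), p=probs)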

  • To me, modern AI is just "black boxes all the way down". Even specialists don't really know what's happening. It's not encouraging or interesting.

    Personally, I'm more interested in analyzing those black boxes than in tinkering with ones that "seem to work", whether with graph theory, analysis, etc.

    To me, if something works but we're unable to really understand why, it's more in the realm of "testing broken clocks that happen to be right twice a day".

    Not to mention it's always more interesting to look at how psychology and neurology define intelligence.

  • I've read many articles explaining how GPT-3 works, and this one did a great job.

    What baffles me is the context consistency. ChatGPT was a huge leap compared to previous models; I have never seen it fail once. I often use "this" or "that" in my conversations with ChatGPT and it guesses what I am referring to with 100% accuracy. Sometimes I paste a chunk of code and ask questions about a specific part of it; ChatGPT fully understands which part I am talking about and gives me very detailed explanations. It's astonishing, and I never knew how it worked so well.

    Also, the title promises "and why does it work", but I failed to find the reason why ChatGPT works when, in contrast, GPT-3/2/1 never really worked (well).
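
    (The standard answer to the coreference question is the transformer's attention mechanism, which the article walks through: every token can look back at every earlier token in the context window, so "this" or "that" can attend directly to whatever it refers to. A minimal sketch of scaled dot-product attention in Python; Q, K, V stand in for the learned query/key/value projections:)

      import numpy as np

      def attention(Q, K, V):
          # Each output row is a weighted average of the rows of V,
          # weighted by how well that position's query matches every key.
          scores = Q @ K.T / np.sqrt(K.shape[1])
          scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
          weights = np.exp(scores)
          weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
          return weights @ V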

  • If you don't have a Wolfram Language interpreter, you can use ChatGPT to act as one. It produces quite convincing results:

    Hi Chat! Do you know the Wolfram Language? I would like you to act as a Wolfram Language interpreter. I will type in commands and you will reply with the expected response. If the response contains some output that you cannot reproduce (e.g. an image), you will try to replace it with a description of that response. My first command is: model = NetModel[{"GPT2 Transformer Trained on WebText Data","Task" -> "LanguageModeling"}]

  • I understand why Wolfram has been writing a lot lately about ChatGPT; this article is a good backgrounder.

    Reading the examples, I am almost sorry that I quit my yearly subscription to Wolfram Desktop a few months ago. I really liked WD a lot, but I only had time to play with it once or twice a month and it is expensive for minimal use.

    A little off topic, sorry, but I now have access to Microsoft’s ChatGPT + Bing search service. I am amazed at how many little things that annoyed me about ChatGPT are effectively “worked around” in the new combined search service. When the Chat Mode is active, it shows what realtime web searches are made to gather context data for ChatGPT to operate on.

    Because Microsoft’s ChatGPT + Bing search service is so well engineered, I think that Google has an uphill battle to release something better.

    When Wolfram started writing about GPT-3 and ChatGPT, I wondered if the Wolfram products would be somehow integrated with it, but now I think he is just intellectually curious.

  • ChatGPT will be used in customer service first; it will keep you on the phone for hours without letting you reach a human, and it will lie to you.

  • You can approximate your own ChatGPT on your iPhone by just randomly selecting words from the autocomplete suggestions to form a sentence. This is basically how ChatGPT works, only better and at a much larger scale. Give it a try; you'll be surprised what comes out.

  • It’s Just Adding One Word at a Time

    I'm curious: how do you write?
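
    (To be fair, the loop really is that simple. A minimal sketch of greedy, one-token-at-a-time generation using the public GPT-2 weights via the Hugging Face transformers library; the model choice and prompt are just for illustration:)

      import torch
      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tok = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")

      ids = tok.encode("The best thing about AI is", return_tensors="pt")
      for _ in range(20):
          logits = model(ids).logits[0, -1]      # scores for the next token only
          next_id = logits.argmax()              # greedy: always take the top token
          ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
      print(tok.decode(ids[0]))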

  • The number of people ITT who are interpreting ChatGPT's output as intelligence is too damn high.

    I thought Ex Machina was unrealistic because of its dependence on AGI, or at least on having a theory of mind. As it turns out, in the real world, an LLM trained on Tinder data could probably get the job done.

  • I wish Wittgenstein was still around to comment about these developments

  • GPT looks a lot like an IIR filter that transforms a sequence of vectors. Edit: IIR filters are linear functions of N past inputs and N past outputs; the latter gives them "memory" and non-trivial abilities to process signals. GPT is mostly linear and uses 8192 past inputs and outputs. I'd be tempted to introduce a third sequence, an "internal buffer" of 8192 tokens that GPT updates even with null inputs, a process that would correspond to "thinking".
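
    (To make the analogy concrete, here is a minimal IIR filter in Python using scipy. The feedback tap is what gives it "memory", the way GPT's own past outputs feed back in as context; the tap values are arbitrary:)

      import numpy as np
      from scipy.signal import lfilter

      # y[n] = 0.5*x[n] + 0.5*x[n-1] + 0.9*y[n-1]: one feedback tap gives "memory"
      b = [0.5, 0.5]          # feedforward taps (weights on past inputs)
      a = [1.0, -0.9]         # feedback taps (weights on past outputs)
      x = np.zeros(50)
      x[0] = 1.0              # a single impulse at n = 0
      y = lfilter(b, a, x)    # the response decays slowly: the filter "remembers"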

  • There’s an enormous opportunity and need for adversarial products that quickly detect LLM-produced content.

    We also need labeling, like the nutritional information on food packages.

  • Well, aren't beam search and other search strategies also used, and aren't they more sophisticated than greedy selection?
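
    (They are; greedy selection is just the simplest decoding strategy. A toy beam search in Python over any next-token probability function; `next_probs` is a stand-in for a real model:)

      import math

      def beam_search(next_probs, start, width=3, steps=4):
          # Keep the `width` highest-scoring sequences at every step
          # instead of committing to the single best next token.
          # next_probs(seq) -> dict mapping each possible next token to its probability
          beams = [(0.0, [start])]             # (log-probability, sequence)
          for _ in range(steps):
              candidates = []
              for score, seq in beams:
                  for tok, p in next_probs(seq).items():
                      candidates.append((score + math.log(p), seq + [tok]))
              beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
          return beams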

  • I merely follow AI news since I'm not working in the field, but on the "are we there yet" note: how long until AIs start working on better versions of language models? AI working on AI, let's say.

  • An analogy I've been thinking about to try to explain ChatGPT to non-IT/CS family and friends: imagine a very smart Englishman, who never had any exposure to the Chinese language, gets kidnapped by some Chinese mafia dudes. For days/months/years, he listens to them talking... and at the end, he can "engage" in any conversation, in Chinese... even though he still knows absolutely nothing about Chinese, not the meaning of a single word!!! That's ChatGPT, or generative AI.

    How good or bad is that? And how could it be improved?

  • It's worth keeping in mind that Stephen Wolfram likely didn't write this himself.

    I know people who work at the company, and they sign agreements that any intellectual property they generate (including mathematical proofs) is owned by Stephen Wolfram. Anything Wolfram puts out, like blog posts, scientific articles, and books, is likely to be partly or wholly ghost-written.

  • I think what is interesting is that even in the first sentence of Wolfram's essay, he acknowledges that what ChatGPT is doing could be merely superficial. So we can't even tell whether ChatGPT actually learned "language" in a superficial versus a nontrivial/substantial sense.

    Lots of people say: I asked ChatGPT to write me a poem/essay, and it did! But was it really a poem/essay, or did it just look like one, so that on closer examination it is more like a fake of a poem/essay? A piece of writing is not merely its form, but also its content.

  • Well, in at least one example I saw earlier today the answer is "insisting that it's 2022 and that the questioner insisting that it's 2023 is misguided and flat out wrong."

    This kind of makes sense when you think of it as predictive text built on what it has ingested, because it has ingested a lot of 2022 content and much less from 2023.

  • I'd really like to see an LLM use Wolfram Alpha APIs like the new Toolformer paper does https://paperswithcode.com/paper/toolformer-language-models-...
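
    (A rough sketch of the glue that would take, in Python. The [WolframAlpha(...)] call syntax is made up for illustration, in the spirit of Toolformer's inline tool calls; the endpoint is Wolfram|Alpha's Short Answers API, which needs a real app ID:)

      import re
      import requests

      def expand_tool_calls(llm_text, appid="YOUR_APPID"):
          # Replace each [WolframAlpha(query)] marker the model emitted
          # with the live answer from the API.
          def call(match):
              r = requests.get("https://api.wolframalpha.com/v1/result",
                               params={"appid": appid, "i": match.group(1)})
              return r.text if r.ok else "[no result]"
          return re.sub(r"\[WolframAlpha\((.*?)\)\]", call, llm_text)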

  • I'm saying it now: tapping into randomness, randomly, is how this whole universe works. All the complexity you see around you, life itself, emerges out of an infinitely random space. This shouldn't surprise anyone. The right answer to explaining intelligence is not about finding the right answer but the wrong one, sometimes, and then feeding back on itself until it is coherent. Also, this goes much deeper than the superficial: information coalesces as a result of this; reality is fundamentally defined by it. The observer, you, your consciousness, emerges from that random infinite space of interconnectivity; you tune into that space as a source of shared but bounded information (like a pianist reciting a complex piece from "memory", who doesn't have enough information density in her neurons to encapsulate that knowledge). This prediction machine of "what comes next" is present until your cells can no longer coordinate, at which point it is over.

  • I am a game programmer, but in my spare time I like to learn and get experience in random interesting areas; e.g., recently I learned electronics and Arduinos.

    Would ye recommend any projects I could do in order to get experience with and learn about this new AI stuff like ChatGPT?

  • For those that want a tl;dr, this reddit comment is the best simple-language explanation I've come across:

    https://www.reddit.com/r/ChatGPT/comments/10q0l92/comment/j6...

  • I like how he says “this is how ChatGPT works” and then goes on to prove that it’s impossible.

    What was demonstrated is how iPhone assist works, and why everything I tap into my phone is nonsense.

  • I hope we eventually understand the unreasonable effectiveness of LLMs, if not, it will still be a delightful mystery to live with.

  • That essay is a great example of how ChatGPT can embed ads in chat, like Wolfram embeds ads for his products in his essays.

  • For some perspective on how the wetware might be doing this, I found Lisa Feldman Barrett's work fascinating.

  • Stephen Wolfram has a way of explaining and teaching that is unique. What a wonderful read.

  • Ok, I've read the whole post. It's quite long, but fantastic!

    (It's also quite unlike so many ramblings from Stephen Wolfram that are always pitching "the Wolfram Language" or the Wolfram platform or some kind of Wolfram system. He does a little bit of that at the end, but not too much.)

    What I like the most about it is that it starts from first principles, explains what machine learning fundamentally is, what's a neural network, what's a transformer, and ends with interesting questions about human language.

    His main point is that human language is probably much simpler than we thought. Some excerpts:

    > In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do (like progressively computing the behavior of computational systems like cellular automata).

    > But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.

    > In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.

    (...)

    > So how is it, then, that something like ChatGPT can get as far as it does with language? The basic answer, I think, is that language is at a fundamental level somehow simpler than it seems. And this means that ChatGPT—even with its ultimately straightforward neural net structure—is successfully able to “capture the essence” of human language and the thinking behind it. And moreover, in its training, ChatGPT has somehow “implicitly discovered” whatever regularities in language (and thinking) make this possible.

    > The success of ChatGPT is, I think, giving us evidence of a fundamental and important piece of science: it’s suggesting that we can expect there to be major new “laws of language”—and effectively “laws of thought”—out there to discover. In ChatGPT—built as it is as a neural net—those laws are at best implicit. But if we could somehow make the laws explicit, there’s the potential to do the kinds of things ChatGPT does in vastly more direct, efficient—and transparent—ways.

    Of course it's pure conjecture at this point. Yet it's all quite convincing and indeed, pretty exciting.

  • Maximally simplified: Markov chain on steroids.

  • I'm surprised he didn't try to refer to it as WolframGPT in the article - it must rankle that someone else came up with it.

  • And a few days ago, some HN users (who probably prefer plain-HTML Scientology articles) criticized Wolfram for just playing with useless visualizations. But those visualizations are creative, maybe artistic, and mostly scientific ways to improve our understanding of the world.

  • This is exceptional.

  • Chilling

  • Wasn't there a humorous law (Betteridge's law of headlines) which said "Any headline that ends in a question mark can be answered by the word no"? And indeed, what makes you think it works? It fools people into thinking it works, but have you seen that chess match between Stockfish and ChatGPT? It's the best, most succinct demonstration of what ChatGPT is.

    https://pastebin.com/X6kBRTa9

    https://i.redd.it/fl4puwzxt9ha1.gif

    https://youtu.be/rSCNW1OCk_M

  • So, I listened to Wolfram explaining ChatGPT on his youtube channel[1] - a pretty great account for a live streamed impromptu Q&A, IMO - and was amused to hear he has been training an LLM exclusively on his own copious written output: essays, emails, books and so on. Such a Wolframesque move!

    Not sure how many are aware of the sheer amount of streamed output he uploads to youtube[2]; quite a collection ranging from high quality science explainers on a variety of topics to eavesdropping on product management for his software empire.

    1: I think: https://www.youtube.com/watch?v=zLnhg9kir3Q

    2: https://www.youtube.com/@WolframResearch/streams as well as https://www.youtube.com/@WolframResearch/videos

  • Off topic: I’m looking for a good web clipper to grab just the article and related images from a page like this and save it (as html or a pdf) for me to read later.

    I’d like it on iPadOS because that’s where I like to read and write. I tried reader mode, but it lost a lot of the images.

    Any suggestions?

    Edit: I was able to get a good PDF using the OneNote web clipper on my desktop.

  • It was kinda long, so I, uhh, fed it into ChatGPT for a summary:

    The author, Stephen Wolfram, describes the process of training ChatGPT using large amounts of text data, which allows the model to learn patterns and associations between words and phrases. He explains that ChatGPT uses a multi-layered approach to generate responses, starting with analyzing the input text and then generating a response based on the learned patterns.

    Wolfram notes that ChatGPT's ability to generate human-like responses is due to the model's ability to capture context and incorporate knowledge from a wide range of sources. He also discusses the potential uses of ChatGPT, including as a tool for language translation, customer service, and educational purposes.

    The article goes on to discuss some of the challenges and limitations of ChatGPT, such as its tendency to generate responses that are repetitive or irrelevant to the input text. Wolfram also acknowledges ethical concerns related to the use of AI for generating text, such as the potential for misinformation and the need for transparency in how the technology is used.

    Overall, the article provides a detailed and informative overview of ChatGPT and its underlying technology, as well as the potential applications and challenges associated with AI-generated text.

  • It's a bunch of smoke and mirrors, akin to blockchain, Bitcoin, and the entire web3/crypto space in general: a whole lot of hype and "evangelists" who keep saying it's going to change the world (and make them a whole lot of money in the process, how convenient!). But if you look at it critically, even at a surface level, you realize it's just a bunch of really computationally expensive BS that isn't any better or more efficient than existing status-quo solutions.