>We’ve limited the ability for DALL·E 2 to generate ... adult images.
I think that using something like this for porn could potentially offer the biggest benefit to society. So much has been said about how this industry exploits young and vulnerable models. Cheap autogenerated images (and in the future videos) would pretty much remove the demand for human models and eliminate the related suffering, no?
A few comments by someone who's spent way too much time in the AI-generated space:
* I recommend reading the Risks and Limitations section that came with it, because it's very thorough: https://github.com/openai/dalle-2-preview/blob/main/system-c...
* Unlike GPT-3, my read of this announcement is that OpenAI does not intend to commercialize it, and that access via the waitlist is indeed more for testing its limits (and as noted, commercializing it would make it much more likely to lead to interesting legal precedent). Per the docs, access is very explicitly limited: (https://github.com/openai/dalle-2-preview/blob/main/system-c... )
* A few months ago, OpenAI released GLIDE ( https://github.com/openai/glide-text2im ) which uses a similar approach to AI image generation, but suspiciously never received a fun blog post like this one. The reason for that in retrospect may be "because we made it obsolete."
* The images in the announcement are still cherry-picked, which is presumably why they compared DALL-E 1 vs. DALL-E 2 on non-cherry-picked images.
* Cherry-picking is relevant because AI image generation is still slow unless you resort to real shenanigans that likely compromise image quality, although OpenAI likely has better infra for serving large models, as they have demonstrated with GPT-3.
* It appears DALL-E 2 has a fun endpoint that links back to the site for examples with attribution: https://labs.openai.com/s/Zq9SB6vyUid9FGcoJ8slucTu
Some freely available models
GLID-3: https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...
and a new Latent Diffusion notebook: https://colab.research.google.com/github/multimodalart/laten...
have both appeared recently and are getting remarkably close to the original Dall-E (maybe better, though I can't test the real thing...)
So - this was pretty good timing if OpenAI want to appear to be ahead of the pack. Of course I'd always pick a model I can actually use over a better one I'm not allowed to...
A friend of mine was studying graphic design, but became disillusioned and decided to switch to frontend programming after he graduated. His thesis advisor said he should be cautious, because automation/AI will soon take the jobs of programmers, implying that graphic design is a safer bet in this regard. Looks like his advisor is a few years from being proven horribly wrong.
This is a niche complaint, but I get frustrated at how imprecise OpenAI's papers are. When they describe the model architecture, it's never precise enough to reproduce exactly what they did. I mean, it pretty much never is in ML papers[0], but OpenAI's bigger releases are worse than average in this respect. And it makes sense, since they're trying to be concise and still spend time on all the other important stuff besides methods, but it still frustrates me quite a bit.
[0] Which is why releasing your code is so beneficial.
I can see how this has the potential to disrupt the games industry. If you work on a AAA title, there is a small army of artists making 19 different types of leather armor. Or 87 images of car hubcaps.
Using something like this could really help automate or at least kickstart the more mundane parts of content creation. (At least when you are using high resolution, true color imagery.)
> Preventing Harmful Generations
> We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images. By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.
"And we've also closed off a huge range of potentially interesting work as a result"I can't help but feel a lot of the safeguarding is more about preventing bad PR than anything. I wish I could have a version with the training wheels taken off. And there's enough other models out there without restriction that the stories about "misuse of AI" will still circulate.
(side note - I've been on HN for years and I still can't figure out how to format text as a quote.)
The most interesting item to me is the variations on the garden shop and bathroom sink ideas. The realism of these betrays the AI's lack of intuition about the requirements, which makes for a number of nonsensical designs that look right at first glance. This sink lacks sensible faucets: https://cdn.openai.com/dall-e-2/demos/variations/modified/ba...
This doorway is downright impossible: https://cdn.openai.com/dall-e-2/demos/variations/modified/fl...
Something about this makes me nauseous. Perhaps it's the fact that soon the market value of creatives is going to fall to a hair above zero for all but the most famous. We will be all the poorer for it when 95% of the images you see are AI-generated. There will be niches, of course, but in a few short years it'll be over for a huge swathe of creative professionals who are already struggling.
Some of the images also hit me with a creep factor, like the bears and the corgis in the art gallery, but that may be only because I know they're AI-generated.
Is there an 'explain it like I'm 15' for how this works? It seems like black magic. I've been a computer hobbyist since the late 1980's and this is the first time I cannot explain how a computer does what it does. Absolutely the most amazing thing I've ever seen, and I have zero clue how it works.
This is mind blowing. I was not expecting the sketch style images to actually look like sketches. Style transfer based sketches never look like sketches.
This and the current AI-generated art scene make it look like artwork is now a "solved" problem. See AI-generated art on Twitter etc.
There is a strong relation between the prompt and the generated images, but just like GPT-3, it fails to fully understand what is being asked. If you take the prompt out of the equation and look at the generated artwork on its own, it's up to your interpretation, just like any artwork.
Apologies for an open-ended question but: does anyone know if there is a term for something like Turing-completeness within AI, where a certain level of intelligence can simulate any other type of intelligence like our brains do?
For example, using De Morgan's theorem, we can build any logic circuit out of nothing but NAND or NOR gates (a minimal sketch follows the links below):
https://www.electronics-tutorials.ws/boolean/demorgan.html
https://en.wikipedia.org/wiki/NAND_logic
https://en.wikipedia.org/wiki/NOR_logic
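As a quick illustration of that universality claim (plain Python, nothing AI-specific):

```python
# NAND universality in miniature: NOT, AND, and OR built from NAND alone,
# following De Morgan's laws.
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    # De Morgan: a OR b == NOT(NOT a AND NOT b) == NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))

assert not_(False) and and_(True, True) and or_(False, True)
```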
Dall-E 2's level of associative comprehension is so far beyond the old psychology bots in the console pretending to be people, that I can't help but wonder if it's reached a level where it can make any association.
For example, I went to an AI talk about 5 years ago where the guy said that any of a dozen algorithms like K-Nearest Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets, Genetic Algorithms, etc can all be adapted to any use case. They just have different strengths and weaknesses. At that time, all that really mattered was how the data was prepared.
I guess fundamentally my question is: when will AGI start to become prevalent, rather than these special-purpose tools like GPT-3 and Dall-E 2? Personally I give it less than 10 years, maybe much less. I just mean that, to me, Dall-E 2 is already orders of magnitude more complex than what's required to run a basic automaton that frees humans from labor. So how can we adapt these AI experiments to get real work done?
This reminds me of the holodeck in Star Trek. Someone could walk into the Holodeck and say “make a table in the center of the room. Make it look old.” It seemed amazing to me that the computer could make anything and customize it with voice. We are pretty close to star trek technology now in computer ability (ship’s computer, not Commander Data). I guess to really be like the holodeck it needs to be able to do 3d and be in real time but that seems a lot closer now. It will be cool when this could be in VR and we can say make an astronaut riding a horse, then we can jump on the back of the horse and ride to a secret moon base.
It's becoming clear that efficient work in the future will hinge upon one's ability to accurately describe what one wants. Unpacking that -- a large piece is the ability to understand all the possible "pitfalls" and "misunderstandings" that could happen on the way to a shared understanding.
While technical work will always have a place -- I think that much creative work will become more like the management of a team of highly-skilled, niche workers -- with all the frustrations, joys, and surprises that entails.
I would probably pay good money to have an OLED painting in my house that I can just tell what kind of painting to generate each day.
Imagine waking up and telling your (preferably locally hosted) voice assistant that today really feels like a Rembrandt day and the AI just generates new paintings for you.
I don’t want to dismiss this new model and its achievements, but we are getting to the point where I feel like the open-source versus closed-source split we saw in software is forming again for open and closed ML models. I think that larger and larger models will come with disclaimers restricting commercial use (a great deal of academic and NVIDIA models are doing this), and OpenAI just puts it behind an API with the rules:
> Curbing Misuse
> Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.
Very cool stuff. For me, the most interesting was the ability to take a piece of art and generate variations of it.
Have a favorite painter? Here's 10,000 new paintings like theirs.
I made a YouTube series last summer on the massive potential future of DALL-E and multimodal AI models.
Imagine not just DALL-E 2, but a single model that can be trained on different kinds of media and generate music, images, video, and more.
The series talks about:
- essential lessons for AI creatives of the future
- details on how to compete creatively in the future
- how to make money through multimodal AI
- predictions about AI’s effects on society
- the ethics of multimodal AI and, at a very basic level, the philosophy of creativity itself
By my understanding, it's the most comprehensive set of videos on this topic.
The series is free to watch entirely on YouTube: GPT-X, DALL-E, and our Multimodal Future https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9...
This is incredible work.
From the paper:
> Limitations
> Although conditioning image generation on CLIP embeddings improves diversity, this choice does come with certain limitations. In particular, unCLIP [Dall-E 2] is worse at binding attributes to objects than a corresponding GLIDE model.
The binding problem is interesting. It appears that the way Dall-E 2 / CLIP embeds text leads to the concepts within the text being jumbled together. In their example "a red cube on top of a blue cube" becomes jumbled and the resulting images are essentially: "cubes, red, blue, on top". Opens a clear avenue for improvement.
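To see why an embedding that behaves like a "bag of concepts" loses binding, here is a deliberately extreme toy in Python. (CLIP's real text encoder is a transformer and does see word order; this sketch only illustrates the failure mode at its limit, not CLIP's actual mechanism.)

```python
import numpy as np

# Toy order-invariant "bag of concepts" embedding: averaging word vectors
# maps both cube orderings to the same point, so the color-to-position
# binding is lost entirely.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8)
         for w in ["a", "red", "cube", "on", "top", "of", "blue"]}

def bag_embed(sentence: str) -> np.ndarray:
    return np.mean([vocab[w] for w in sentence.split()], axis=0)

e1 = bag_embed("a red cube on top of a blue cube")
e2 = bag_embed("a blue cube on top of a red cube")
print(np.allclose(e1, e2))  # True: both prompts collapse to one embedding
```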
I've been playing around with it today and have been super impressed with its ability to generate pretty artful digital paintings. Could have big implications for designers and artists if and when they allow you to use custom palettes, etc.
Here's an example from my prompt ("a group of farmers picking lettuce in a field digital painting"): https://labs.openai.com/s/jb5pzIdTjS3AkMvmAlx69t7G
Am I the only one to think that the AI world is divided into 2 groups:
1. DeepMind, who solved Go and protein folding, and who seem to be really onto something.
2. Everyone else, spending billions to build machines that draw astronauts on unicorns, and smartish bot toys.
Most of the conversation around this model seems to be about its direct uses.
This seems to me like a big step towards AGI; a key component of consciousness seems (in my opinion) to be the ability to take words and create a mental picture of what's being described. Is that the long term goal WRT researching a model like this?
"Any sufficiently advanced technology is indistinguishable from magic"
Is anyone looking into what it means when we can generate infinite amounts of human-like work without effort or cost?
> Curbing Misuse [...]
That's great, nowadays the big AI is controlled by mostly benevolent entities. How about when someone real nasty gets a hold of it? In a decade the models anyone can download will make today's GPT-3 etc look like pong right?
Recommender systems etc are already shaping society and culture with all kinds of unintended effects. What happens when mindless optimizing models start generating the content itself?
I'm genuinely curious to hear Sam Altman's (and/or the OpenAI team's) perspective on why these products need to be waitlisted. If it's a compute issue, why not build a queuing system? If it's something else (safety related? hype related?) I'd love to understand the thinking behind the decision. More often than not, I sign up for waitlists for things like this and either (1) never get in to the beta or (2) forget about it when I eventually do get in.
Regardless of how much cherry-picking there was, some of these pictures are just beautiful.
The correct response here from the artists point of view should be a widespread coming together against their art being used as training data for ML models. With a quickly spread new license on most major art submission sites that explicitly forbids AI algorithms from using their work, artists would effectively starve OpenAI and others from using their own works to put them out of a job.
The timing of the Dall-E 2 launch an hour ago seems to correspond with a recent piece of investigative journalism by Buzzfeed News about one of Sam Altman's other ventures, published 15 hours ago and discussed elsewhere actively on HN right now:
https://news.ycombinator.com/item?id=30931614
I point this out because while Dall-E 2 seems interesting (I'm out of my depth, so delegating to the conversation taking place here), the timing of its release as well as accompanying press blasts within the last hour from sites like TheVerge—verified via wayback machine queries and time-restricted googling—seems both noteworthy and worth a deeper conversation given what was just published about Worldcoin.
To be clear, it's worth asking if Dall-E 2 was published ahead of schedule without an actual product release (only a waitlist) to potentially move the spotlight away from Worldcoin.
At this point, with WaveNet, GPT-3, Codex, DeepFakes, and Dall-E 2, you cannot believe anything you see, hear, watch, or read on the internet anymore, as an AI can easily generate nearly anything that millions will quickly find believable.
The internet's own proverb has never been more important to keep in mind. A dose of skepticism is a must.
To be honest, the Girl with a Pearl Earring "variations" look a little bit like a crime against art. It's like the person who built this has no idea why the Girl with a Pearl Earring is good art. "Here's the Girl with a Pearl Earring" - "OK, well, here's some girls with turbans"
Art is truth.
Sam's Twitter thread today was more impressive than the website.
https://twitter.com/sama/status/1511724264629678084?s=20&t=6...
Could somebody build this for SVG icons? I’d invest in it.
Related and kind of fun:
Sam Altman demonstrates Dall-E 2 using twitter suggestions - https://news.ycombinator.com/item?id=30933478 - April 2022 (3 comments)
If you're interested in generative models, Hugging Face is putting on an event right now called the HugGAN sprint, where they're giving away free access to compute to train models like this.
You can join it by following the steps in the guide here: https://github.com/huggingface/community-events/tree/main/hu...
There will also be talks from awesome folks at EleutherAI, Google, and Deepmind
Some more examples: https://twitter.com/sama/status/1511724264629678084
Impressive results no doubt, but I’m reserving judgment until beta access is available. These are probably the best images that it can generate, but what I’m most interested in is the average case.
They're using training-set restriction and prompt filtering to control its output:
> By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts
> We won’t generate images if our filters identify text prompts and image uploads that may violate our policies
The 'how to prevent superintelligences from eating us' crowd should be taking note: this may be how we regulate creatures larger than ourselves in the future
And even how we regulate the ethics of non-conscious group minds like big companies
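A toy sketch of what that prompt-side gating might look like, assuming a naive blocklist (the real system almost certainly uses trained classifiers over text and uploaded images, not keyword matching):

```python
# Hypothetical blocklist filter -- placeholder categories, purely illustrative.
BLOCKED_TERMS = {"violent", "adult", "political"}

def allow_prompt(prompt: str) -> bool:
    # Reject the prompt if any word hits the blocklist.
    return not (set(prompt.lower().split()) & BLOCKED_TERMS)

print(allow_prompt("a corgi in an art gallery"))  # True
print(allow_prompt("a violent scene"))            # False
```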
Some initial video by Yannic Kilcher: https://www.youtube.com/watch?v=gGPv_SYVDC8
Cartoonists, say good-bye to your job.
This reminds me of a discussion I had with the high school band teacher in the 90s. I was telling him that one day computers would play music and you won't be able to tell the difference. He got mad at me and told me that a computer could never play as well as a human with feelings, who can feel the piece and interpret it.
I think we passed that point a while ago, but seeing this makes me think we aren't too far off from computers composing pieces that actually sound good too.
I'm curious: is this feasible to train (and run inference on) on a consumer-level machine, or is this something that can only be done by institutions?
In the thread where Sam Altman gives a demo of this [*], I find multiple people trying to query "solar panels" or "rabbit"; are they some meme in the context of AI-generated art?
Interesting, yes, but I went to the link and browsed the 'generated artwork', and all of it was subjectively inferior to the original it was generated from. Every single piece. So I am not sure what the 'value' in it is at this stage.
As for the text-driven generation, I would have to mess with some non-pre-canned prompts to see how useful it is.
Yeah, I mean you're right that ultimately the proof is in the pudding.
But I do think we could have guessed that this sort of approach would be better (at least at a high level - I'm not claiming I could have predicted all the technical details!). The previous approaches were sort of the best that people could do without access to the training data and resources - you had a pretrained CLIP encoder that could tell you how well a text caption and an image matched, and you had a pretrained image generator (GAN, diffusion model, whatever), and it was just a matter of trying to force the generator to output something that CLIP thought looked like the caption. You'd basically do gradient ascent to make the image look more and more and more like the text prompt (all the while trying to balance the need to still look like a realistic image). Just from an algorithm aesthetics perspective, it was very much a duct tape and chicken wire approach.
The analogy I would give is if you gave a three-year-old some paints, and they made an image and showed it to you, and you had to say, "this looks like a little like a sunset" or "this looks a lot like a sunset". They would keep going back and adjusting their painting, and you'd keep giving feedback, and eventually you'd get something that looks like a sunset. But it'd be better, if you could manage it, to just teach the three-year-old how to paint, rather than have this brute force process.
Obviously the real challenge here is "well how do you teach a three-year-old how to paint?" - and I think you're right that that question still has a lot of alchemy to it.
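For anyone curious what that "duct tape and chicken wire" loop actually looks like, here is a minimal sketch using OpenAI's open-source CLIP (https://github.com/openai/CLIP). Real notebooks steer a GAN or diffusion generator and add heavy regularization; ascending on raw pixels like this mostly yields adversarial noise, which is rather the point:

```python
# Gradient-ascend raw pixels to maximize CLIP similarity with a caption.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.float()  # keep weights fp32 for a simple optimization loop

text = clip.tokenize(["a photo of a sunset"]).to(device)
with torch.no_grad():
    txt = model.encode_text(text)
    txt = txt / txt.norm(dim=-1, keepdim=True)

# Start from noise and optimize the pixels directly.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)

for step in range(300):
    opt.zero_grad()
    img = model.encode_image(image.clamp(0, 1))
    img = img / img.norm(dim=-1, keepdim=True)
    loss = -(img * txt).sum()  # maximize cosine similarity
    loss.backward()
    opt.step()
```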
I am disappointed to hear it wasn't released, but what disappointed me more is that people actually approve of this decision. Seriously? We shouldn't teach people how to write, because writing can be abused to spread malicious ideas. Sounds absurd? So does limiting people's access to AI tools.
So what does the future of human creativity look like when an AI can generate possibly infinite variations of an idea?
I tried to comment here previously, but I don't see it posted. It was about the meaning of 'open' and whether the question of the suffering and freedom of the AIs is being taken into ethical consideration, not just the ability of humans to use them as tools for their own possibly paper-clippy purposes.
The NFT world is mostly filled with modern art forms never seen before. If Dall-E can make such images out of the box in seconds, then it looks like AIs could take the NFT world by storm. Maybe it's already happening and I just didn't know!
Is this bringing us closer to combining image and language understanding within one model?
My main question is: is this really 'open' in any meaningful way? And are concepts of kindness and freedom being applied to the minds inside the boxes? I don't know where the 'OpenAI' brand is at on these things, personally.
This is really cool, but before you may use it you must give out your name and a phone number. I was almost taken in by it, but OpenAI is and probably always will be invasive and overbearing. It's really a shame.
What confusing pricing[1]:
> Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.
Further down, in the FAQ[2]:
> For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
> To learn more about how tokens work and estimate your usage…
> Experiment with our interactive Tokenizer tool.
And it goes on. When most questions in your FAQ are about understanding pricing—to the point you need to offer a specialised tool—perhaps consider a different model?
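To be fair, the quoted rules of thumb are at least self-consistent; estimating a bill still takes a calculator, though. A quick sanity check (the per-1K-token price below is a placeholder, not an actual OpenAI rate):

```python
# 1 token ≈ 0.75 words, per the quoted FAQ.
words = 900_000            # collected works of Shakespeare, per the FAQ
tokens = words / 0.75      # ≈ 1.2M tokens, matching the FAQ's figure
price_per_1k = 0.02        # PLACEHOLDER $/1K tokens, not a quoted rate
cost = tokens / 1000 * price_per_1k
print(f"~{tokens / 1e6:.1f}M tokens -> ${cost:,.2f}")
```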
While we're being distracted by endless social media and meaningless news, AI technology is advancing at a mind blowing pace. I'd keep my eye on that ball instead of "the current thing."
What happens when they train this thing to make videos? We're about to be dealing with a flood of AI-generated visual/video content. We already have to deal with text bots everywhere... wow.
This is what magic looks like.
Great work.
Looking forward to when they start creating movies from scripts.
Is there a geometric model related to this? E.g., "corgi near the fireplace", but the output is a 3D model of the corgi and fireplace with shaders rather than an image.
Maybe this will be what finally puts an end to the whole art NFT shenanigans. A piece of art isn't so unique if there are infinite slight variations on the market.
This is extremely interesting. We’ve had some amazing AI models come out in the past few days. We’re getting closer and closer to AI becoming a facet of everyday life.
This is going to be mostly a rant on OpenAI's "safer than thou" approach to safety, but let me start by saying that I think this technology is really cool, amazing, powerful stuff. Dall-E (and Dall-E 2) is an incredible advance over GANs and will no doubt have many positive applications. It's simply brilliant. I am someone who has been interested in and followed the progress of ML-generated images for nearly a decade. Almost unimaginable progress has been made in this field in the last five years.
Now the rant:
I think if OpenAI genuinely cared about the ethical consequences of the technology, they would realise that any algorithm they release will be replicated in implementation by other people within some short period of time (a year or two). At that point, the cat is out of the bag and there is nothing they can do to prevent abuse. So really all they are doing is delaying abuse, and in no way stopping it.
I think their strong "safety" stance has three functions:
1. Legal protection
2. PR
3. Keeping their researchers' consciences clear
I think number 3 is dangerous because researchers are put under the false belief that their technology can or will be made safe. This way they can continue to harness bright minds that no doubt have ethical leanings to create things that they otherwise wouldn't have.
I think OpenAI are trying to have their cake and eat it too. They are accelerating the development of potentially very destructive algorithms (and profiting from it in the process!), while trying to absolve themselves of the responsibility. Putting bandaids on a tumour is not going to matter in the long run. I'm not necessarily saying that these algorithms will be widely destructive, but they certainly have the potential to be.
The safety approach of OpenAI ultimately boils down to gatekeeping compute power. This is just gatekeeping via capital. Anyone with sufficient money can replicate their models easily and bypass every single one of their safety constraints. Basically they are only preventing poor bad actors, and only for a limited time at that.
These models cannot be made safe as long as they are replicable.
To produce scientific research requires making your results replicable.
Therefore, there is no ability to develop abusable technology in a safe way. As a researcher, you will have blood on your hands if things go wrong.
If you choose to continue research knowing this, that is your decision. But don't pretend that you can make the algorithms safer by sanitizing models.
One of my teachers once said, "An art piece is never done." So I wonder what it could mean for the model to keep making improvements to a piece.
What jobs will be there in 5~10 years when we consider all the progress done with Dall-E, GPT-3, Codex/GitHub Copilot, Alpha* and so on?
I never actually found a way to use Dall-E 1, did they ever Open that up to people outside their building?
It's very nice, just the name of the parent org is wrong - it should be called ClosedAI.
Can this also be used the other way around, to create alternative text for screen-reader users?
Do you think some of these techniques could be slightly modified, and applied to DNA sequences?
“Preventing Harmful Generations”? = Fail.
Caravaggio is probably chortling from wherever he is ..
"Computer, render Bella and Gigi Hadid playing tennis in bikinis"
This would be great for generating Minecraft levels from voice commands.
The big question to me is, "What does Dall-E like the most?"
Sam Altman took some user requests on Twitter: https://twitter.com/sama/status/1511724264629678084
One step closer to combining Scribblenuats with emoticons!
So I can't do Teddy Bears Riding a Horse?
Deep Learning plows through yet another wall.
Perhaps in the not-so-distant future, we can simply feed a movie script to the program and out comes a feature film.
Wow - mindblowing and kinda scary really.
This and NFTs go hand in hand.
Has OpenAI tried their ideas on music?
So two Indians and two Chinese authors. The new world is incredible.
gamechanger
Dall-E 2 seems incapable of catching the essence of the art. I'm not really surprised by it; I'd be very surprised if it could. But nevertheless: if you looked into the eye of the Girl with a Pearl Earring[1], you'd be forced to stop and think about what she has on her mind right now. Or maybe some other question would come to your mind, but it really stops people and makes them think. None of Dall-E's interpretations have this quality. Works inspired by Girl with a Pearl Earring sometimes have at least part of that power, like Girl with a Bamboo Earring[2], but none of Dall-E's interpretations do.
And this observation may have great consequences for the visual arts. I had a lot of joy looking at the different Dall-E interpretations, trying to find the flaw that keeps each one from being a piece of art of equal value to the original. It is a ready-made tool for searching for explanations of the power of art. It cannot say which detail makes a picture an artwork, but it lets you see multiple data points and narrow the hypothesis space. My main conclusion is that the pearl earring has nothing to do with the power of the painting. It is something in the eye, and probably the slightly opened mouth. (Somehow Dall-E pictured all interpretations with closed lips, so that seems to be an important detail, but I'd need more variation along this axis to be sure.)
[1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring [2] https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe...
I'm only part way through the paper, but what struck me as interesting so far is this:
In other text-to-image algorithms I'm familiar with (the ones you'll typically see passed around as colab notebooks that people post outputs from on Twitter), the basic idea is to encode the text, and then try to make an image that maximally matches that text encoding. But this maximization often leads to artifacts - if you ask for an image of a sunset, you'll often get multiple suns, because that's even more sunset-like. There's a lot of tricks and hacks to regularize the process so that it's not so aggressive, but it's always an uphill battle.
Here, they instead take the text embedding, use a trained model (what they call the 'prior') to predict the corresponding image embedding - this removes the dangerous maximization. Then, another trained model (the 'decoder') produces images from the predicted embedding.
This feels like a much more sensible approach, but one that is only really possible with access to the giant CLIP dataset and computational resources that OpenAI has.
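In pseudocode, the pipeline they describe is roughly the following (a hypothetical sketch; `text_encoder`, `prior`, and `decoder` stand in for the trained models, not any real API):

```python
# Two-stage "unCLIP" generation as described in the paper (sketch only).
def generate(caption, text_encoder, prior, decoder):
    z_text = text_encoder(caption)   # CLIP text embedding of the caption
    # The "prior" predicts a plausible CLIP *image* embedding from the text
    # embedding -- no gradient-ascent maximization against CLIP involved.
    z_image = prior.sample(z_text)
    # The "decoder" (a diffusion model) renders an image conditioned on the
    # predicted image embedding.
    return decoder.sample(z_image)
```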