I guess we will have text-to-image startups in the next batch of YC.
I spun up an AWS Ubuntu EC2 instance with 2 Tesla M60 GPUs. When I run python3 image_from_text.py --text='alien life' --seed=7
I get this error detokenizing the image:

Traceback (most recent call last):
  File "/home/ubuntu/work/min-dalle/image_from_text.py", line 44, in <module>
    image = generate_image_from_text(
  File "/home/ubuntu/work/min-dalle/min_dalle/generate_image.py", line 74, in generate_image_from_text
    image = detokenize_torch(image_tokens)
  File "/home/ubuntu/work/min-dalle/min_dalle/min_dalle_torch.py", line 107, in detokenize_torch
    params = load_vqgan_torch_params(model_path)
  File "/home/ubuntu/work/min-dalle/min_dalle/load_params.py", line 11, in load_vqgan_torch_params
    params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
  File "/usr/local/lib/python3.10/dist-packages/flax/serialization.py", line 350, in msgpack_restore
    state_dict = msgpack.unpackb(
  File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
Almost worked on my 2021 MacBook Pro (M1 Pro) - https://github.com/kuprel/min-dalle/issues/1#issue-128676523...
What is wandb.ai, and why does it keep asking for an API key?
It's not listed in the requirements
I've posted it as an issue
Does it just download pre-trained DALL-E Mini models and generate images using them? Because I can't seem to find any logic in that repo other than that. I'm not into that field, just curious if I'm missing something.
Anyone have some stats on inference time and RAM requirements? (on specific hardware)
NB: Needs a Weights & Biases account in order to download the models.
Surely wandb is not a bare essential?
Now we just need to containerise it (there are a few Docker Python NVIDIA images).
Good job. What are the hardware requirements?
Interesting to test inputs like "Oscar Wilde photo" or "marilyn monroe photo" and compare against a Google image search. After a few iterations we can get quite similar images, but the faces are always blurry.
What is the maximum resolution possible with this?
If it depends on the hardware, what would be the limit when one rents the biggest machine available in the cloud?
The Google Colab link works if you replace the computed path to flax_model.msgpack on line 10 in load_params.py with '/content/pretrained/vqgan/flax_model.msgpack'.
Edit: actually it's easier to open a terminal and move /content/pretrained/vqgan to /content/min-dalle/pretrained/vqgan
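If anyone wants to script that move instead of doing it in a terminal, here's a minimal sketch. The move_vqgan helper is hypothetical (not part of the repo), and the default paths are just the Colab locations mentioned in this thread:

```python
import os
import shutil

def move_vqgan(src="/content/pretrained/vqgan",
               dst="/content/min-dalle/pretrained/vqgan"):
    """Move the downloaded VQGAN weights to where load_params.py
    expects them. Defaults assume the Colab paths from this thread;
    adjust for other setups."""
    if not os.path.isdir(src):
        return None  # nothing to move
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst
```

Calling it a second time is a no-op (returns None), so it's safe to re-run the cell.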
TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16.
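In case it helps anyone hitting the same dtype error: the usual fix is to cast the update array to the operand's dtype before calling lax.dynamic_update_slice. A minimal sketch of the idea, using NumPy stand-ins rather than actual JAX arrays (with JAX arrays the cast is the same update.astype(operand.dtype)):

```python
import numpy as np

# operand is float32, update is float16 -- the mismatch behind the TypeError
operand = np.zeros(8, dtype=np.float32)
update = np.ones(3, dtype=np.float16)

# Align dtypes before the slice update; with JAX this would be
# operand = lax.dynamic_update_slice(operand, update.astype(operand.dtype), (2,))
update = update.astype(operand.dtype)
operand[2:5] = update

print(operand.dtype)  # float32
```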
That dinosaur image has fantastic meme potential
People might also like this one: https://github.com/saharmor/dalle-playground Really easy to work with
Free idea: Same but for making short video clips, and then eventually producing entire movies.
Love this. Before, I had only ever seen code that ran this model through JAX. This seems to perform much better on my M1.
WTF is Wandb.ai? This seems like a sneaky way to get people to sign up for this wandb thingy.
Has anyone got this running on M1?
The results are amazingly poor. Try "biden plays chess against napoleon"
What is the license of the generated artwork?
has anyone applied compression techniques to large models like dalle-2?
Gamechanger
Clone before kill.
Cloning....
This stuff is so cool and it makes me happy that we're democratizing artistic ability. But I can't help but think RIP to all of the freelance artists out there. As these models become more mainstream and more advanced, that industry is going to be decimated.
If people just want to run text to image models locally by far the easiest way I've found on Windows is to install Visions of Chaos.
It was originally a fractal image generation app but it's expanded over time and now has a fairly foolproof installer for all the models you're likely to have heard of (those that have been released anyway).
https://softology.pro/tutorials/tensorflow/tensorflow.htm