Releasing v1 of GPT-JT, a fork of GPT-J (6B) fine-tuned on 3.53B tokens

  • If anyone's wondering, this model is not the GPT-3 killer. It will mostly be useful for classification, not general text generation. It's also not an apples-to-apples comparison, since the other models were not fine-tuned on the same dataset.

    Interesting that they didn't compare the model to Flan-T5 or Tk-Instruct, both of which were fine-tuned on similar data and should give comparable results with a similar number of parameters. See the leaderboard here: https://huggingface.co/spaces/ought/raft-leaderboard

    Nonetheless, props for open-sourcing the model and for attempting to develop new techniques for decentralized training of large-scale transformers; that's no easy feat.

  • Text summarization examples [1] are fun:

    > Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.

    > Output: Not as Advertised

    [1] https://huggingface.co/spaces/togethercomputer/GPT-JT
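
    For anyone who wants to poke at this outside the Space, here is a minimal sketch of the same kind of prompt through the transformers text-generation pipeline. The Hub id "togethercomputer/GPT-JT-6B-v1" and the exact prompt format are my guesses from the demo, not anything official:

      # Minimal sketch: demo-style prompt through the text-generation pipeline.
      # Assumes the checkpoint is published as "togethercomputer/GPT-JT-6B-v1".
      from transformers import pipeline

      generator = pipeline("text-generation", model="togethercomputer/GPT-JT-6B-v1")

      prompt = (
          "Summarize the review as a short title.\n\n"
          "Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts "
          "were actually small sized unsalted.\n"
          "Output:"
      )

      # Greedy decoding keeps the short, label-like output deterministic.
      result = generator(prompt, max_new_tokens=8, do_sample=False)
      # The pipeline echoes the prompt, so strip it off before printing.
      print(result[0]["generated_text"][len(prompt):].strip())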

  • What does this mean? Can I download the trained model and run it on my own machines, assuming I don't need a supercomputer to do so?
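
    For context, the weights appear to be a regular Hugging Face checkpoint, so downloading and running them locally looks roughly like the sketch below. The Hub id and the ~13 GB half-precision memory figure are assumptions on my part, not official numbers; a 6B-parameter model should fit on a single 24 GB GPU in fp16, or run (slowly) on CPU.

      # Rough sketch of loading the checkpoint locally in half precision.
      # "togethercomputer/GPT-JT-6B-v1" is an assumed Hub id; adjust as needed.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "togethercomputer/GPT-JT-6B-v1"
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.float16,  # roughly halves memory vs. fp32
          device_map="auto",          # needs `accelerate`; offloads layers to CPU if VRAM is short
      )

      # Short classification-style prompt, decoded greedily.
      inputs = tokenizer("Input: great phone, terrible battery\nOutput:", return_tensors="pt").to(model.device)
      output = model.generate(**inputs, max_new_tokens=8, do_sample=False)
      print(tokenizer.decode(output[0], skip_special_tokens=True))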

  • What's the best chatty model to run locally on an RTX 3090? This seems cool, but it's a bit hard to get it to talk.

  • Has anyone tried running it on an M1 MBP? How is the performance?