Releasing v1 of GPT-JT, a fork of GPT-J (6B) fine-tuned on 3.53B tokens

  • If anyone's wondering, this model is not the GPT-3 killer. It will mostly be useful for classification, not general text generation. It's also not an apples-to-apples comparison, since the other models were not fine-tuned on the same dataset.

    Interesting that they didn't compare the model to Flan-T5 or Tk-Instruct, both of which were fine-tuned on similar data and should give comparable results with a similar number of parameters. See the leaderboard here: https://huggingface.co/spaces/ought/raft-leaderboard

    Nonetheless, props for open-sourcing the model and for attempting to develop new techniques for decentralized training of large-scale transformers; that's no easy feat.

  • Text summarization examples [1] are fun:

    > Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.

    > Output: Not as Advertised

    [1] https://huggingface.co/spaces/togethercomputer/GPT-JT
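
    For anyone who wants to poke at this outside the Space, here is a minimal sketch of the same kind of prompt through the transformers text-generation pipeline. The Hub id "togethercomputer/GPT-JT-6B-v1" and the exact prompt format are my guesses from the demo, not anything official:

      # Minimal sketch: demo-style prompt through the text-generation pipeline.
      # Assumes the checkpoint is published as "togethercomputer/GPT-JT-6B-v1".
      from transformers import pipeline

      generator = pipeline("text-generation", model="togethercomputer/GPT-JT-6B-v1")

      prompt = (
          "Summarize the review as a short title.\n\n"
          "Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts "
          "were actually small sized unsalted.\n"
          "Output:"
      )

      # Greedy decoding keeps the short, label-like output deterministic.
      result = generator(prompt, max_new_tokens=8, do_sample=False)
      # The pipeline echoes the prompt, so strip it off before printing.
      print(result[0]["generated_text"][len(prompt):].strip())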

  • What does this mean? Can I download the trained model and run it on my own machines, assuming I don't need a supercomputer to do so?
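
    For context, the weights appear to be a regular Hugging Face checkpoint, so downloading and running them locally looks roughly like the sketch below. The Hub id and the ~13 GB half-precision memory figure are assumptions on my part, not official numbers; a 6B-parameter model should fit on a single 24 GB GPU in fp16, or run (slowly) on CPU.

      # Rough sketch of loading the checkpoint locally in half precision.
      # "togethercomputer/GPT-JT-6B-v1" is an assumed Hub id; adjust as needed.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "togethercomputer/GPT-JT-6B-v1"
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.float16,  # roughly halves memory vs. fp32
          device_map="auto",          # needs `accelerate`; offloads layers to CPU if VRAM is short
      )

      # Short classification-style prompt, decoded greedily.
      inputs = tokenizer("Input: great phone, terrible battery\nOutput:", return_tensors="pt").to(model.device)
      output = model.generate(**inputs, max_new_tokens=8, do_sample=False)
      print(tokenizer.decode(output[0], skip_special_tokens=True))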

  • What's the best chatty model to run locally on an RTX 3090? This seems cool, but it's a bit hard to get it to talk.

  • Has anyone tried running it on an M1 MBP? How is the performance?