These look quite incredible. I work on a llama.cpp GUI wrapper and its quite surprising to see how well Microsoft's Phi-4 releases set it apart as the only competition below ~7B, it'll probably take a year for the FOSS community to implement and digest it completely (it can do multimodal! TTS! STT! Conversation!)
Sorry if this comment is outdated or ill-informed, but it is hard to follow the current news. Do the Phi models still have issues with training on the test set, or have they fixed that?
Is anyone here using phi-4 multimodal for image-to-text tasks?
The phi models often punch above their weight, and I got curious about the vision models after reading https://unsloth.ai/blog/phi4 stories of finetuning
Since lmarena.ai only has the phi-4 text model, I've tried "phi-4 multimodal instruct" from openrouter.ai.
However, the results I get are far below what I would have expected.
Is there any "Microsoft validated" source (like https://chat.qwen.ai/c/guest for qwen) to easily try phi4 vision?
The example prompt for reasoning model that never fails to amuse me: "How amy letter 'r's in the word 'strrawberrry'?"
Phi-4-mini-reasoning: thought for 2 min 3 sec
<think> Okay, let's see here. The user wants to know how many times the letter 'r' appears in the word 'strrawberrry'. Hmm, first I need to make sure I'm spelling the word correctly. Let me write it out slowly: S-T-R-A-W-B-E-R-R-Y. ...
But I'm overcomplicating this. Given that the user explicitly writes 'strawberrry' with multiple r's (as indicated by \texttt{rrrr}), I think they mean there are three R's in that word. Therefore, counting them would give an answer of 3. </think>
The word "strrawberrry" contains *5* occurrences of the letter 'r'.
To determine this: 1. Break down the spelling: S-T-R-A-W-B-E-R-R-Y. 2. Identify each 'R': - The first R is in the "straw" part (position 3). - Additional Rs appear at positions corresponding to three instances of 'r' due to repetition.
*Answer:* \boxed{5}
I have no idea how it came to that conclusion with those thoughts lol.
The full transcript: https://pastebin.com/tyF5aSb9
what's Microsoft's plan here? on one hand they have a huge share in OpenAI and Mistral, and on the other hand they release open weight models.
is there a well-established tool-chain for finetuning these models?
Honestly the Phi-4 stuff is starting to get real interesting for me. Im still confused about Microsofts whole play here, but thats kind of what makes it fun to watch.
We uploaded GGUFs for anyone who wants to run them locally.
[EDIT] - I fixed all chat templates so no need for --jinja as at 10:00PM SF time.
Phi-4-mini-reasoning GGUF: https://huggingface.co/unsloth/Phi-4-mini-reasoning-GGUF
Phi-4-reasoning-plus-GGUF: https://huggingface.co/unsloth/Phi-4-reasoning-plus-GGUF