
Microsoft has brought two more powerful and efficient versions of DeepSeek’s distilled local AI models to Copilot+ PCs.
“At Microsoft, we believe the future of AI is happening now — spanning from the cloud to the edge,” Microsoft Distinguished Engineers Vivek Pradeep and Logan Iyer explain. “Our vision is bold: to build Windows as the ultimate platform for AI innovation, where intelligence isn’t just in the cloud but seamlessly woven throughout the system, silicon and hardware at the edge.”
As you will recall, DeepSeek shocked the world when its efficient new AI models suddenly vaulted into the spotlight back in January. But Microsoft, which you’d think might have more to lose than most companies if DeepSeek were successful, was in some ways equally surprising in immediately embracing it. Within days, it made DeepSeek’s R1 reasoning model available to developers via Azure AI Foundry and GitHub. And then it made smaller, distilled versions of the models available for local use on Copilot+ PCs.
This week’s announcement is an expansion of that latter advance.
In late January, Microsoft made distilled, NPU-optimized versions of the DeepSeek R1 models available for use on Snapdragon X-powered Copilot+ PCs. Then, just days later, it released newer versions of those models that were further optimized with the open ONNX framework for better performance and reliability. And at some point, those models were made available on x86-based Copilot+ PCs using the latest-generation AMD and Intel silicon.
Keeping track of these things is … difficult. But the initial distilled, local DeepSeek-based model Microsoft provided was called DeepSeek-R1-Distill-Qwen-1.5B, where the “B” part of the name indicates the number of parameters (weighted variables), in billions, in that model version. Models with more parameters generally offer better reliability, meaning they are better at predicting and give more accurate answers.
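To make the naming convention concrete, here is a small Python sketch that pulls the parameter count out of a model name. The helper name is my own invention for illustration; only the model names themselves come from Microsoft.

```python
import re

def param_count_billions(model_name: str) -> float:
    """Extract the parameter count, in billions, from a model name
    like 'DeepSeek-R1-Distill-Qwen-1.5B' (illustrative helper)."""
    match = re.search(r"(\d+(?:\.\d+)?)B$", model_name)
    if match is None:
        raise ValueError(f"No parameter-count suffix in {model_name!r}")
    return float(match.group(1))

# The three distilled sizes Microsoft has shipped so far:
for name in ("DeepSeek-R1-Distill-Qwen-1.5B",
             "DeepSeek-R1-Distill-Qwen-7B",
             "DeepSeek-R1-Distill-Qwen-14B"):
    print(name, "has", param_count_billions(name), "billion parameters")
```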
This week, Microsoft is “taking the next step forward” with two new versions of the local DeepSeek models, DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-14B. It says running models of this caliber locally on an NPU is “a significant milestone in the democratization and accessibility of artificial intelligence” that “allows researchers, developers and enthusiasts to leverage the substantial power and functionalities of large-scale machine learning models directly from their Copilot+ PCs.”
Like the original 1.5B version, these new models are reasoning models that can show you “how” they arrive at answers, improving response quality both literally and as perceived by the user.
Of course, the “user” in this case is for the most part a developer, though enthusiasts may be interested in playing with these models as well. To get started, you’ll need a Copilot+ PC (Snapdragon X-based for now, with AMD Zen 5 and Intel Core Ultra series 2 support coming soon), plus Visual Studio Code and the AI Toolkit extension. Then, you can download one or more of these (and many other) models using AI Toolkit: Select Catalog > Models and then search/filter for models that run locally with the NPU, per the screenshot above. Once the model is downloaded, you can interact with it using a chat interface in the Playground (My Models > Local models > Model name > Load in Playground).
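For developers who want to go beyond the Playground chat interface, something like the following Python sketch shows one way to call a locally downloaded DeepSeek distill using the onnxruntime-genai package. This is a minimal sketch under assumptions, not Microsoft’s documented workflow: the model folder path is a placeholder, the prompt template is a simplified stand-in for the model’s actual special-token format, and the exact API surface varies by onnxruntime-genai version.

```python
# Sketch: chatting with a local DeepSeek distill via onnxruntime-genai.
# Assumes the package is installed and a model folder has been downloaded
# (e.g. via AI Toolkit); path and template below are illustrative only.

def build_prompt(user_message: str) -> str:
    # Simplified placeholder chat template; the real DeepSeek-R1 distills
    # use their own special tokens, and their reasoning appears inside
    # <think>...</think> tags in the generated response.
    return f"<|user|>{user_message}<|assistant|>"

def chat(model_dir: str, user_message: str, max_length: int = 512) -> str:
    # Imported lazily so build_prompt() works without the package installed.
    import onnxruntime_genai as og

    model = og.Model(model_dir)           # load the ONNX model folder
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(build_prompt(user_message)))
    while not generator.is_done():        # token-by-token generation loop
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))

if __name__ == "__main__":
    # Hypothetical local model folder path:
    print(chat(r"C:\models\deepseek-r1-distill-qwen-7b", "Why is the sky blue?"))
```

Because the model runs locally, everything after the initial download works offline; the trade-off, as noted below, is speed.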
This all occurs much more slowly locally than it does in the cloud, at least with the 14B version I’m testing. But it’s interesting to watch the AI “think” and then produce answers. I need to play around with this more, but it’s basically a chatbot, just slower. And you can enable a “Web” switch to include web search results in answers if you want a hybrid experience.
“Copilot+ PCs pair efficient compute with the near infinite compute Microsoft has to offer via its Azure services,” Pradeep and Iyer write. “With reasoning able to span the cloud and the edge, running in sustained loops on the PC and invoking the much larger brains in the cloud as needed — we are on to a new paradigm of continuous compute creating value for our customers. The future of AI compute just got brighter! We can’t wait to see the new innovations from our developer community taking advantage of these rich capabilities.”