Skip to content
AI Primer
breaking

North Mini Code adds MLX, Unsloth GGUFs, and oMLX support

Cohere added MLX support, Unsloth GGUFs, oMLX work, and updated docs for North Mini Code two days after launch, with llama.cpp still under review. The broader runtime coverage makes the 30B coding model easier to run on local Mac, quantized, and self-hosted stacks.

3 min read
North Mini Code adds MLX, Unsloth GGUFs, and oMLX support
North Mini Code adds MLX, Unsloth GGUFs, and oMLX support

TL;DR

  • Two days after launch, Cohere said North Mini Code already had day-zero MLX support, with cohere's thread pointing to an Apple-side runtime path for the model.
  • According to cohere's follow-up thread, community members also shipped GGUF quants, oMLX support, and refreshed docs, while llama.cpp support was still under review.
  • Cohere's official launch post and model card position North Mini Code as a 30B total, 3B active MoE coding model with an Apache 2.0 license and a 256K context window.
  • The runtime story changed fast: the docs overview lists Cohere-hosted, Transformers, vLLM, SGLang, Docker, and AWS paths, while Unsloth's GGUF page adds quantized local options before upstream llama.cpp support is merged.

You can read Cohere's launch post, skim the model card, and then jump straight to the runtime churn: Unsloth already posted GGUFs, llama.cpp has an open cohere2-MoE PR, and oMLX is patching in Cohere2 MoE loading plus Melody tool-call parsing.

MLX and oMLX

Cohere used cohere's thread to highlight MLX support on day zero, which matters because North Mini Code launched as a sparse MoE model that was not yet a drop-in fit for every local stack.

The deeper Apple-side work shows up in the oMLX issue. That patch set adds a cohere2_moe compatibility module for mlx-lm, model discovery and pre-load patching for Cohere2 MoE checkpoints, Cohere Melody output parsing, and streamed tool-call delta handling. In other words, the work was not just about loading weights, it was also about making North's reasoning and tool-use format survive token streaming.

GGUFs and llama.cpp

Cohere's community support roundup says the fastest-moving request was llama.cpp support, but the same post also makes clear it was still under review rather than shipped.

That gap is why the early GGUF story is a little more nuanced than "GGUFs exist." Unsloth's GGUF page says the files already declare general.architecture = cohere2moe, but users need a llama.cpp build from PR #24260 until upstream support lands. The PR itself is open, cleanly mergeable, and adds cohere2-MoE architecture support, according to the GitHub pull request.

Docs and deployment paths

The official docs are already broader than the tweet thread makes them sound. The Start Here page describes North Mini Code as a 30B total, 3B active model for agentic software engineering, terminal tasks, and code generation, then routes users across Cohere-hosted, Transformers, vLLM, SGLang, Docker, and AWS deployment paths.

That makes the docs update a real product change, not cosmetic cleanup. Cohere's own launch materials framed the model around efficient self-hosting and sovereign deployment, and the documentation now gives that claim a concrete menu of runtimes instead of a single blessed path. The same roundup also points to Taskflow, a CLI task manager built on North Mini Code, as one of the first small projects already using the model in the wild.

Share on X