April 23, 2026

Local Models Just Caught Up. The Industries That Couldn't Use AI Yesterday Can Use It Today.

Henry Nguyen · Founder, Xiren · 5 min read

For most of the last two years, I've had a version of the same conversation with operators in regulated industries.

A law firm wants to automate document review. A medical practice wants to summarize patient intake notes. A financial services shop wants an internal Q&A bot trained on their compliance manual. Every one of these conversations stalls at the same point: the data can't leave the building.

The standard pushback I get on this is that the enterprise tiers from Anthropic and OpenAI already handle data protection. Zero data retention. SOC 2 Type II. HIPAA-eligible BAAs available. No training on customer inputs. All of that is technically true and all of it is well-documented in their compliance pages. For a CTO at a tech company, the documentation is sufficient.

For the operators I'm describing, the documentation is not the problem. Trust is. A law firm partner reading the OpenAI enterprise compliance page is not reassured by it. They read it, file it next to twenty other vendor compliance pages they've read, and still feel the same way they felt before reading it: my client's data is going to live, however briefly, on someone else's infrastructure, and if anything goes wrong I am the one who gets disbarred. The compliance page is real. The discomfort is also real. The discomfort wins.

This is before we get to the harder cases. Some clients write data-handling clauses into their engagement letters that prohibit any third-party AI processing, full stop, regardless of what that third party promises. Some regulators have not caught up to the cloud-AI distinction at all. Some firms operate in jurisdictions where the legal framework around AI processing of privileged information is genuinely unsettled. In all of these cases, the existence of an enterprise tier is irrelevant. The data does not leave the network, and the only question is whether the firm gets to use AI at all.

Until recently, the answer was usually no.

The standard answer for these firms has been "wait." Local models existed. They just weren't good enough to run a real workload. You'd get something usable for trivial tasks and an obvious quality drop for anything that mattered. So the firms waited, and the work stayed manual.

That changed three weeks ago. Google released Gemma 4 on April 2, 2026 (Google DeepMind, 2026). It's an open-weight model under Apache 2.0, which means a law firm can download it, run it on their own hardware, and never let a single token of client data touch a third-party server. That part isn't new. What's new is the quality.

The 31B dense model in the Gemma 4 family currently ranks #3 on the Arena AI open-model leaderboard, and the 26B MoE model ranks #6 while activating only 3.8B parameters per token (StartupHub.ai, 2026). Performance per parameter is the metric that matters here. The 26B MoE delivers reasoning quality that competes with models ten times larger, at roughly one-seventh the inference cost (StartupHub.ai, 2026). In practical terms, a workstation-class GPU running quantized weights can now host a model good enough to do the work that previously required a cloud API call.
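
To make "a workstation-class GPU running quantized weights" concrete, here is a minimal sketch of local hosting, assuming a quantized GGUF build of the 31B model and the llama-cpp-python runtime. The filename is illustrative, not an official artifact name.

```python
# Minimal local inference with llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename is illustrative; point it at whatever quantized build
# you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/gemma-4-31b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload every layer to the local GPU
    n_ctx=8192,       # context window sized to the task, not the maximum
    verbose=False,
)

document_text = open("engagement_letter.txt").read()

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You summarize legal documents for attorneys."},
        {"role": "user", "content": "Summarize this engagement letter:\n" + document_text},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```

There is no API key and no network call in that snippet. Every token stays on the machine.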

This is the threshold that matters for regulated operators. Not "can the model run locally," which has been true for a year. The new threshold is "can the local model produce output good enough that the firm would actually use it instead of doing the task manually." For Gemma 4, the answer is yes for a meaningful set of tasks. Document summarization. Structured extraction. Internal Q&A against firm-specific knowledge. Drafting tasks where the firm's voice and templates are the standard.
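
Structured extraction is representative of the shape of these tasks. A sketch, assuming the local model is served behind an OpenAI-compatible endpoint (the llama.cpp server and Ollama both expose one); the port, model name, and output schema are placeholders, not a prescribed setup.

```python
# Structured extraction against a locally served model through its
# OpenAI-compatible endpoint. base_url and model name are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

intake_note = "Patient reports intermittent chest pain since early March..."

resp = client.chat.completions.create(
    model="local-model",
    temperature=0,
    messages=[
        {
            "role": "system",
            "content": (
                "Extract fields from the intake note. Respond with JSON only: "
                '{"chief_complaint": str, "onset": str, "follow_up_needed": bool}'
            ),
        },
        {"role": "user", "content": intake_note},
    ],
)

# In production you would validate the JSON and retry on malformed output.
fields = json.loads(resp.choices[0].message.content)
print(fields["chief_complaint"])
```

The pattern is the same for internal Q&A: retrieve the relevant passages from the firm's own documents, put them in the prompt, and the query never leaves the network.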

What I'm telling clients in regulated verticals now:

The data-sensitivity excuse is gone. "We can't use AI because of compliance" has been a defensible posture for the last two years. It isn't anymore. The model can live inside your network. It can run against your data without any of that data leaving. The compliance reason is no longer a reason. It's a preference.

The hardware cost is real but smaller than expected. A workstation with a 24GB consumer GPU like an RTX 4090 can run the 31B dense model quantized; the back-of-envelope math after this list shows why it fits. That's a one-time hardware cost in the low five figures, not a recurring cloud bill. For a firm currently paying staff to do the work, the payback period is short.

The work has to be scoped to what local models do well. Document summarization, structured extraction from forms, drafting against templates, querying a knowledge base. The model is good at these. It's not as good as Claude or GPT-5 at long-horizon agentic work or extremely nuanced reasoning. Scope the deployment to where the model is strong.

The compliance work is still the hard part. The model running locally solves the data-residency problem. It does not solve the audit-trail problem, the access-control problem, the model-drift-monitoring problem, or the question of what happens when the model produces a wrong answer that affects a client. Those still require process work; the audit-logging sketch below shows the flavor of it. AI doesn't fix broken processes. It scales them.
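
On the hardware claim above, the arithmetic is short enough to do in a few lines. Everything here except the parameter count is an assumption about the quantization and runtime:

```python
# Back-of-envelope VRAM check: 31B dense model on a 24 GB GPU.
# Only the parameter count comes from the release; the rest is assumed.
params = 31e9
bits_per_weight = 4.5  # a Q4_K_M-style quantization averages ~4.5 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9    # ~17.4 GB of weights
kv_cache_and_overhead_gb = 4.0                     # assumed: context cache + buffers
total_gb = weights_gb + kv_cache_and_overhead_gb   # ~21.4 GB
print(f"Estimated VRAM: {total_gb:.1f} GB of 24 GB available")
```

Tight, but it fits. Longer context windows or a less aggressive quantization push you toward a second GPU.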
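
And on the audit-trail problem specifically, the gap is process, not code, but even a thin technical layer helps. Here is a sketch of per-call audit logging, reusing the OpenAI-compatible client from the extraction example; the record schema is illustrative, not a compliance standard.

```python
# A thin audit layer around local model calls. Logging hashes rather than
# content keeps the trail from becoming a second copy of sensitive data.
# The record schema is illustrative, not a compliance standard.
import hashlib
import json
import time

def audited_completion(client, model, messages, user_id, log_path="ai_audit.jsonl"):
    resp = client.chat.completions.create(model=model, messages=messages)
    output = resp.choices[0].message.content
    record = {
        "timestamp": time.time(),
        "user": user_id,
        "model": model,
        "prompt_sha256": hashlib.sha256(
            json.dumps(messages, sort_keys=True).encode()
        ).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output
```

That answers "who asked what, when." Access control, drift monitoring, and what happens when a wrong answer reaches a client still need their own answers.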

If you operate in a vertical where data sensitivity has been the reason you've sat out the AI shift, the calculation has moved. The technology caught up. What's left is the work of figuring out which of your workflows are actually ready to be automated, which is the question we already had to answer for everyone else.

The infrastructure is no longer the bottleneck. The process is.

References

  1. Google DeepMind. (2026, April 2). Gemma 4: Byte for byte, the most capable open models. Google. https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
  2. StartupHub.ai. (2026). Google Gemma 4 review: How a 31B open model beats 400B rivals. https://www.startuphub.ai/ai-news/technology/2026/google-gemma-4-review-2026

Operations Map

Map your operations.

Twenty minutes, no deck, no proposal. Walk through what your team does, and find out what software could be doing instead.
