Most people assume AI assistants require a subscription, an internet connection, and a willingness to let your conversations pass through someone else’s server. What if none of that were true?
You might think that setting up a local LLM is beyond your capabilities, or that the model would eat too much disk space.

All you need to do is open a single HTML file.
As of Chrome 138+, there’s a fully capable large language model sitting dormant inside your browser right now — and with a little tinkering, you can build a slick multi-chat interface to talk to it. That’s exactly what the Local Chrome AI project does.
What Is This, Exactly?
Google has been quietly shipping Gemini Nano — a real, capable LLM — inside Chrome as part of its on-device AI initiative. It’s not a toy. It can answer questions, write code, summarize content, describe images, and hold a conversation. The catch: Google doesn’t exactly advertise how to turn it on, and there’s no built-in chat interface. That’s the gap this project fills.
The result is a single HTML file you download and open locally. No server. No npm install. No signup. You just drop it in Chrome and start chatting.
Setting It Up
Before the HTML file does anything, you need to unlock Gemini Nano in Chrome’s experimental flags. Paste these URLs into your address bar one at a time:
chrome://flags/#prompt-api-for-gemini-nano → Enabled
chrome://flags/#prompt-api-for-gemini-nano-multimodal-input → Enabled (for image support)
chrome://flags/#optimization-guide-on-device-model → Enabled BypassPerfRequirement
Then restart Chrome via chrome://restart. After that, go to chrome://components, find Optimization Guide On Device Model, and hit Check for update. Chrome will pull down the model — it’s 2–4 GB, so give it a few minutes. When the status reads “Component already up to date,” you’re ready.
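If you want to confirm the model is actually ready before opening the file, you can paste a quick check into Chrome's DevTools console. This is a sketch against the Prompt API's `LanguageModel` global; the guard just keeps it from throwing anywhere outside Chrome.

```javascript
// Quick check: is Gemini Nano ready? Run this in Chrome's DevTools console.
// LanguageModel is the global Chrome exposes for its built-in Prompt API;
// the typeof guard keeps the snippet harmless in any other environment.
async function checkModel() {
  if (typeof LanguageModel === "undefined") {
    return "unavailable"; // not Chrome 138+, or flags not enabled
  }
  // Resolves to "available", "downloadable", "downloading", or "unavailable"
  return LanguageModel.availability();
}

checkModel().then(status => console.log("Gemini Nano:", status));
```

If the status is "downloadable" or "downloading", the component download from the previous step hasn't finished yet.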
Then just download one of the HTML files from the repo and open it in Chrome. That’s it.
What the Interface Does
The full-featured file (local_AI_with_sidebar.html) looks and feels like a real chat app. There’s a sidebar for managing multiple conversations, each one saved independently in your browser’s localStorage. You can create new threads, delete individual ones, or wipe everything with a single button. Conversations get auto-titled based on your first message.
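The auto-titling trick is simple enough to sketch in a few lines. This is illustrative only, assuming a truncate-the-first-message approach; `autoTitle` is a hypothetical name, not the project's actual code.

```javascript
// Illustrative sketch: derive a sidebar title from the first user message
// by collapsing whitespace and truncating. Not the repo's exact logic.
function autoTitle(firstMessage, maxLen = 30) {
  const cleaned = firstMessage.trim().replace(/\s+/g, " ");
  return cleaned.length <= maxLen
    ? cleaned
    : cleaned.slice(0, maxLen - 1) + "…";
}
```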
The image-enabled version (local_AI_Sidebar_image_Uploads.html) adds a camera icon next to the input field. Attach a photo, ask a question about it, and Gemini Nano analyzes it entirely on your device. The image shows as a thumbnail in the chat bubble; click it for a fullscreen lightbox view. A settings panel lets you control whether images get saved to localStorage and how many messages back the model sees as context. Images are big, so to save space the default is not to persist them to localStorage. You can turn image saving on if you want it, but do so before you upload: anything uploaded before you change the setting won't be saved.
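Under the hood, the image flow relies on the Prompt API's multimodal input (the second flag from the setup section). Here's a minimal sketch; `describeImage` is an illustrative wrapper, not the project's code, and the guard returns null outside Chrome instead of throwing.

```javascript
// Sketch of a multimodal prompt: send an image plus a question to the
// on-device model via Chrome's Prompt API.
async function describeImage(imageBlob, question) {
  if (typeof LanguageModel === "undefined") return null; // not in Chrome
  const session = await LanguageModel.create({
    expectedInputs: [{ type: "image" }] // opt in to image input
  });
  return session.prompt([
    {
      role: "user",
      content: [
        { type: "image", value: imageBlob },
        { type: "text", value: question }
      ]
    }
  ]);
}
```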
There’s also a minimal single-window version (minimi.html) if you just want to kick the tires without the sidebar.
The Clever Engineering Under the Hood
Local LLMs are stateless — every prompt starts from zero. The project solves this with two techniques worth knowing about.
The first is a simple context injection: before every message, the last N exchanges from localStorage get prepended to the prompt, giving the model a rolling memory of the conversation. You can tune how many messages back it looks in the settings panel.
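In code, the context injection amounts to something like the following sketch. The function name and the message shape are illustrative assumptions, not the repo's actual implementation.

```javascript
// Sketch of context injection: prepend the last N stored exchanges so the
// stateless model sees a rolling window of the conversation.
function buildPrompt(history, userMessage, maxMessages) {
  const recent = history.slice(-maxMessages); // rolling memory window
  const context = recent
    .map(m => `${m.role === "user" ? "User" : "Assistant"}: ${m.text}`)
    .join("\n");
  const turn = `User: ${userMessage}`;
  return context ? `${context}\n${turn}` : turn;
}
```

Tuning `maxMessages` trades memory depth against prompt size, which is exactly the knob the settings panel exposes.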
The second is recursive summarization. When a chat exceeds a certain length, the script quietly asks the model to summarize everything so far, then replaces the growing history with that single compressed summary. This prevents what the README aptly calls the “Quadratic Slowdown” — where every new message has to carry an ever-longer context, making responses progressively slower. The summarization happens silently; from your perspective the conversation just continues normally.
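The compression step might look like this sketch, with the model call abstracted behind a `summarize()` callback. That callback is an assumption for testability; in the real app the summary comes from the on-device model itself.

```javascript
// Sketch of recursive summarization: once the history passes a threshold,
// ask the model to compress it and carry forward only the summary.
async function compactHistory(history, maxMessages, summarize) {
  if (history.length <= maxMessages) return history; // still short enough
  const transcript = history
    .map(m => `${m.role}: ${m.text}`)
    .join("\n");
  const summary = await summarize(transcript); // one model call
  // The whole history collapses into a single compressed message.
  return [{ role: "system", text: `Summary of the conversation so far: ${summary}` }];
}
```

Because the compressed summary can itself be summarized later, the context stays roughly constant in size no matter how long the chat runs.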
Privacy Is the Real Feature
Every word of every conversation stays on your hard drive. The model runs on your GPU. Nothing is transmitted anywhere. You can turn off your Wi-Fi and it keeps working. Really, try it!
Cloud AI assistants are excellent, but they’re not private by design. This is.
The tradeoff is capability. Gemini Nano is a small model optimized to run locally, not a frontier model running on a data center. It’s fast, it’s private, and it handles everyday tasks well, but it won’t match GPT-4 or Claude on complex reasoning. For a lot of use cases, that’s a perfectly fine deal.
Who Should Try This
If you’re the type of person who reads about experimental Chrome flags for fun, you’ll have this running in ten minutes and enjoy every second of it. If you have colleagues who handle sensitive documents and keep asking “is this AI thing private?”, this is a concrete answer you can hand them as an HTML file.
The project is open source under MIT and lives at github.com/morrowsend/local_chome_ai. Fork it, modify the system prompt to give the AI a different personality, build your own tools on top of the same LanguageModel API — the foundation is solid and the code is readable.
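For instance, giving the assistant a different personality is a matter of changing the system prompt passed at session creation. A hedged sketch against the Prompt API (the guard makes it safe outside Chrome; the pirate prompt is obviously just an example):

```javascript
// Sketch: create a Prompt API session with a custom system prompt.
async function createPirateSession() {
  if (typeof LanguageModel === "undefined") return null; // not in Chrome
  return LanguageModel.create({
    initialPrompts: [
      { role: "system", content: "You are a helpful pirate. Answer in pirate speak." }
    ]
  });
}
```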
Local AI has been technically possible for a while now. This project makes it actually convenient.
Requires Chrome 138+. The on-device model download is 2–4 GB. Performance depends on your GPU — on a modern machine responses are near-instant; on integrated graphics expect a few seconds per reply.