Guide
How to control your AI coding agent from your phone
Here's the situation an AI coding agent puts you in that nothing before it did: you kick off a
real task — "refactor the auth module, run the tests, open a PR" — and then there's nothing for
you to do for ten minutes except wait. So you walk away. You make coffee, you take the dog out,
you sit down to dinner. And somewhere in those ten minutes the agent finishes, or worse, stops
and asks "can I run git push?" — and just sits there, blocked, until you're back at
the keyboard.
The fix is to take the agent with you. Not the laptop — your phone. If you can watch the terminal stream live on your phone, type a follow-up, and tap "allow" on a prompt while you're nowhere near your desk, the dead time disappears. This guide covers what "remote control" actually has to do, the DIY ways to get there, where they fall short, and how to do it without putting your code on the open internet.
Why you'd want to leave your desk mid-task
Agentic coding has a rhythm: a burst of describing what you want, then a long stretch where the agent works and you don't. That stretch is the whole point: it's the time the tool buys you back. But it's only yours if you can actually leave. If "leaving" means the agent silently blocks on the first prompt and you lose twenty minutes, you end up tethered to the desk anyway, watching a terminal do nothing. That's the same babysitting you were trying to hand off, in a different costume.
Phone control changes the math. The agent's long task overlaps with your life instead of pausing it. You check the stream from the couch, you fire the next task from the kitchen, you approve the push from the bus. The work continues; you just stop being the bottleneck.
What "remote control" actually has to do
It's tempting to think a screenshot or a "done" notification is enough. It isn't. A real remote setup has four jobs, and most DIY approaches only do one or two:
- Live output, not a snapshot. You need the terminal streaming in real time — tokens as they arrive, test results as they land — not a status ping after the fact.
- Input, not just viewing. The point is to drive: type a follow-up, correct course, start a new task. A read-only mirror leaves you stuck the moment the agent needs a nudge.
- Approvals. The most common reason an agent blocks is a yes/no — "run this command?", "overwrite this file?". You need to answer those from your phone, or you haven't actually freed yourself.
- It survives your screen locking. The agent lives on the desktop and keeps running whether or not your phone is awake. Reconnecting should drop you back into the live session, not a dead one.
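To make the last job concrete, here is a minimal sketch of a replay buffer, assuming the desktop side numbers each line of agent output and the phone remembers the last number it saw. The class and method names (SessionBuffer, append, replay_after) are hypothetical and only illustrate the idea; they are not taken from any particular tool.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SessionBuffer:
    """Keeps recent terminal output so a reconnecting client can catch up."""

    max_lines: int = 1000
    _lines: deque = field(default_factory=deque, init=False)
    _next_seq: int = field(default=0, init=False)

    def append(self, text: str) -> int:
        """Record one line of agent output and return its sequence number."""
        seq = self._next_seq
        self._lines.append((seq, text))
        self._next_seq += 1
        # Drop the oldest lines once the buffer is over its limit.
        while len(self._lines) > self.max_lines:
            self._lines.popleft()
        return seq

    def replay_after(self, last_seen: int) -> list:
        """Return every buffered (seq, text) pair newer than last_seen."""
        return [(seq, text) for seq, text in self._lines if seq > last_seen]


# The agent keeps writing while the phone is locked...
buffer = SessionBuffer(max_lines=500)
for line in ["running tests", "12 passed", "Allow git push? (y/n)"]:
    buffer.append(line)

# ...and on reconnect the phone asks for everything after the last line it saw.
missed = buffer.replay_after(0)
```

Because the buffer lives with the agent on the desktop, a locked phone only loses its view, not the session. A client that has seen nothing yet can pass -1 to get the whole backlog.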
The DIY options, and where they break
You can absolutely rig this up yourself. People do. Each path gets you part of the way:
- SSH from a phone client (Termius, Blink) into tmux. This is the purist's answer, and it genuinely works: attach to the tmux session your agent runs in and you've got live output and input. The cost is setup: a reachable host or a tunnel (Tailscale, ngrok), keys on your phone, and a cramped terminal keyboard for typing prompts. It's powerful and fiddly, which is fine for one machine and miserable to keep working across several. (More on the background-session side of this in running Claude Code in the background.)
- VNC / remote desktop. Mirrors your whole screen, so you see everything, including how badly a 27-inch desktop maps onto a phone. It's heavy, laggy on mobile data, and streams pixels instead of text, so you can't get a clean notification out of it.
- A web terminal (ttyd, gotty, wetty). Wraps a shell in a browser tab, which is closer to what you want — until you remember it's serving a live shell over HTTP. Now you're responsible for TLS, auth, and not exposing a root terminal to the internet. Get it slightly wrong and you've published a backdoor.
- Push notifications alone (Pushover, ntfy). Great for "it finished," useless for "now do this." You find out the agent needs you, then still have to get to a real keyboard to respond. Half a solution.
The pattern across all of these: the ones that let you actually type and approve are the ones that put a live shell on the network, and securing that is the hard part you inherit.
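For the SSH-plus-tmux route, the moving parts are a handful of tmux commands. The sketch below only builds the argument lists, so nothing runs until you pass one to subprocess.run yourself. The session name, host name, and the agent command "my-agent" are placeholders, and the helper names are made up for this example.

```python
import shlex


def start_agent_argv(session: str, agent_command: str) -> list[str]:
    """argv that starts the agent inside a detached tmux session."""
    return ["tmux", "new-session", "-d", "-s", session, agent_command]


def read_screen_argv(session: str) -> list[str]:
    """argv that prints the session's visible pane contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", session]


def send_line_argv(session: str, text: str) -> list[list[str]]:
    """Two argvs: type the text literally, then press Enter."""
    return [
        # -l sends the text as literal characters, so a prompt that happens
        # to contain a key name such as "Enter" is typed, not interpreted.
        ["tmux", "send-keys", "-t", session, "-l", text],
        ["tmux", "send-keys", "-t", session, "Enter"],
    ]


def attach_over_ssh_argv(host: str, session: str) -> list[str]:
    """argv for attaching to the session from another machine over SSH."""
    # ssh -t requests a terminal, which tmux needs in order to attach.
    remote = f"tmux attach-session -t {shlex.quote(session)}"
    return ["ssh", "-t", host, remote]


# Placeholder names: "agent" session, "devbox" host, "my-agent" command.
commands = [
    start_agent_argv("agent", "my-agent"),
    *send_line_argv("agent", "run the tests"),
    read_screen_argv("agent"),
    attach_over_ssh_argv("devbox", "agent"),
]
```

Each list can be handed to subprocess.run(argv, check=True) on a machine that has tmux installed. The host still has to be reachable from the phone, which is the setup cost the list above describes.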
Doing it securely
This is the part worth slowing down on, because the failure mode is severe. A remotely reachable terminal is a remotely reachable terminal: if it's exposed and weakly authenticated, it's an open door to your whole machine. Three rules cover most of the risk:
- Don't open a public port. Prefer a desktop-initiated connection that dials out to a relay, or a private network (Tailscale/WireGuard), over poking a hole in your firewall and hoping nobody scans it.
- Encrypt end to end. Whatever sits in the middle — a relay, a tunnel provider — should only ever see ciphertext. Your prompts and your code shouldn't be readable by the hop that forwards them.
- Pair explicitly, and keep approvals on the device. A short pairing step (scan a QR, paste a one-time key) beats a long-lived password. And the agent should still ask before it does anything irreversible — see pre-approval policies for how to pre-bless the safe stuff so only the genuinely risky calls ever reach your phone.
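As an illustration of the explicit pairing rule, here is a minimal sketch of a one-time code that expires and can be redeemed only once. It is one possible design, not a description of how any specific product pairs devices. The PairingDesk class, its method names, and the 120-second lifetime are assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


class PairingDesk:
    """Desktop side of a one-time pairing code (hypothetical design)."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl_seconds = ttl_seconds
        self._code_hash: Optional[bytes] = None
        self._expires_at = 0.0

    def issue_code(self, now: Optional[float] = None) -> str:
        """Create a fresh code to show as a QR code or paste string."""
        now = time.monotonic() if now is None else now
        code = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe text
        # Keep only a hash, so the code itself is not stored after display.
        self._code_hash = hashlib.sha256(code.encode()).digest()
        self._expires_at = now + self.ttl_seconds
        return code

    def redeem(self, candidate: str, now: Optional[float] = None) -> bool:
        """Accept the code at most once, and only before it expires."""
        now = time.monotonic() if now is None else now
        stored = self._code_hash
        if stored is None:
            return False
        if now > self._expires_at:
            self._code_hash = None  # expired codes are discarded
            return False
        candidate_hash = hashlib.sha256(candidate.encode()).digest()
        # compare_digest avoids leaking how many bytes matched via timing.
        if hmac.compare_digest(stored, candidate_hash):
            self._code_hash = None  # single use: a second attempt fails
            return True
        return False


desk = PairingDesk()
code = desk.issue_code()
paired = desk.redeem(code)       # first use succeeds
paired_again = desk.redeem(code)  # the same code cannot be reused
```

A short lifetime and single use are what make this safer than a long-lived password: a code that leaks after pairing is already worthless. The now parameter exists only so the expiry logic can be exercised without waiting.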
Where Backgrind fits
Backgrind's Live mode is this guide, built in. The agent runs on your desktop in Backgrind's always-on-top overlay exactly as it normally would; Live mode mirrors that session to your phone or iPad through the browser. You watch the terminal stream in real time, send commands, switch between sessions, and answer "needs you" prompts, so you're driving the session, not watching a read-only mirror.
The security model is the one above, by default. Pairing is desktop-initiated with a one-time code (scan a QR or paste it), the link is end-to-end encrypted with a per-pairing key, and the relay only ever forwards ciphertext — your code never sits readable on our server. There's no separate web account to create and nothing extra to trust.
Pair Live mode with several agents running at once and the whole loop comes together: tasks running in parallel on your machine, a chime when one needs a decision, and your phone as the remote that answers it. See it in the live demo.