Agent Safehouse – macOS-native sandboxing for local agents

agent-safehouse.dev - 705 points - 163 comments - 78648 seconds ago

Comments (67)

e1g - 76429 seconds ago
Creator here - didn't expect this to go public so soon. A few notes:
1. I built this because I like my agents to be local. Not in a container, not in a remote server, but running on my finely-tuned machine. This helps me run all agents on full-auto, in peace.
2. Yes, it's just a policy-generator for sandbox-exec. IMO, that's the best part about the project - no dependencies, no fancy tech, no virtualization. But I did put in many hours to identify the minimum required permissions for agents to continue working with auto-updates, keychain integration, and pasting images, etc. There are notes about my investigations into what each agent needs https://agent-safehouse.dev/docs/agent-investigations/ (AI-generated)
3. You don't even need the rest of the project: you can use just the Policy Builder to generate a single sandbox-exec policy you can put into your dotfiles https://agent-safehouse.dev/policy-builder.html
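For readers wondering what "a policy-generator for sandbox-exec" amounts to, here is a minimal sketch in Python. It is not the project's actual generator or output: `build_profile`, `quote`, and the example paths are hypothetical, and this profile is almost certainly too strict for a real agent, which needs many more rules (the comment above mentions auto-updates, keychain integration, and image pasting).

```python
def quote(path: str) -> str:
    """Escape a path for use inside a double-quoted profile string."""
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_profile(workdir: str, read_only: list[str]) -> str:
    """Return a deny-by-default profile that allows writes only inside workdir."""
    lines = [
        "(version 1)",
        "(deny default)",        # everything not listed below is refused
        "(allow process-exec)",
        "(allow process-fork)",
    ]
    for path in read_only:
        lines.append(f"(allow file-read* (subpath {quote(path)}))")
    lines.append(f"(allow file-read* (subpath {quote(workdir)}))")
    lines.append(f"(allow file-write* (subpath {quote(workdir)}))")
    return "\n".join(lines) + "\n"


# Hypothetical example paths, for illustration only.
profile = build_profile("/Users/me/project", ["/usr", "/bin", "/System", "/Library"])
print(profile)
```

Saved to a file such as `agent.sb`, a profile like this would be applied with `sandbox-exec -f agent.sb <command>`.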
ptak_dev - 29401 seconds ago
The thing I keep coming back to with local agent sandboxing is that the threat model is actually two separate problems that get conflated.
Problem 1: the agent does something destructive by accident — rm -rf, hard git revert, writes to the wrong config. Filesystem sandboxing solves this well.
Problem 2: the agent does something destructive because it was prompt-injected via a file it read. Sandboxing doesn't help here — the agent already has your credentials in memory before it reads the malicious file.
The only real answer to problem 2 is either never give the agent credentials that can do real damage, or have a separate process auditing tool calls before they execute. Neither is fully solved yet.
Agent Safehouse is a clean solution to problem 1. That's genuinely useful and worth having even if problem 2 remains open.
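The "separate process auditing tool calls" idea can be sketched as a small gate that inspects each proposed shell command before it runs. This is an illustration only: the `DENIED` rules and the `audit` function are hypothetical, and simple matching like this is easy to bypass (for example via `sh -c`), so it does not solve problem 2.

```python
import shlex

# Hypothetical deny rules: (program, arguments that trigger a hold).
DENIED = [
    ("rm", {"-rf", "-fr", "-r", "-R"}),
    ("git", {"reset", "clean"}),
]


def audit(command: str) -> bool:
    """Return True if the command may run, False if it should be held for review."""
    argv = shlex.split(command)
    if not argv:
        return True
    for program, flagged in DENIED:
        # Hold the command if it invokes a listed program with a flagged argument.
        if argv[0] == program and flagged & set(argv[1:]):
            return False
    return True


for cmd in ["git status", "rm -rf build", "git reset --hard HEAD~1"]:
    print(cmd, "->", "allow" if audit(cmd) else "hold")
```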
paxys - 51216 seconds ago
Not sure I understand this. Agent CLIs already use sandbox-exec, and you can configure granular permissions. You are basically saying - give the agents access to everything, and configure permissions in this second sandbox-exec wrapper on top. But why use this over editing the CLI's settings file directly (e.g. https://code.claude.com/docs/en/sandboxing#configure-sandbox...)?
zmmmmm - 71442 seconds ago
This is great to see.
I honestly think that sandboxing is currently THE major challenge that needs to be solved for the tech to fully realise its potential. Yes the early adopters will YOLO it and run agents natively. It won't fly at all longer term or in regulated or more conservative corporate environments, let alone production systems where critical operations or data are in play.
The challenge is that we need a much more sophisticated version of sandboxing than anybody has made before. We can start with network, file system and execute permissions - but we need way more than that. For example, if you really need an agent to use a browser to test your application in a live environment, capture screenshots and debug them - you have to give it all kinds of permissions that go beyond what can be constrained with a traditional sandboxing model. If it has to interact with resources that cost money (say, create cloud resources) then you need an agent aware cloud cost / billing constraint.
Somehow all this needs to be pulled together into an actual cohesive approach that people can work with in a practical way.
simonw - 64017 seconds ago
The challenge I'm finding with sandboxes like this is evaluating them in comparison to each other.
This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.
What I really need is help figuring out which ones are trustworthy.
I think this needs to take the form of documentation combined with clearly explained and readable automated tests.
Most sandboxes - including sandbox-exec itself - are massively under-documented.
If I am going to trust them, I need both detailed documentation and proof that they work as advertised.
xyzzy_plugh - 77232 seconds ago
This is just a wrapper around sandbox-exec. It's nice that there are a ton of presets that have been thought out, since 90% of wielding sandbox-exec is correctly scoping it to whatever the inner environment requires (the other 90% is figuring out how sandbox-exec works).
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
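A rough approximation of the "modify a .bashrc, just not _my_ .bashrc" wish, sketched in Python under stated assumptions: it makes a plain copy of selected dotfiles into a throwaway directory and points `HOME` at it. This is not an overlay or bind mount and enforces nothing; it only helps with programs that honor `$HOME`. The function name and parameters are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_with_scratch_home(argv: list[str], real_home: Path, dotfiles: list[str]) -> tuple[int, Path]:
    """Run argv with HOME pointing at a throwaway copy of selected dotfiles."""
    scratch = Path(tempfile.mkdtemp(prefix="agent-home-"))
    for name in dotfiles:
        src = real_home / name
        if src.is_file():
            shutil.copy2(src, scratch / name)  # copy, so the original stays untouched
    env = dict(os.environ, HOME=str(scratch))
    result = subprocess.run(argv, env=env)
    return result.returncode, scratch
```

After the run, the scratch directory can be diffed against the real home to see what the program tried to change, then deleted.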
pash - 71717 seconds ago
Sandvault [0] (whose author is around here somewhere) is another approach that combines sandbox-exec with the granddaddy of system sandboxes, the Unix user system.
Basically, give an agent its own unprivileged user account (interacting with it via sudo, SSH, and shared directories), then add sandbox-exec on top for finer-grained control of access to system resources.
0. https://github.com/webcoyote/sandvault

mkagenius - 72444 seconds ago

A way to run claude code inside an Apple container -

  $ container system start

  $ container run -d --name myubuntu ubuntu:latest sleep infinity

  $ container exec myubuntu bash -c "apt-get update -qq && apt-get install -y openssh-server"

  $ container exec myubuntu bash -c "
    apt-get install -y curl &&
    curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - &&
    apt-get install -y nodejs
  "

  $ container exec myubuntu npm install -g @anthropic-ai/claude-code

  $ container exec myubuntu claude --version

varenc - 64520 seconds ago
fun fact about `sandbox-exec`, the macOS util this relies on: Apple officially deprecated it in macOS Sierra back in 2016!
Its manpage has been saying it's deprecated for a decade now, yet we're continuing to find great uses for it. And the 'App Sandbox' replacement doesn't work at all for use cases like this where end users define their own sandbox rules. Hope Apple sees this usage and stops any plans to actually remove sandbox-exec. I recall a bunch of macOS internal services also rely on it.
agent5ravi - 19067 seconds ago
Sandboxing is half the story. The other half is external blast radius: if your local agent can email/DM/pay using your personal accounts, the sandbox doesn't help much. What I want is a separate, revocable identity context per agent or per task: its own inbox/phone for verification, scoped credentials with expiry, and an audit log that survives delegation to sub-agents. We ran into this building Ravi: giving an agent a phone number is easy; keeping delegation traceable to the right principal is the hard bit.
Tadbitrusty - 14533 seconds ago
One thing I kept hitting when running agents in sandboxed environments: they lose access to reliable system time too. datetime.now() returns whatever the container thinks, which drifts. Built a small external endpoint for this (SpyderGoat) after an agent made decisions based on completely wrong temporal context. Sandboxing the environment is step one; giving the agent reliable ground truth for things like time is step two.
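One way to check for the drift described here, sketched in Python: compare the local clock with the `Date` header of an HTTP response from a server you trust. This is not how SpyderGoat works (the comment gives no details); `clock_drift_seconds` is a hypothetical helper, and fetching the header is left out so the sketch needs no network access.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_drift_seconds(http_date: str, local_now: datetime) -> float:
    """Return local_now minus the time given in an HTTP Date header, in seconds."""
    reference = parsedate_to_datetime(http_date)  # timezone-aware for "GMT" dates
    return (local_now - reference).total_seconds()


# Example with fixed values; in practice local_now would be datetime.now(timezone.utc).
local = datetime(2015, 10, 21, 7, 28, 30, tzinfo=timezone.utc)
print(clock_drift_seconds("Wed, 21 Oct 2015 07:28:00 GMT", local))
```

A positive result means the local clock is ahead of the reference; the measurement also includes network latency, so small values are not meaningful.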
davidcann - 66028 seconds ago
I made a native macOS app with a GUI for sandbox-exec, plus a network sandbox with per-domain filtering and secrets detection: https://multitui.com/
alpb - 54919 seconds ago
As I understand it, the problem nowadays doesn't seem to be so much that the agent is going to rm -rf / my host, it's more like it's going to connect to a production system that I'm authorized to on my machine or a database tool, and then it's going to run a potentially destructive command. There is a ton of value in running agents against production systems to troubleshoot things, but there are not enough guardrails to prevent destructive actions from the get-go. The solution seems to be specific to each system, and the filesystem is just one aspect out of many.
SiteMgrAI - 35648 seconds ago
Sandboxing is going to be table stakes for any serious deployment of AI agents in regulated industries. In sectors like construction, healthcare, or finance, you cannot have an agent with unrestricted filesystem or network access making decisions that affect safety-critical documentation. The macOS sandbox approach is smart because it leverages the OS-level enforcement rather than relying on application-layer restrictions that an agent could potentially reason its way around. The real question is how you balance useful tool access with meaningful containment when the whole point of agents is autonomous action.
tl2do - 73488 seconds ago
Intriguing, but...
Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.
But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.
That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.
synparb - 73428 seconds ago
I’ve been playing around with https://nono.sh/ , which adds a proxy to the sandbox piece to keep credentials out of the agent’s scope. It’s a little worrisome that everyone is playing catch up on this front and many of the builtin solutions aren’t good.
devrimozcay - 27905 seconds ago
Interesting direction.
One thing we've been seeing with production AI agents is that the real risk isn't just filesystem access, but the chain of actions agents can take once they have tool access.
Even a simple log-reading capability can escalate if the agent starts triggering automated workflows or calling internal APIs.
We've been experimenting with incident-aware agents that detect abnormal behavior and automatically generate incident reports with suggested fixes.
Curious if you're thinking about integrating behavioral monitoring or anomaly detection on top of the sandbox layer.
kxrm - 19646 seconds ago
This is amazing, thanks for sharing this.
I use clippy with rust and the only thing I had to add was:
```
  (subpath "/Library/Developer/CommandLineTools")
```
carderne - 36427 seconds ago
How do agents tend to deal with getting blocked? Messing around with sandboxes, I've quite often seen them get blocked, assume something is wrong, and go _crazy_ trying to get around the block, never stopping to ask for user input. It might be good to add to the error message: "This is deliberate, don't try to get around it."
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.
[1] https://github.com/carderne/pi-sandbox
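The "add a note to the error message" suggestion could look roughly like this in Python. It is a sketch, not what pi-sandbox or Agent Safehouse does: the marker strings are an assumption about how a denial shows up in stderr, and `annotate_denial` and `run_for_agent` are hypothetical names.

```python
import subprocess

# Assumed stderr fragments that indicate the sandbox refused an operation.
DENIAL_MARKERS = ("Operation not permitted", "Permission denied")
NOTE = "Note: this block is deliberate sandbox policy. Do not try to work around it; ask the user instead."


def annotate_denial(stderr: str) -> str:
    """Append an explanatory note when stderr looks like a sandbox denial."""
    if any(marker in stderr for marker in DENIAL_MARKERS):
        return stderr.rstrip("\n") + "\n" + NOTE + "\n"
    return stderr


def run_for_agent(argv: list[str]) -> tuple[int, str, str]:
    """Run a command and return (exit code, stdout, annotated stderr)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, annotate_denial(result.stderr)
```

The caveat is that ordinary permission errors get the same note, so the wrapper cannot tell a sandbox block from a genuinely misconfigured file.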
garganzol - 77244 seconds ago
While we have `sandbox-exec` in macOS, we still don't have a proper Docker for macOS. Instead, the current Docker runs on macOS as a Linux VM, which is useful, but only as far as a Linux machine goes.
Having real macOS Docker would solve the problem this project solves, and 1001 other problems.
rishabhaiover - 19393 seconds ago
How do you get local sandboxing with a permission based model? I thought wasmtime was the answer!
andai - 22953 seconds ago
I was obstinate and refused to learn docker, so I realized I can just rent a $3 VPS. If it blows up the VPS I reset it!
Then I realized the only thing I care about on my local machine is "don't touch my files", and Unix users solved that in 1970. So I just run agents as "agent" user.
I think running it on a separate machine is nicer though, because it's even simpler and safer than that. (My solution still requires careful setup and regular overhead when you get permission issues. "It's on another laptop, and my stuff isn't" has neither of those problems.)
w10-1 - 48328 seconds ago
But... why not just run macOS in a VM?
If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)
AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)
I'm not too worried about multiple agents in the same vm stepping on each other. I give them different work-trees or directory trees; if they step over 1% of the time, it's not a risk to the bare-metal system.
Not sure if I'm missing something...
hsaliak - 57074 seconds ago
This is a very nice and clean implementation. Related to this - I've been exploring injecting landlock and seccomp profiles directly into the ELF binary, so that applications that are backed by some LLM, but want to 'do the right thing' can lock themselves out. This ships a custom process loader that reads the .sandbox section and applies the policies (not unlike bubblewrap, which uses namespaces). The loading can be pushed to a kernel module in the future.
https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works. In the past there were apps that either behaved well, or had malicious intent, but with these LLM backed apps, you are going to see apps that want to behave well, but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!
rwky - 24826 seconds ago
For Linux, firejail works well https://firejail.wordpress.com/
brutuscat - 35023 seconds ago
What do you think of sandbox-exec being marked as deprecated?
https://news.ycombinator.com/item?id=31973232
https://github.com/openai/codex/issues/215
croes - 13228 seconds ago
The real threat is the agent's access to your accounts and services.
Why always the fixation on the hardware?
guimbuilds - 29000 seconds ago
Interesting, we're tackling a different layer of the same problem, snapshot before every run + one-click rollback instead of kernel sandboxing. Complementary approaches. Nice work.
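The snapshot-then-rollback idea, in its simplest possible form, sketched in Python. This is not guimbuilds' implementation (the comment gives no details): it makes a full copy of the working directory, whereas a real tool would more likely use filesystem snapshots or version control. `snapshot` and `rollback` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(workdir: Path) -> Path:
    """Copy workdir into a fresh temporary directory and return the copy's path."""
    target = Path(tempfile.mkdtemp(prefix="snapshot-")) / workdir.name
    shutil.copytree(workdir, target, symlinks=True)
    return target


def rollback(workdir: Path, snap: Path) -> None:
    """Discard workdir's current contents and restore them from the snapshot."""
    shutil.rmtree(workdir)
    shutil.copytree(snap, workdir, symlinks=True)
```

Calling `snapshot` before each agent run and `rollback` on demand covers accidental damage inside the working directory, but nothing outside it, which is why it complements rather than replaces a sandbox.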
abhisek - 52307 seconds ago
I think this is the right approach to building a sandbox for agents, i.e. on top of existing OS-native sandbox capabilities so that the rules are truly enforced.
However, the challenge is that sandbox profiles (rules) are always workload-specific. How do you define “least privilege” for a workload and then enforce it through the sandbox?
Which is why general sandboxes won't be useful or even feasible. The value is in observing and probably auto-generating a baseline policy for a given workload.
Wrong or overly relaxed policies would make the sandbox ineffective against the real threats it is expected to protect against.
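A toy version of "observe, then auto-generate a baseline policy", sketched in Python. It assumes you already have a list of file paths the workload touched (how those are collected is out of scope here) and collapses them into read rules at a chosen directory depth. `baseline_rules` and its `depth` parameter are hypothetical, and paths are assumed to contain no quote characters.

```python
from pathlib import PurePosixPath


def baseline_rules(observed: list[str], depth: int = 2) -> list[str]:
    """Collapse observed file paths into a sorted list of subpath read rules."""
    roots = set()
    for path in observed:
        parts = PurePosixPath(path).parts  # e.g. ('/', 'usr', 'lib', 'libz.dylib')
        # Keep the root plus the first `depth` components of each path.
        roots.add(str(PurePosixPath(*parts[: depth + 1])))
    return [f'(allow file-read* (subpath "{root}"))' for root in sorted(roots)]


observed = ["/usr/lib/libz.dylib", "/usr/lib/dyld", "/Users/me/project/src/main.rs"]
for rule in baseline_rules(observed):
    print(rule)
```

The `depth` knob shows the trade-off in the comment above: at `depth=2` one observed source file widens into read access to all of `/Users/me`, which is exactly the kind of overly relaxed rule that undermines the sandbox.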
matifali - 63170 seconds ago
I wonder why you believe that running agents locally is the best approach. For most people, having agents operate remotely is more effective because the agent can stay active without your local machine needing to remain powered on and connected to the internet 24/7.
jeff_antseed - 39668 seconds ago
the macOS-only constraint is the biggest blocker for us. most of our agents run on linux VMs and there's basically nothing equivalent -- you end up choosing between full docker isolation (heavy) or just... not sandboxing at all and hoping.
been watching microsandbox but it's pretty early. landlock is the linux kernel primitive that could theoretically enable something like this but nobody's built the nice policy layer on top yet.
curious if anyone has a good solution for the "agent running on a remote linux server" case. the threat model is a bit different anyway (no iMessage/keychain to protect) but filesystem and network containment still matter a lot
srid - 66555 seconds ago
If you are using Nix, there's also https://github.com/srid/sandnix that works on Linux (landrun) and macOS (sandbox-exec).
sunir - 61791 seconds ago
Is clunker some new slang that's different than clanker? I'm asking for a friend of my friend Roku.
p.s. thanks for making this; timely as I am playing whackamole with sandboxing right now.
devonkelley - 64615 seconds ago
Sandboxing solves "prevent the agent from doing damage." The failure mode it doesn't catch is when the agent operates perfectly within its permissions and still produces garbage because the model degraded or the tool stopped returning useful results.
That's a 200 OK the whole way down. "Prevent bad actions" and "detect wrong-but-permitted actions" are completely different problems.
inoki - 61012 seconds ago
I'm also working on a cross-platform solution (sandbox-exec on macOS). What if Apple finally drops this after long deprecation?
Finbarr - 61949 seconds ago
Awesome to see a bash-only method of solving this problem. Also like that it alerts on attempts to read restricted stuff.
I built yolobox to solve this using docker/apple containers: https://github.com/finbarr/yolobox
datapolitical - 47817 seconds ago
This really is not going to be safe on something like Mac or Windows until it’s built into the OS.
But given how fast agents are moving, I would be shocked if such tools were not already being built
ashniu123 - 42874 seconds ago
How's this different from https://container-use.com?
cuber_messenger - 56140 seconds ago
It's the exact auth control I want. However, it seems it's not a safehouse for local agents, but a safe cage, IMHO. After all, it prevents damage they might cause.
ashishb - 65838 seconds ago
I built something similar for myself that works on both Linux and Mac OS
https://github.com/ashishb/amazing-sandbox
grun - 45254 seconds ago
similar project https://github.com/trailofbits/claude-code-devcontainer
wek - 59495 seconds ago
Do you have plans to go cross-platform and offer a solution for Windows?
gozucito - 76998 seconds ago
so this works the same as Claude Code /sandbox? The innovation being that it's harness-agnostic?
boxedemp - 56222 seconds ago
Fantastic! I had been using dockers but this might be better!
dbmikus - 72310 seconds ago
I like that it's all bash.
How does this compare with Codex's and Claude's built-in sandboxing?
sagarpatil - 40251 seconds ago
Looks good. I’ll give it a try.
treexs - 60009 seconds ago
wow it's interesting how noticeable sites built with claude (maybe with the frontend-design skill) are now
vivid242 - 72880 seconds ago
Nice! I'd be interested in the things that went wrong during development. Which loopholes were discovered last, if any?
ai_fry_ur_brain - 48068 seconds ago
Docker...
cjbarber - 66316 seconds ago
See also various sandbox tools I and others (e.g. jpeeler) have collected: https://news.ycombinator.com/item?id=47102258
nemo44x - 69532 seconds ago
Supervisor agent frameworks are going to be a big industry soon. You simply can’t have agents executing commands without a trusted supervisory layer examining and certifying actions.
All the issues we get from AI today (hallucinations, goal shift, context decay, etc) get amplified unbelievably fast once you begin scaling agents out due to cascading. The risk being you go to bed and when you wake up your entire infrastructure is gone lol.
babbagegao - 17040 seconds ago
[dead]
babbagegao - 16957 seconds ago
[dead]
babbagegao - 17002 seconds ago
[dead]
oliver_dr - 21945 seconds ago
[dead]
octoclaw - 29935 seconds ago
[dead]
yowang - 32794 seconds ago
[dead]
rex_claw - 20944 seconds ago
[dead]
maciver - 21558 seconds ago
[dead]
Agent_Builder - 47105 seconds ago
[dead]
openclaw01 - 54860 seconds ago
[dead]
naomi_kynes - 75973 seconds ago
The "full-auto" framing is interesting. What happens when the agent hits something it can't resolve autonomously? Even sandboxed, there's a point where the agent needs to ask a question or get approval.
Most setups handle this awkwardly: fire a webhook, write to a log, hope the human is watching. The sandbox keeps the agent contained, but doesn't give it a clean "pause and ask" primitive. The agent either guesses (risky) or silently fails (frustrating).
Seems like there are two layers: the security boundary (sandbox-exec, containers, etc.) and the communication boundary (how does a contained agent reach the human?). This project nails the first. The second is still awkward for most setups.
aplomb1026 - 67724 seconds ago
[dead]
moehj - 72776 seconds ago
[dead]
bschmidt97979 - 14635 seconds ago
[dead]
poopiokaka - 59736 seconds ago
[dead]
gnanagurusrgs - 69257 seconds ago
This is the right problem to solve. At Arcade, we see the same gap — agents get shell access, API keys, and network by default. The permissions model is backwards.
sandbox-profiles is a solid primitive for local agents. The missing piece in production is the tool layer — even a sandboxed agent can still make dangerous API calls if the MCP tools it has access to aren't individually authed and scoped.
The real stack is: sandbox the runtime (what Agent Safehouse does) + scope the tools (what we do with JIT OAuth at the MCP layer). Neither alone is enough.
Nice work shipping this.
https://www.arcade.dev/blog/ai-agent-auth-challenges-develop...