Commit Graph

3 Commits

Author SHA1 Message Date
BBQ abbb14c59f fix(containerd): prevent silent network failures from leaving containers unreachable (#202)
* fix(containerd): prevent silent network failures from leaving containers unreachable

Container network setup failures were silently swallowed at multiple
points in the call chain, leaving containers in a "running but
unreachable" ghost state. This patch closes every silent-failure path:

- setupCNINetwork: return error when CNI yields no usable IP
- Manager.Start: roll back container when IP is empty instead of
  returning success
- ensureContainerAndTask: extract setupNetworkOrFail with 1 retry,
  propagate error to callers
- ReconcileContainers: stop reporting "healthy" when network setup fails
- recoverContainerIP: retry up to 2 times with backoff for transient
  CNI failures (IPAM lock contention, etc.)
- gRPC Pool: evict connections stuck in Connecting state for >30s

* fix(containerd): clean stale cni0 bridge on startup to prevent MAC error

After a Docker container restart, the cni0 bridge interface can linger
with a zeroed MAC (00:00:00:00:00:00) and DOWN state. The CNI bridge
plugin then fails with "could not set bridge's mac: invalid argument",
making all MCP containers unreachable.

Two-layer fix:
- Entrypoint: delete cni0 and flush IPAM state before starting containerd
- Go: detect bridge MAC errors in setupCNINetwork and auto-delete cni0
  before retrying, as defense-in-depth for runtime recovery

* fix(containerd): use exec.CommandContext to satisfy noctx linter
2026-03-07 17:50:01 +08:00
BBQ 7730096696 fix(containerd): restore CNI MASQUERADE after container restart (#167)
cni.Remove() failure on stale iptables state blocked the retry
cni.Setup(), leaving bot containers without SNAT/MASQUERADE.

- Ignore cni.Remove() error so retry Setup always runs
- Add global MASQUERADE rule in entrypoints as belt-and-suspenders

Closes #161
2026-03-03 16:00:46 +08:00
BBQ 04bce702b7 feat(devenv): MCP dev hot-reload with image-based approach (#145)
Add mcp-build.sh that compiles the MCP binary and packages it as an
OCI image layer on top of the base rootfs, imported directly into
containerd. Air triggers rebuild on code changes, cleaning stale
containers automatically.

Consolidate dev-only files (Dockerfiles, entrypoint, config, build
script) into devenv/ to separate dev tooling from production artifacts.
Skip image pull when already imported to speed up dev startup.
2026-03-02 14:59:48 +08:00