cni.Remove() failure on stale iptables state blocked the retry
cni.Setup(), leaving bot containers without SNAT/MASQUERADE.
- Ignore cni.Remove() error so retry Setup always runs
- Add global MASQUERADE rule in entrypoints as belt-and-suspenders
Closes#161
- Fix DeleteContainer FAILED_PRECONDITION by cleaning up stopped task
entries before container deletion
- Fix CreateSnapshot leaving container in broken state: commit turns
the active snapshot read-only, so the full cycle (stop → commit →
prepare → delete → recreate → start) is now applied consistently
- Use context.WithoutCancel for atomic container replacement sequences
to prevent cancelled HTTP requests from corrupting container state
- Use dctx for DB operations (recordSnapshotVersion/insertEvent) to
avoid orphan snapshots in containerd without matching DB records
- Restart task + network after snapshot replacement, fixing Exec after
CreateVersion where the container had no running task
- Extract replaceContainerSnapshot helper to deduplicate the prepare →
delete → recreate → start pattern across three call sites
- Move snapshot list data fetching into Manager.ListBotSnapshotData to
encapsulate per-container locking; remove exported LockBot method
- Use UnixNano for snapshot names to avoid second-precision collisions
Containerd does not auto-prepend "docker.io/" to short Docker Hub names
like "memohai/mcp:latest", causing it to treat "memohai" as a registry
host and fail with EOF. Add NormalizeImageRef() to ensure all image
references are fully qualified before being passed to containerd.
The previous snapshotParentFromLayers manually decompressed layer
blobs to compute diffIDs, which could diverge from the IDs that
containerd's Unpack uses (e.g. gzip vs zstd). Use image.RootFS()
instead — the canonical source containerd itself relies on.