Remove the manifest.json dependency for memory file tracking. Instead,
build an index by scanning daily memory files on demand. This eliminates
a class of bugs where the manifest could drift out of sync with actual
files, and simplifies the code by removing Manifest/ManifestEntry types
and all read/write/path helpers.
Made-with: Cursor
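A minimal sketch of on-demand indexing, for illustration only. The `YYYY-MM-DD.md` file naming, the `memoryEntry` type, and the `buildIndex`/`parseDailyName` helpers are assumptions, not names taken from the change itself.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// memoryEntry is a hypothetical index record, not the project's real type.
type memoryEntry struct {
	Date time.Time
	Path string
}

// parseDailyName reports whether name looks like YYYY-MM-DD.md (an assumed
// naming scheme) and returns the parsed date.
func parseDailyName(name string) (time.Time, bool) {
	base, ok := strings.CutSuffix(name, ".md")
	if !ok {
		return time.Time{}, false
	}
	day, err := time.Parse("2006-01-02", base)
	return day, err == nil
}

// buildIndex scans dir on every call, so there is no persisted manifest that
// could drift from the files on disk. os.ReadDir returns entries sorted by
// filename, which is chronological for ISO dates.
func buildIndex(dir string) ([]memoryEntry, error) {
	files, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var entries []memoryEntry
	for _, f := range files {
		if f.IsDir() {
			continue
		}
		if day, ok := parseDailyName(f.Name()); ok {
			entries = append(entries, memoryEntry{Date: day, Path: filepath.Join(dir, f.Name())})
		}
	}
	return entries, nil
}

func main() {
	dir, err := os.MkdirTemp("", "memory")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"2024-05-02.md", "2024-05-01.md", "notes.txt"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("x"), 0o644); err != nil {
			panic(err)
		}
	}
	entries, err := buildIndex(dir)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Date.Format("2006-01-02"), filepath.Base(e.Path))
	}
}
```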
When a container is deleted but its snapshot survives (dev image rebuild,
containerd metadata loss, manual ctr deletion), the reconciliation path
previously created a fresh container and unconditionally destroyed the
old snapshot via prepareSnapshot, causing complete data loss.
Manager.Start now detects orphaned snapshots before EnsureBot runs,
exports /data to a backup archive, and restores it into the new
container's snapshot before the task starts.
Use rune-aware truncation for user-facing text and log previews so multibyte content is not corrupted in memory context, Telegram messages, or diagnostics.
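For illustration, rune-aware truncation can be sketched as below. The helper name `truncateRunes` and the `...` suffix are assumptions; the actual helper in the codebase may differ.

```go
package main

import "fmt"

// truncateRunes returns at most limit runes of s and appends "..." when the
// input was cut. Slicing runes instead of bytes never splits a multibyte
// character.
func truncateRunes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	runes := []rune(s)
	if len(runes) <= limit {
		return s
	}
	return string(runes[:limit]) + "..."
}

func main() {
	fmt.Println(truncateRunes("你好，世界", 2))
	fmt.Println(truncateRunes("short", 10))
}
```

A byte slice such as `s[:4]` on the same input would cut the second character in half, which is the corruption this change avoids.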
* fix(containerd): prevent silent network failures from leaving containers unreachable (#202)
Container network setup failures were silently swallowed at multiple
points in the call chain, leaving containers in a "running but
unreachable" ghost state. This patch closes every silent-failure path:
- setupCNINetwork: return error when CNI yields no usable IP
- Manager.Start: roll back container when IP is empty instead of
returning success
- ensureContainerAndTask: extract setupNetworkOrFail with 1 retry,
propagate error to callers
- ReconcileContainers: stop reporting "healthy" when network setup fails
- recoverContainerIP: retry up to 2 times with backoff for transient
CNI failures (IPAM lock contention, etc.)
- gRPC Pool: evict connections stuck in Connecting state for >30s
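The retry-and-propagate pattern in the list above could look roughly like this. It is a sketch under assumptions: `setupFunc` stands in for the real CNI call, and `setupWithRetry`, `errNoIP`, `flakySetup`, and `retryDemo` are hypothetical names, not the project's `setupNetworkOrFail`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// setupFunc stands in for the real CNI setup call and returns the container IP.
type setupFunc func(ctx context.Context) (string, error)

var errNoIP = errors.New("network setup returned no usable IP")

// setupWithRetry calls setup up to attempts times with linear backoff. An
// empty IP is treated as a failure, and the last error is returned to the
// caller instead of being swallowed.
func setupWithRetry(ctx context.Context, setup setupFunc, attempts int, backoff time.Duration) (string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			select {
			case <-ctx.Done():
				return "", ctx.Err()
			case <-time.After(backoff * time.Duration(i)):
			}
		}
		ip, err := setup(ctx)
		if err == nil && ip == "" {
			err = errNoIP
		}
		if err == nil {
			return ip, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("network setup failed after %d attempts: %w", attempts, lastErr)
}

// flakySetup returns a setupFunc that yields an empty IP for the first
// `failures` calls and a fixed example address afterwards.
func flakySetup(failures int) setupFunc {
	calls := 0
	return func(context.Context) (string, error) {
		calls++
		if calls <= failures {
			return "", nil
		}
		return "10.88.0.2", nil
	}
}

// retryDemo wires the pieces together for the example and the checks.
func retryDemo(failures, attempts int) (string, error) {
	return setupWithRetry(context.Background(), flakySetup(failures), attempts, time.Millisecond)
}

func main() {
	fmt.Println(retryDemo(1, 2))
	fmt.Println(retryDemo(3, 2))
}
```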
* fix(containerd): clean stale cni0 bridge on startup to prevent MAC error
After a Docker container restart, the cni0 bridge interface can linger
with a zeroed MAC (00:00:00:00:00:00) and DOWN state. The CNI bridge
plugin then fails with "could not set bridge's mac: invalid argument",
making all MCP containers unreachable.
Two-layer fix:
- Entrypoint: delete cni0 and flush IPAM state before starting containerd
- Go: detect bridge MAC errors in setupCNINetwork and auto-delete cni0
before retrying, as defense-in-depth for runtime recovery
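A rough sketch of the Go-side recovery described above, assuming the error is recognised by matching the message text quoted in this commit. `isBridgeMACMessage`, `deleteBridge`, and `setupWithBridgeRecovery` are hypothetical names; `deleteBridge` assumes the `ip` tool from iproute2 is available and the process has permission to delete links.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// isBridgeMACMessage matches the CNI bridge plugin failure quoted in the
// commit message. Matching on text is an assumption of this sketch.
func isBridgeMACMessage(msg string) bool {
	return strings.Contains(msg, "could not set bridge's mac")
}

// deleteBridge removes a network link by running `ip link delete <name>`.
func deleteBridge(ctx context.Context, name string) error {
	out, err := exec.CommandContext(ctx, "ip", "link", "delete", name).CombinedOutput()
	if err != nil {
		return fmt.Errorf("delete %s: %w: %s", name, err, strings.TrimSpace(string(out)))
	}
	return nil
}

// setupWithBridgeRecovery runs setup once; on a bridge MAC error it runs
// cleanup and retries setup a single time. Other errors pass through.
func setupWithBridgeRecovery(ctx context.Context, setup, cleanup func(context.Context) error) error {
	err := setup(ctx)
	if err == nil || !isBridgeMACMessage(err.Error()) {
		return err
	}
	if cerr := cleanup(ctx); cerr != nil {
		return errors.Join(err, cerr)
	}
	return setup(ctx)
}

func main() {
	calls := 0
	setup := func(context.Context) error {
		calls++
		if calls == 1 {
			return errors.New("plugin bridge: could not set bridge's mac: invalid argument")
		}
		return nil
	}
	// A fake cleanup keeps the example side-effect free. A real caller might
	// pass a closure that calls deleteBridge(ctx, "cni0").
	cleanup := func(context.Context) error { return nil }
	fmt.Println(setupWithBridgeRecovery(context.Background(), setup, cleanup), calls)
}
```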
* fix(containerd): use exec.CommandContext to satisfy noctx linter
* fix(mcp): propagate network errors from replaceContainerSnapshot
Network setup failure after snapshot replace (rollback/commit) was
silently swallowed: the container would start but remain unreachable
via gRPC. Return the error so callers (CreateSnapshot, RollbackVersion,
etc.) surface the failure instead of reporting success.
* feat(container): add explicit data workflows and snapshot rollback
Make container upgrades and recreation data-safe by adding explicit preserve, export, import, restore, and rollback flows across the backend, SDK, and web UI.
* fix(container): resolve go lint issues
Fix formatting and lint violations introduced by the container data workflow changes so the Go CI lint job passes cleanly.
jsdom relies on Node.js-specific APIs that Bun cannot properly resolve
when running a bundled artifact. This caused an EISDIR error in Docker
containers (Bun tried to read the jsdom directory as a file).
Replace jsdom with linkedom, a lightweight pure-JS DOM implementation
that is fully compatible with Bun and @mozilla/readability. Also remove
the --external jsdom build flag since linkedom bundles cleanly.
Closes #181
Replace the host bind-mount + containerd exec approach with a per-bot
in-container gRPC server (ContainerService, port 9090). All file I/O,
exec, and MCP stdio sessions now go through gRPC instead of running
shell commands or reading host-mounted directories.
Architecture changes:
- cmd/mcp: rewritten as a gRPC server (ContainerService) with full
file and exec API (ReadFile, WriteFile, ListDir, ReadRaw, WriteRaw,
Exec, Stat, Mkdir, Rename, DeleteFile)
- internal/mcp/mcpcontainer: protobuf definitions and generated stubs
- internal/mcp/mcpclient: gRPC client wrapper with connection pool
(Pool) and Provider interface for dependency injection
- mcp.Manager: add per-bot IP cache, gRPC connection pool, and
SetContainerIP/MCPClient methods; remove DataDir/Exec helpers
- containerd.Service: remove ExecTask/ExecTaskStreaming; network setup
now returns NetworkResult{IP} for pool routing
- internal/fs/service.go: deleted (replaced by mcpclient)
- handlers/fs.go: deleted; MCP stdio session logic moved to mcp_stdio.go
- container provider Executor: all tools (read/write/list/edit/exec)
now call gRPC client instead of running shell via exec
- storefs, containerfs, media, skills, memory: all I/O ported to
mcpclient.Provider
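To illustrate the per-bot IP cache and connection pool described above, here is a simplified sketch. The `conn` type stands in for a real gRPC client connection, and `pool`, `newPool`, and `Get` are hypothetical names; only `SetContainerIP` and port 9090 come from the description. Evicting on every `SetContainerIP` call is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// conn is a stand-in for a gRPC client connection.
type conn struct {
	target string
	closed bool
}

func (c *conn) Close() { c.closed = true }

// pool caches one connection per bot, keyed by bot ID.
type pool struct {
	mu    sync.Mutex
	ips   map[string]string
	conns map[string]*conn
}

func newPool() *pool {
	return &pool{ips: map[string]string{}, conns: map[string]*conn{}}
}

// SetContainerIP records the bot's IP and evicts any cached connection, so a
// recreated container is never reached through a stale connection.
func (p *pool) SetContainerIP(botID, ip string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.ips[botID] = ip
	if c, ok := p.conns[botID]; ok {
		c.Close()
		delete(p.conns, botID)
	}
}

// Get returns the cached connection for botID or creates one to ip:9090.
func (p *pool) Get(botID string) (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[botID]; ok {
		return c, nil
	}
	ip := p.ips[botID]
	if ip == "" {
		return nil, fmt.Errorf("no IP recorded for bot %q", botID)
	}
	c := &conn{target: net.JoinHostPort(ip, "9090")}
	p.conns[botID] = c
	return c, nil
}

func main() {
	p := newPool()
	p.SetContainerIP("bot-1", "10.88.0.2")
	first, _ := p.Get("bot-1")
	p.SetContainerIP("bot-1", "10.88.0.3")
	second, _ := p.Get("bot-1")
	fmt.Println(first.target, first.closed, second.target)
}
```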
Database:
- migration 0022: drop host_path column from containers table
One-time data migration:
- migrateBindMountData: on first Start() after upgrade, copies old
bind-mount data into the container via gRPC, then renames src dir
to prevent re-migration; runs in background goroutine
Bug fixes:
- mcp_stdio: callRaw now returns full JSON-RPC envelope
{"jsonrpc","id","result"|"error"} matching protocol spec;
explicit "initialize" call now advances session init state to
prevent duplicate handshake on next non-initialize call
- mcpclient Pool: properly evict stale gRPC connection after snapshot
replace (container process recreated); use SetContainerIP instead
of direct map write so IP changes always evict pool entry
- migrateBindMountData: walkErr on directories now counted as failure
so partially-walked trees don't get incorrectly marked as migrated
- cmd/mcp/Dockerfile: removed dead file (docker/Dockerfile.mcp is the
canonical production build)
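The JSON-RPC envelope mentioned in the mcp_stdio fix can be sketched as follows. The `rpcResponse` and `rpcError` types and the `envelope` helper are illustrative assumptions that follow the JSON-RPC 2.0 response shape, not the project's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcError mirrors the JSON-RPC 2.0 error object.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// rpcResponse is the full envelope: exactly one of Result or Error is set.
type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// envelope wraps a raw JSON id and result (or an error) into a complete
// response. Empty id or result strings are encoded as JSON null.
func envelope(id, result string, rpcErr *rpcError) (string, error) {
	if id == "" {
		id = "null"
	}
	resp := rpcResponse{JSONRPC: "2.0", ID: json.RawMessage(id)}
	if rpcErr != nil {
		resp.Error = rpcErr
	} else {
		if result == "" {
			result = "null"
		}
		resp.Result = json.RawMessage(result)
	}
	b, err := json.Marshal(resp)
	return string(b), err
}

func main() {
	ok, _ := envelope("1", `{"ok":true}`, nil)
	fmt.Println(ok)
	failed, _ := envelope("2", "", &rpcError{Code: -32601, Message: "method not found"})
	fmt.Println(failed)
}
```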
Tests:
- provider_test.go: restored with bufconn in-process gRPC mock
(fakeContainerService + staticProvider), 14 cases covering all 5
tools plus edge cases
- mcp_session_test.go: new, covers JSON-RPC envelope, init state
machine, pending cleanup on cancel/close, readLoop cancel
- storefs/service_test.go: restored (pure function roundtrip tests)
Split long AI responses into multiple platform messages during streaming
instead of truncating them. The manager counts accumulated delta runes
and opens a new stream when approaching the platform's TextChunkLimit.
Uses a soft/hard limit strategy that prefers splitting at sentence ends
or line breaks over cutting mid-sentence.
- Add pushDelta with soft (75%) / hard (100%) limit and natural break
point detection
- Add splitStream, pushFinalAfterSplit, pushFinalWithChunking helpers
- Fix Discord adapter to use rune count for message length
- Add tests for delta splitting, natural breaks, and final handling
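The soft/hard limit strategy can be illustrated with a non-streaming simplification. The sketch below splits a complete string rather than accumulating deltas, and `chunkRunes` and `isBreak` are hypothetical names; the set of break characters is also an assumption.

```go
package main

import "fmt"

// isBreak reports whether r is a natural split point: a line break or a
// sentence-ending mark.
func isBreak(r rune) bool {
	switch r {
	case '\n', '.', '!', '?', '。', '！', '？':
		return true
	}
	return false
}

// chunkRunes splits text into pieces of at most limit runes. Within each
// piece it prefers the last natural break between the soft limit (75% of
// limit) and the hard limit, and cuts at the hard limit only when no break
// is found in that window.
func chunkRunes(text string, limit int) []string {
	if limit <= 0 {
		return []string{text}
	}
	runes := []rune(text)
	soft := limit * 3 / 4
	var out []string
	for len(runes) > limit {
		cut := limit // hard-limit fallback
		for i := limit - 1; i >= soft; i-- {
			if isBreak(runes[i]) {
				cut = i + 1 // keep the break character in this chunk
				break
			}
		}
		out = append(out, string(runes[:cut]))
		runes = runes[cut:]
	}
	if len(runes) > 0 {
		out = append(out, string(runes))
	}
	return out
}

func main() {
	for _, part := range chunkRunes("First sentence. Second one is longer.", 16) {
		fmt.Printf("%q\n", part)
	}
}
```

Counting runes rather than bytes keeps the limit aligned with how chat platforms typically measure message length for multibyte text.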
* feat(telegram): use sendMessageDraft for streaming in private chats
Use Telegram Bot API 9.3's sendMessageDraft to stream partial messages
with smooth animation in private chats, replacing the sendMessage +
editMessageText approach. Group/channel chats keep the existing
edit-based streaming.
- Add sendTelegramDraft() for the sendMessageDraft API
- Detect private chats via conversation_type metadata in OpenStream
- Use 300ms throttle for drafts (vs 5s for edits)
- Send permanent messages at tool call boundaries and on final event
- Reset buffer atomically in StreamEventFinal to prevent duplicate
messages when multiple final events fire (one per assistant output)
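One way to reset a buffer atomically is a take-and-clear operation under a single lock, sketched below. The `streamBuffer` type and its methods are hypothetical and only illustrate the idea; the adapter's real state handling is not shown in this commit message.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// streamBuffer accumulates streamed text deltas.
type streamBuffer struct {
	mu  sync.Mutex
	buf strings.Builder
}

// Append adds a delta to the buffer.
func (s *streamBuffer) Append(delta string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buf.WriteString(delta)
}

// Take returns the buffered text and clears it within one critical section,
// so a second final event sees an empty buffer and sends nothing.
func (s *streamBuffer) Take() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	text := s.buf.String()
	s.buf.Reset()
	return text
}

func main() {
	var b streamBuffer
	b.Append("Hello, ")
	b.Append("world")
	// Two final events in a row: only the first one has text to send.
	for i := 1; i <= 2; i++ {
		if text := b.Take(); text != "" {
			fmt.Printf("final %d sends %q\n", i, text)
		} else {
			fmt.Printf("final %d sends nothing\n", i)
		}
	}
}
```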
* test(telegram): improve draft mode test assertions
Add sendTextForTest hook for sendTelegramTextReturnMessage to enable
direct assertion of send calls. Clean up residual unused variables
and replace indirect assertions with explicit mock-based verification.
A cni.Remove() failure caused by stale iptables state blocked the
retried cni.Setup() call, leaving bot containers without SNAT/MASQUERADE.
- Ignore cni.Remove() error so retry Setup always runs
- Add global MASQUERADE rule in entrypoints as belt-and-suspenders
Closes #161
- Fix DeleteContainer FAILED_PRECONDITION by cleaning up stopped task
entries before container deletion
- Fix CreateSnapshot leaving container in broken state: commit turns
the active snapshot read-only, so the full cycle (stop → commit →
prepare → delete → recreate → start) is now applied consistently
- Use context.WithoutCancel for atomic container replacement sequences
to prevent cancelled HTTP requests from corrupting container state
- Use dctx for DB operations (recordSnapshotVersion/insertEvent) to
avoid orphan snapshots in containerd without matching DB records
- Restart task + network after snapshot replacement, fixing Exec after
CreateVersion where the container had no running task
- Extract replaceContainerSnapshot helper to deduplicate the prepare →
delete → recreate → start pattern across three call sites
- Move snapshot list data fetching into Manager.ListBotSnapshotData to
encapsulate per-container locking; remove exported LockBot method
- Use UnixNano for snapshot names to avoid second-precision collisions
* fix(utils): preserve colon-containing values in tagsToRecord; align invalidFallback across date formatters; add formatRelativeTime
**key-value-tags: fix value truncation on tags with colons**
`tagsToRecord` used `tag.split(':')` with array destructuring, so any
value containing `:` (e.g. a webhook URL `https://example.com/hook`)
was silently truncated to just the scheme. Switch to `indexOf` so the
split happens only on the first colon, preserving the full value.
Example (before → after):
`tagsToRecord(['hook:https://api.example.com/cb'])`
before: `{ hook: 'https' }` ← bug
after: `{ hook: 'https://api.example.com/cb' }`
Add `key-value-tags.test.ts` covering: simple pairs, URL values,
multi-colon values, empty key/value, round-trip with `recordToTags`.
**date-time: honour `invalidFallback` consistently**
`FormatDateOptions` declares `invalidFallback`, but only
`formatDateTimeSeconds` ever read it. `formatDateTime` and `formatDate`
both collapsed a present-but-invalid date string into `fallback ?? ''`,
making it impossible for callers to distinguish "nothing was passed" from
"a bad string was passed".
Extract a shared `resolveInvalid(value, options)` helper (prefers
`invalidFallback`, then `fallback`, then the raw value) and apply it
uniformly. Also refactor `formatDateTimeSeconds` to use the existing
`parseDate` helper, eliminating the duplicated `new Date` + `isNaN`
guard. No externally visible behaviour change for previously valid
combinations; callers that relied on invalid dates falling through to
`fallback` keep working since `resolveInvalid` falls through to
`fallback` when `invalidFallback` is absent.
**date-time: add `formatRelativeTime`**
Chat and notification UIs commonly need relative timestamps ("3 minutes
ago", "yesterday"). The utility file has no such function. Add
`formatRelativeTime(value, options?)` using `Intl.RelativeTimeFormat`
so the output respects the browser locale without hardcoded English
strings. Thresholds: seconds < 60 s, minutes < 1 h, hours < 24 h,
days < 7 d, beyond that falls back to `toLocaleDateString()`. Accepts
both ISO strings and `Date` objects.
Add `date-time.test.ts` covering all four exported functions including
`vi.useFakeTimers` assertions for `formatRelativeTime`.
* fix(utils): clean up formatRelativeTime after merge
- Add formatRelativeTime() to date-time utils (Intl.RelativeTimeFormat, locale-aware)
- Display relative time under each message in message-item.vue
- Show full datetime in title attribute on hover
Email is a supported channel (bindings, providers, outbox) but had no icon
and fell back to the generic comment icon. Use FontAwesome envelope.