Expose a dedicated container metrics endpoint and surface current CPU, memory, and root filesystem usage in the bot container view. This gives operators a quick health snapshot while degrading cleanly on unsupported backends.
Expose a paginated endpoint and UI table that lists individual LLM call
records (assistant messages with usage) per bot, showing time, session
type, model, provider, and token counts. Respects existing date / model
/ session-type filters and adds full-height loaders plus a max-width
layout to keep the usage page consistent with other top-level pages.
Introduce a new `show_tool_calls_in_im` bot setting plus a full overhaul of
how tool calls are surfaced in IM channels:
- Add per-bot setting + migration (0072) and expose through settings API /
handlers / frontend SDK.
- Introduce a `toolCallDroppingStream` wrapper that filters tool_call_* events
when the setting is off, keeping the rest of the stream intact.
- Add a shared `ToolCallPresentation` model (Header / Body blocks / Footer)
with plain and Markdown renderers, and a per-tool formatter registry that
produces rich output (e.g. `web_search` link lists, `list` directory
previews, `exec` stdout/stderr tails) instead of raw JSON dumps.
- High-capability adapters (Telegram, Feishu, Matrix, Slack, Discord) now
flush pre-text and then send ONE tool-call message per call, editing it
in-place from `running` to `completed` / `failed`; mapping from callID to
platform message ID is tracked per stream, with a fallback to a new
message if the edit fails. Low-capability adapters (WeCom, QQ, DingTalk)
keep posting a single final message, but now benefit from the same rich
per-tool formatting.
- Suppress the early duplicate `EventToolCallStart` (from
`sdk.ToolInputStartPart`) so that the SDK's final `StreamToolCallPart`
remains the single source of truth for tool call start, preventing
duplicated "running" bubbles in IM.
- Stop auto-populating `InputSummary` / `ResultSummary` after a per-tool
formatter runs, which previously leaked the raw JSON result as a
fallback footer underneath the formatted body.
Add regression tests for the formatters, the Markdown renderer, the
edit-in-place flow on Telegram/Matrix, and the JSON-leak guard on `list`.
The discuss driver's RC+TR composition had three compounding bugs that
caused old tasks to be re-answered after idle timeouts and made the LLM
blind to its own prior tool usage:
- DecodeTurnResponseEntry only kept visible text via TextContent(), so
assistant steps carrying only tool_call parts (the first half of every
tool round) were dropped entirely. Rewritten to render tool_call and
tool_result parts as <tool_call>/<tool_result> tags, covering both
Vercel-style content parts and legacy OpenAI ToolCalls/role=tool
envelopes. Reasoning parts remain stripped to avoid re-injection.
- loadTurnResponses hard-capped TRs at 24h while RC is replayed in full
from the events table, producing asymmetric context (user messages
from day 1 visible, matching bot replies missing). The cap is removed;
any size-bound trimming belongs in compaction, not here.
- lastProcessedMs lived only in memory and was set to time.Now() at turn
end. After the 10-minute idle timeout, the goroutine exited and the
next turn started with cursor=0, treating the entire history as new
traffic. Now initialised from the latest TR's requested_at on cold
start, and advanced to max(consumed RC.ReceivedAtMs) per turn so that
messages arriving mid-generation trigger a follow-up round instead of
being wrongly marked processed.
Avoid feeding structured tool payloads back into pipeline context as doubly encoded JSON, and return readable history summaries instead of raw message blobs.
* feat: expand speech provider support with new client types and configuration schema
* feat: add icon support for speech providers and update related configurations
* feat: add SVG support for Deepgram and Elevenlabs with Vue components
* feat: except *-speech client type in llm provider
* feat: enhance speech provider functionality with advanced settings and model import capabilities
* chore: remove go.mod replace
* feat: enhance speech provider functionality with advanced settings and model import capabilities
* chore: update go module dependencies
* feat: Ear and Mouth
* fix: separate ear/mouth page
* fix: separate audio domain and restore transcription templates
Move speech and transcription internals into the audio domain, restore template-driven transcription providers, and regenerate Swagger/SDK so the frontend can stop hand-calling /transcription-* APIs.
---------
Co-authored-by: aki <arisu@ieee.org>
* feat: expand speech provider support with new client types and configuration schema
* feat: add icon support for speech providers and update related configurations
* feat: add SVG support for Deepgram and Elevenlabs with Vue components
* feat: except *-speech client type in llm provider
* feat: enhance speech provider functionality with advanced settings and model import capabilities
* chore: remove go.mod replace
* feat: enhance speech provider functionality with advanced settings and model import capabilities
* chore: update go module dependencies
* feat: Ear and Mouth
* fix: separate ear/mouth page
* fix: separate audio domain and restore transcription templates
Move speech and transcription internals into the audio domain, restore template-driven transcription providers, and regenerate Swagger/SDK so the frontend can stop hand-calling /transcription-* APIs.
---------
Co-authored-by: aki <arisu@ieee.org>
* feat: expand speech provider support with new client types and configuration schema
* feat: add icon support for speech providers and update related configurations
* feat: add SVG support for Deepgram and Elevenlabs with Vue components
* feat: except *-speech client type in llm provider
* feat: enhance speech provider functionality with advanced settings and model import capabilities
* chore: remove go.mod replace
* feat: enhance speech provider functionality with advanced settings and model import capabilities
* chore: update go module dependencies
---------
Co-authored-by: Acbox <acbox0328@gmail.com>
Give bots their known per-channel account handles in the system prompt so they can reason about platform-specific self references consistently. Reuse persisted channel self_identity data across chat, discuss, schedule, heartbeat, and subagent prompts.
Emit tool-call placeholders as soon as tool input streaming starts so long writes appear immediately in chat. Reuse the same UI tool message when full input arrives to avoid duplicate cards, and keep the hook-required test suite green.
- Add POST /bots/:bot_id/sessions/:session_id/compact endpoint for
synchronous context compaction with fallback to chat model when no
dedicated compaction model is configured
- Add "Compact Now" button to session info panel in the web UI
- Add /compact slash command for triggering compaction from chat
- Regenerate OpenAPI spec and TypeScript SDK
SQL queries (CountProviders, CountModels, ListModels, ListEnabledModels,
ListModelsByProviderID) now exclude speech types. Added IsLLMClientType
guard to prevent cross-domain queries via /models?client_type and
/providers/:id/import-models. Frontend provider forms no longer offer
edge-speech as a client type option.
Also fixed pre-existing SA5011 staticcheck warnings in proxy_test.go
and executor_test.go.
The per-bot context_token_budget column was unused (no frontend UI) and
has been removed. Context trimming now derives the budget from the chat
model's context_window setting, which is already configured per model.
1. Split the oversized `cmd/agent` entrypoint into a multi-file package and update dev/build scripts to use the package path instead of compiling `main.go` directly.
2. Add a new `memoh` terminal UI for local bot chat, with Bubble Tea
Initialize new bots with preset ACL templates and an allow-by-default fallback so common access setups can be selected during bot creation instead of being configured manually afterward.
Add explicit scalar casts for optional timezone and context token budget fields so sqlc can parse UpsertBotSettings again. Regenerating sqlc also syncs the stale generated bot/query structs that drifted from the schema.
Make slash commands easier to navigate in chat by splitting help into levels, compacting list output, and surfacing current selections for model, search, memory, and browser settings. Also route /status to the active conversation session and add an access inspector so users can understand their current command and ACL context.
Unify webhook handling across channel adapters and add the WeChat Official Account channel so inbound routing and replies work without platform-specific handlers. Add adapter-scoped proxy support and stable config field ordering so restricted network environments can deliver WeChat and Telegram messages reliably.
* refactor: unify providers and models tables
- Rename `llm_providers` → `providers`, `llm_provider_oauth_tokens` → `provider_oauth_tokens`
- Remove `tts_providers` and `tts_models` tables; speech models now live in the unified `models` table with `type = 'speech'`
- Replace top-level `api_key`/`base_url` columns with a JSONB `config` field on `providers`
- Rename `llm_provider_id` → `provider_id` across all references
- Add `edge-speech` client type and `conf/providers/edge.yaml` default provider
- Create new read-only speech endpoints (`/speech-providers`, `/speech-models`) backed by filtered views of the unified tables
- Remove old TTS CRUD handlers; simplify speech page to read-only + test
- Update registry loader to skip malformed YAML files instead of failing entirely
- Fix YAML quoting for model names containing colons in openrouter.yaml
- Regenerate sqlc, swagger, and TypeScript SDK
* fix: exclude speech providers from providers list endpoint
ListProviders now filters out client_type matching '%-speech' so Edge
and future speech providers no longer appear on the Providers page.
ListSpeechProviders uses the same pattern match instead of hard-coding
'edge-speech'.
* fix: use explicit client_type list instead of LIKE pattern
Replace '%-speech' pattern with explicit IN ('edge-speech') for both
ListProviders (exclusion) and ListSpeechProviders (inclusion). New
speech client types must be added to both queries.
* fix: use EXECUTE for dynamic SQL in migrations referencing old schema
PL/pgSQL pre-validates column/table references in static SQL statements
inside DO blocks before evaluating IF/RETURN guards. This caused
migrations 0010-0061 to fail on fresh databases where the canonical
schema uses `providers`/`provider_id` instead of `llm_providers`/
`llm_provider_id`.
Wrap all SQL that references potentially non-existent old schema objects
(llm_providers, llm_provider_id, tts_providers, tts_models, etc.) in
EXECUTE strings so they are only parsed at runtime when actually reached.
* fix: revert canonical schema to use llm_providers for migration compatibility
The CI migrations workflow (up → down → up) failed because 0061 down
renames `providers` back to `llm_providers`, but 0001 down only dropped
`providers` — leaving `llm_providers` as a remnant. On the second
migrate up, 0010 found the stale `llm_providers` and tried to reference
`models.llm_provider_id` which no longer existed.
Revert 0001 canonical schema to use original names (llm_providers,
tts_providers, tts_models) so incremental migrations work naturally and
0061 handles the final rename. Remove EXECUTE wrappers and unnecessary
guards from migrations that now always operate on llm_providers.
* fix: icons
* fix: sync canonical schema with 0061 migration to fix sqlc column mismatch
0001_init.up.sql still used old names (llm_providers, llm_provider_id)
and included dropped tts_providers/tts_models tables. sqlc could not
parse the PL/pgSQL EXECUTE in migration 0061, so generated code retained
stale columns (input_modalities, supports_reasoning) causing runtime
"column does not exist" errors when adding models.
- Update 0001_init.up.sql to current schema (providers, provider_id,
no tts tables, add provider_oauth_tokens)
- Use ALTER TABLE IF EXISTS in 0010/0041/0042 for backward compat
- Regenerate sqlc
* fix: guard all legacy migrations against fresh schema for CI compat
On fresh databases, 0001_init.up.sql creates providers/provider_id
(not llm_providers/llm_provider_id). Migrations 0013, 0041, 0046, 0047
referenced the old names without guards, causing CI migration failures.
- 0013: check llm_provider_id column exists before adding old constraint
- 0041: check llm_providers table exists before backfill/constraint DDL
- 0046: wrap CREATE TABLE in DO block with llm_providers existence check
- 0047: use ALTER TABLE IF EXISTS + DO block guard
* refactor: introduce DCP pipeline layer for unified context assembly
Introduce a Deterministic Context Pipeline (DCP) inspired by Cahciua,
providing event-driven context assembly for LLM conversations.
- Add `internal/pipeline/` package with Canonical Event types, Projection
(reduce), Rendering (XML RC), Pipeline manager, and EventStore persistence
- Change user message format from YAML front-matter to XML `<message>` tags
with self-contained attributes (sender, channel, conversation, type)
- Merge CLI/Web dual API into single `/local/` endpoint, remove CLI handler
- Add `bot_session_events` table for event persistence and cold-start replay
- Add `discuss` session type (reserved for future Cahciua-style mode)
- Wire pipeline into HandleInbound: adapt → persist → push on every message
- Lazy cold-start replay: load events from DB on first session access
* feat: implement discuss mode with reactive driver and probe gate
Add discuss session mode where the bot autonomously decides when to speak
in group chats via tool-gated output (send tool only, no direct text reply).
- Add discuss driver (per-session goroutine, RC watch, step loop via
agent.Generate, TR persistence, late-binding prompt with mention hints)
- Add system_discuss.md prompt template ("text = inner monologue, send = speak")
- Add context composition (MergeContext, ComposeContext, TrimContext) for
RC + assistant/tool message interleaving by timestamp
- Add probe gate: when discuss_probe_model_id is set, cheap model pre-filters
group messages; no tool calls = silence, tool calls = activate primary
- Add /new [chat|discuss] command: explicit mode selection, defaults to
discuss in groups, chat in DMs, chat-only for WebUI
- Add ResolveRunConfig on flow.Resolver for discuss driver to reuse
model/tools/system-prompt resolution without reimplementing
- Fix send tool for discuss mode: same-conversation sends now go through
SendDirect (channel adapter) instead of the local emitter shortcut
- Add target attribute to XML message format (reply_target for routing)
- Add discuss_probe_model_id to bots table settings
- Remove pipeline compaction (SetCompactCursor) — reuse existing compaction.Service
- Persist full SDK messages (including tool calls) in discuss mode
* refactor: unify DCP event layer, fix persistence and local channel
- Fix bot_session_events dedup index to include event_kind so that
message + edit events for the same external_message_id coexist.
- Change CreateSessionEvent from :one to :exec so ON CONFLICT DO NOTHING
does not produce spurious errors on duplicate delivery.
- Move ACL evaluation before event ingest; denied messages no longer
enter bot_session_events or the in-memory pipeline.
- Let chat mode consume RenderedContext from the DCP pipeline when
available, sharing the same event-driven context assembly as discuss.
- Collapse local WebSocket handler to route through HandleInbound
instead of directly calling StreamChatWS, eliminating the dual
business entry point.
- Extract buildBaseRunConfig shared builder so resolve() and
ResolveRunConfig() no longer duplicate model/credentials/skills setup.
- Add StoreRound to RunConfigResolver interface so discuss driver
persists assistant output with full metadata, usage, and memory
extraction (same quality as chat mode).
- Fix discuss driver context: use context.Background() instead of the
short-lived HTTP request context that was getting cancelled.
- Fix model ID passed to StoreRound: return database UUID from
ResolveRunConfig instead of SDK model name.
- Remove dead CLIAdapter/CLIType and update legacy web/cli references
in tests and comments.
* fix: stop idle discuss goroutines after 10min timeout
Discuss session goroutines were never cleaned up when a session became
inactive (e.g. after /new). Add a 10-minute idle timer that auto-exits
the goroutine and removes it from the sessions map when no new RC
arrives.
* refactor: pipeline details — event types, structured reply, display content
- Remove [User sent N attachments] placeholder text from buildInboundQuery;
attachment info is now expressed via pipeline <attachment> tags.
- Unify in-reply-to as structured ReplyRef (Sender/Preview fields) across
Telegram, Discord, Feishu, and Matrix adapters instead of prepending
[Reply to ...] text into the message body. Remove now-unused
buildTelegramQuotedText, buildDiscordQuotedText, buildMatrixQuotedText.
- Make AdaptInbound return CanonicalEvent interface and dispatch to
adaptMessage/adaptEdit/adaptService based on metadata["event_type"].
- Add event_id column to bot_history_messages (migration 0059) so user
messages can reference their canonical pipeline event.
- PersistEvent now returns the event UUID; HandleInbound passes it through
to both persistPassiveMessage and ChatRequest.EventID for storeRound.
- Add FillDisplayContent to message service: extracts plain text from
event_data for clean frontend display.
- Frontend extractMessageText prefers display_content when available,
falling back to legacy strip logic for old messages.
- Fix: always generate headerifiedQuery for storage even when usePipeline
is true, so user messages are persisted via storeRound in chat mode.
* fix: use json.Marshal for pipeline context content serialization
The manual string escaping in buildMessagesFromPipeline only handled
double quotes but not newlines, backslashes, and other JSON special
characters, producing invalid json.RawMessage values. The LLM then
received empty/malformed context and complained about having no history.
* fix: restore WebSocket handler to use StreamChatWS directly
The previous refactoring replaced the WS handler with HandleInbound +
RouteHub subscription, which broke streaming because RouteHub events
use a different format (channel.StreamEvent) than what the frontend
expects (flow.WSStreamEvent with text_delta, tool_call_start, etc.).
Restore the original direct StreamChatWS call path so WebUI streaming
works again. The WS handler now matches the pre-refactoring behavior
while all other changes (pipeline, ACL, event types, etc.) are kept.
* feat: store display_text directly in bot_history_messages
Instead of computing display content at API response time by querying
bot_session_events via event_id, store the raw user text in a dedicated
display_text column at write time. This works for all paths including
the WebSocket handler which does not go through the pipeline/event layer.
- Migration 0060: add display_text TEXT column
- PersistInput gains DisplayText; filled from trimmedText (passive) and
req.Query (storeRound)
- toMessageFields reads display_text into DisplayContent
- Remove FillDisplayContent runtime query and ListSessionEventsByEventID
- Frontend already prefers display_content when available (no change)
* fix: display_text should contain raw user text, not XML-wrapped query
req.Query gets overwritten to headerifiedQuery (with XML <message> tags)
before storeRound runs. Add RawQuery field to ChatRequest to preserve
the original user text, and use it for display_text in storeMessages.
* fix(web): show discuss sessions
* refactor: introduce DCP pipeline layer for unified context assembly
Introduce a Deterministic Context Pipeline (DCP) inspired by Cahciua,
providing event-driven context assembly for LLM conversations.
- Add `internal/pipeline/` package with Canonical Event types, Projection
(reduce), Rendering (XML RC), Pipeline manager, and EventStore persistence
- Change user message format from YAML front-matter to XML `<message>` tags
with self-contained attributes (sender, channel, conversation, type)
- Merge CLI/Web dual API into single `/local/` endpoint, remove CLI handler
- Add `bot_session_events` table for event persistence and cold-start replay
- Add `discuss` session type (reserved for future Cahciua-style mode)
- Wire pipeline into HandleInbound: adapt → persist → push on every message
- Lazy cold-start replay: load events from DB on first session access
* feat: implement discuss mode with reactive driver and probe gate
Add discuss session mode where the bot autonomously decides when to speak
in group chats via tool-gated output (send tool only, no direct text reply).
- Add discuss driver (per-session goroutine, RC watch, step loop via
agent.Generate, TR persistence, late-binding prompt with mention hints)
- Add system_discuss.md prompt template ("text = inner monologue, send = speak")
- Add context composition (MergeContext, ComposeContext, TrimContext) for
RC + assistant/tool message interleaving by timestamp
- Add probe gate: when discuss_probe_model_id is set, cheap model pre-filters
group messages; no tool calls = silence, tool calls = activate primary
- Add /new [chat|discuss] command: explicit mode selection, defaults to
discuss in groups, chat in DMs, chat-only for WebUI
- Add ResolveRunConfig on flow.Resolver for discuss driver to reuse
model/tools/system-prompt resolution without reimplementing
- Fix send tool for discuss mode: same-conversation sends now go through
SendDirect (channel adapter) instead of the local emitter shortcut
- Add target attribute to XML message format (reply_target for routing)
- Add discuss_probe_model_id to bots table settings
- Remove pipeline compaction (SetCompactCursor) — reuse existing compaction.Service
- Persist full SDK messages (including tool calls) in discuss mode
* refactor: unify DCP event layer, fix persistence and local channel
- Fix bot_session_events dedup index to include event_kind so that
message + edit events for the same external_message_id coexist.
- Change CreateSessionEvent from :one to :exec so ON CONFLICT DO NOTHING
does not produce spurious errors on duplicate delivery.
- Move ACL evaluation before event ingest; denied messages no longer
enter bot_session_events or the in-memory pipeline.
- Let chat mode consume RenderedContext from the DCP pipeline when
available, sharing the same event-driven context assembly as discuss.
- Collapse local WebSocket handler to route through HandleInbound
instead of directly calling StreamChatWS, eliminating the dual
business entry point.
- Extract buildBaseRunConfig shared builder so resolve() and
ResolveRunConfig() no longer duplicate model/credentials/skills setup.
- Add StoreRound to RunConfigResolver interface so discuss driver
persists assistant output with full metadata, usage, and memory
extraction (same quality as chat mode).
- Fix discuss driver context: use context.Background() instead of the
short-lived HTTP request context that was getting cancelled.
- Fix model ID passed to StoreRound: return database UUID from
ResolveRunConfig instead of SDK model name.
- Remove dead CLIAdapter/CLIType and update legacy web/cli references
in tests and comments.
* fix: stop idle discuss goroutines after 10min timeout
Discuss session goroutines were never cleaned up when a session became
inactive (e.g. after /new). Add a 10-minute idle timer that auto-exits
the goroutine and removes it from the sessions map when no new RC
arrives.
* refactor: pipeline details — event types, structured reply, display content
- Remove [User sent N attachments] placeholder text from buildInboundQuery;
attachment info is now expressed via pipeline <attachment> tags.
- Unify in-reply-to as structured ReplyRef (Sender/Preview fields) across
Telegram, Discord, Feishu, and Matrix adapters instead of prepending
[Reply to ...] text into the message body. Remove now-unused
buildTelegramQuotedText, buildDiscordQuotedText, buildMatrixQuotedText.
- Make AdaptInbound return CanonicalEvent interface and dispatch to
adaptMessage/adaptEdit/adaptService based on metadata["event_type"].
- Add event_id column to bot_history_messages (migration 0059) so user
messages can reference their canonical pipeline event.
- PersistEvent now returns the event UUID; HandleInbound passes it through
to both persistPassiveMessage and ChatRequest.EventID for storeRound.
- Add FillDisplayContent to message service: extracts plain text from
event_data for clean frontend display.
- Frontend extractMessageText prefers display_content when available,
falling back to legacy strip logic for old messages.
- Fix: always generate headerifiedQuery for storage even when usePipeline
is true, so user messages are persisted via storeRound in chat mode.
* fix: use json.Marshal for pipeline context content serialization
The manual string escaping in buildMessagesFromPipeline only handled
double quotes but not newlines, backslashes, and other JSON special
characters, producing invalid json.RawMessage values. The LLM then
received empty/malformed context and complained about having no history.
* fix: restore WebSocket handler to use StreamChatWS directly
The previous refactoring replaced the WS handler with HandleInbound +
RouteHub subscription, which broke streaming because RouteHub events
use a different format (channel.StreamEvent) than what the frontend
expects (flow.WSStreamEvent with text_delta, tool_call_start, etc.).
Restore the original direct StreamChatWS call path so WebUI streaming
works again. The WS handler now matches the pre-refactoring behavior
while all other changes (pipeline, ACL, event types, etc.) are kept.
* feat: store display_text directly in bot_history_messages
Instead of computing display content at API response time by querying
bot_session_events via event_id, store the raw user text in a dedicated
display_text column at write time. This works for all paths including
the WebSocket handler which does not go through the pipeline/event layer.
- Migration 0060: add display_text TEXT column
- PersistInput gains DisplayText; filled from trimmedText (passive) and
req.Query (storeRound)
- toMessageFields reads display_text into DisplayContent
- Remove FillDisplayContent runtime query and ListSessionEventsByEventID
- Frontend already prefers display_content when available (no change)
* fix: display_text should contain raw user text, not XML-wrapped query
req.Query gets overwritten to headerifiedQuery (with XML <message> tags)
before storeRound runs. Add RawQuery field to ChatRequest to preserve
the original user text, and use it for display_text in storeMessages.
* fix(web): show discuss sessions
* chore(feishu): change discuss output to stream card
* fix(channel): unify discuss/chat send path and card markdown delivery
* feat(discuss): switch to stream execution with RouteHub broadcasting
* refactor(pipeline): remove context trimming from ComposeContext
The pipeline path should not trim context by token budget — the
upstream IC/RC already bounds the event window. Remove TrimContext,
FindWorkingWindowCursor, EstimateTokens, FormatLastProcessedMs (all
unused or only used for trimming), the maxTokens parameter from
ComposeContext, and MaxContextTokens from DiscussSessionConfig.
---------
Co-authored-by: 晨苒 <16112591+chen-ran@users.noreply.github.com>
Add native Playwright WebSocket sessions alongside the existing curated
browser tools. Agents can now create remote sessions that expose a full
Playwright API over WebSocket, enabling advanced use cases like HttpOnly
cookie injection, storage state management, and route interception.
Key changes:
- Per-bot isolated browser processes (launchServer via Node child process)
- New session module with create/close/status/heartbeat endpoints
- New browser_remote_session agent tool (Go)
- Storage state export/import on existing browser contexts
- Bot ID plumbing through context creation for process isolation
- Inflight deduplication to prevent duplicate browser launches
- Session janitor for automatic expiry cleanup
The WebSocket handler rejected messages with empty text even when
attachments were present, while the HTTP POST endpoint correctly
used Message.IsEmpty(). Move the empty-check after attachment
parsing so only truly empty messages are rejected.
* refactor(agent): replace XML tag extraction with tool-based send/react/speak
Remove the <attachments>, <reactions>, and <speech> XML tag extraction
system from the agent streaming pipeline. Instead, the send/react/speak
tools now handle both same-conversation and cross-conversation delivery:
- send: omit target to deliver attachments in the current conversation;
specify target for cross-channel messaging
- react: omit target to react in the current conversation
- speak: omit target to speak in the current conversation
Backend changes:
- Add StreamEmitter callback to tools.SessionContext so tools can push
attachment/reaction/speech events directly into the agent stream
- Wire emitter in agent.go for both streaming and non-streaming paths
- Remove StreamTagExtractor, DefaultTagResolvers, emitTagEvents, and
delete internal/agent/tags.go entirely
- Remove StripAgentTags calls from assistant_output.go
- Add IsSameConversation detection in messaging executor; same-conv
sends pass raw paths through the emitter for downstream ingestion
- Auto-resolve relative paths (e.g. "IDENTITY.md" -> "/data/IDENTITY.md")
- Add Metadata propagation through the full attachment chain
(tools.Attachment -> agent.FileAttachment -> parseAttachmentDelta)
- Update system_chat.md and _contacts.md prompts
Frontend changes (apps/web):
- Hide send/react/speak tool_call blocks when result indicates
delivered to current conversation
- Defer attachment_delta blocks to end of message (flush on stream
completion) for consistent positioning with DB-loaded history
* fix(agent): speak tool emits synthesized audio directly as voice attachment
Instead of emitting speech_delta (which requires downstream re-synthesis),
the speak tool now emits the already-synthesized audio as an attachment_delta
with voice type. This avoids double TTS synthesis and eliminates dependency
on ttsService being configured on the inbound processor.
Also fixes speak on WebUI where ReplyTarget is empty (same fix as send).