Memoh

mirror of https://github.com/memohai/Memoh.git synced 2026-04-25 07:00:48 +09:00

Author	SHA1	Message	Date
Acbox	63fe03cfff	Revert "Feat/speech support (#392)"	2026-04-22 00:10:36 +08:00
    This reverts commit `c9dcfe287f`.
Acbox	c9dcfe287f	Feat/speech support (#392)	2026-04-22 00:09:46 +08:00
    * feat: expand speech provider support with new client types and configuration schema
    * feat: add icon support for speech providers and update related configurations
    * feat: add SVG support for Deepgram and Elevenlabs with Vue components
    * feat: except -speech client type in llm provider
      feat: enhance speech provider functionality with advanced settings and model import capabilities
    * chore: remove go.mod replace
    * feat: enhance speech provider functionality with advanced settings and model import capabilities
    * chore: update go module dependencies
    * feat: Ear and Mouth
    * fix: separate ear/mouth page
    * fix: separate audio domain and restore transcription templates
      Move speech and transcription internals into the audio domain, restore template-driven transcription providers, and regenerate Swagger/SDK so the frontend can stop hand-calling /transcription-* APIs.
    ---------
    Co-authored-by: aki <arisu@ieee.org>
Acbox Liu	5cfbaa40e2	refactor(agent): replace XML tag extraction with tool-based send/react/speak (#330)	2026-04-04 20:55:03 +08:00
    * refactor(agent): replace XML tag extraction with tool-based send/react/speak
      Remove the <attachments>, <reactions>, and <speech> XML tag extraction system from the agent streaming pipeline. Instead, the send/react/speak tools now handle both same-conversation and cross-conversation delivery:
      - send: omit target to deliver attachments in the current conversation; specify target for cross-channel messaging
      - react: omit target to react in the current conversation
      - speak: omit target to speak in the current conversation
      Backend changes:
      - Add StreamEmitter callback to tools.SessionContext so tools can push attachment/reaction/speech events directly into the agent stream
      - Wire emitter in agent.go for both streaming and non-streaming paths
      - Remove StreamTagExtractor, DefaultTagResolvers, emitTagEvents, and delete internal/agent/tags.go entirely
      - Remove StripAgentTags calls from assistant_output.go
      - Add IsSameConversation detection in messaging executor; same-conv sends pass raw paths through the emitter for downstream ingestion
      - Auto-resolve relative paths (e.g. "IDENTITY.md" -> "/data/IDENTITY.md")
      - Add Metadata propagation through the full attachment chain (tools.Attachment -> agent.FileAttachment -> parseAttachmentDelta)
      - Update system_chat.md and _contacts.md prompts
      Frontend changes (apps/web):
      - Hide send/react/speak tool_call blocks when result indicates delivered to current conversation
      - Defer attachment_delta blocks to end of message (flush on stream completion) for consistent positioning with DB-loaded history
    * fix(agent): speak tool emits synthesized audio directly as voice attachment
      Instead of emitting speech_delta (which requires downstream re-synthesis), the speak tool now emits the already-synthesized audio as an attachment_delta with voice type. This avoids double TTS synthesis and eliminates dependency on ttsService being configured on the inbound processor.
      Also fixes speak on WebUI where ReplyTarget is empty (same fix as send).
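A minimal Go sketch of the same-conversation path described in #330, assuming a simplified shape for tools.SessionContext: the send tool resolves relative paths under /data and, when target is omitted, pushes an attachment_delta event through the emitter instead of routing it cross-channel. StreamEvent, SessionContext, resolvePath, send, and the crossChannel callback are hypothetical stand-ins for illustration, not Memoh's actual API.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// StreamEvent is a hypothetical event pushed into the agent stream.
type StreamEvent struct {
	Type string // e.g. "attachment_delta"
	Path string
}

// SessionContext is a hypothetical, simplified stand-in for tools.SessionContext.
type SessionContext struct {
	// Emit pushes an event directly into the current agent stream.
	Emit func(StreamEvent)
}

// resolvePath maps relative paths onto the /data root and cleans absolute ones.
// Note: it does not sandbox paths containing "..".
func resolvePath(p string) string {
	if strings.HasPrefix(p, "/") {
		return path.Clean(p)
	}
	return path.Join("/data", p)
}

// send delivers attachments. With an empty target they go to the current
// conversation through the emitter; otherwise each one is handed to crossChannel.
func send(ctx SessionContext, target string, attachments []string,
	crossChannel func(target, p string) error) (string, error) {
	for _, a := range attachments {
		p := resolvePath(a)
		if target == "" {
			ctx.Emit(StreamEvent{Type: "attachment_delta", Path: p})
			continue
		}
		if err := crossChannel(target, p); err != nil {
			return "", err
		}
	}
	if target == "" {
		// Hypothetical status string a frontend could match to hide the tool_call block.
		return "delivered to current conversation", nil
	}
	return "delivered to " + target, nil
}

func main() {
	var events []StreamEvent
	ctx := SessionContext{Emit: func(e StreamEvent) { events = append(events, e) }}
	status, _ := send(ctx, "", []string{"IDENTITY.md"}, nil)
	fmt.Println(status, events[0].Path)
}
```

The exact result wording Memoh returns is not shown in the log; the sketch only illustrates why a recognizable status lets the web UI hide tool calls that were delivered in place.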
Acbox Liu	b3a39ad93d	refactor: replace persistent subagents with ephemeral spawn tool (#280)	2026-03-22 19:03:28 +08:00
    * refactor: replace persistent subagents with ephemeral spawn tool (#subagent)
      - Drop subagents table, remove all persistent subagent infrastructure
      - Add 'subagent' session type with parent_session_id on bot_sessions
      - Rewrite subagent tool as single 'spawn' tool with parallel execution
      - Create system_subagent.md prompt, add _subagent.md include for chat
      - Limit subagent tools to file, exec, web_search, web_fetch only
      - Merge subagent token usage into parent chat session in reporting
      - Remove frontend subagent management page, update chat UI for spawn
      - Fix UTF-8 truncation in session title, fix query not passed to agent
    * refactor: remove history message page
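The spawn design in #280 can be sketched in Go as a fan-out over goroutines, assuming each ephemeral subagent is a function call that reports its own token count. filterTools applies the file/exec/web_search/web_fetch restriction, and the summed total stands in for merging subagent usage into the parent session. Result, spawn, filterTools, and the run callback are hypothetical names for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// allowedSubagentTools mirrors the tool restriction listed in the commit.
var allowedSubagentTools = map[string]bool{
	"file": true, "exec": true, "web_search": true, "web_fetch": true,
}

// filterTools keeps only the tools a subagent may use.
func filterTools(tools []string) []string {
	var out []string
	for _, t := range tools {
		if allowedSubagentTools[t] {
			out = append(out, t)
		}
	}
	return out
}

// Result is a hypothetical per-subagent outcome.
type Result struct {
	Task   string
	Output string
	Tokens int
}

// spawn runs every task in its own ephemeral subagent concurrently and returns
// the results in input order plus the total token usage for the parent session.
func spawn(tasks []string, run func(task string, tools []string) Result,
	parentTools []string) ([]Result, int) {
	tools := filterTools(parentTools)
	results := make([]Result, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task string) {
			defer wg.Done()
			results[i] = run(task, tools) // each goroutine writes only its own slot
		}(i, task)
	}
	wg.Wait()
	total := 0
	for _, r := range results {
		total += r.Tokens
	}
	return results, total
}

func main() {
	run := func(task string, tools []string) Result {
		return Result{Task: task, Output: fmt.Sprintf("%s with %d tools", task, len(tools)), Tokens: 10}
	}
	results, total := spawn([]string{"a", "b"}, run, []string{"file", "exec", "send", "web_fetch"})
	fmt.Println(len(results), total, results[0].Output)
}
```

Writing each result into its own slice index keeps the goroutines race-free without a mutex and preserves task order, which a parent agent would likely want when reading results back.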
Acbox Liu	1680316c7f	refactor(agent): remove agent gateway instead of twilight sdk (#264)	2026-03-19 13:31:54 +08:00
    * refactor(agent): replace TypeScript agent gateway with in-process Go agent using twilight-ai SDK
      - Remove apps/agent (Bun/Elysia gateway), packages/agent (@memoh/agent), internal/bun runtime manager, and all embedded agent/bun assets
      - Add internal/agent package powered by twilight-ai SDK for LLM calls, tool execution, streaming, sential logic, tag extraction, and prompts
      - Integrate ToolGatewayService in-process for both built-in and user MCP tools, eliminating HTTP round-trips to the old gateway
      - Update resolver to convert between sdk.Message and ModelMessage at the boundary (resolver_messages.go), keeping agent package free of persistence concerns
      - Prepend user message before storeRound since SDK only returns output messages (assistant + tool)
      - Clean up all Docker configs, TOML configs, nginx proxy, Dockerfile.agent, and Go config structs related to the removed agent gateway
      - Update cmd/agent and cmd/memoh entry points with setter-based ToolGateway injection to avoid FX dependency cycles
    * fix(web): move form declaration before computed properties that reference it
      The `form` reactive object was declared after computed properties like `selectedMemoryProvider` and `isSelectedMemoryProviderPersisted` that reference it, causing a TDZ ReferenceError during setup.
    * fix: prevent UTF-8 character corruption in streaming text output
      StreamTagExtractor.Push() used byte-level string slicing to hold back buffer tails for tag detection, which could split multi-byte UTF-8 characters. After json.Marshal replaced invalid bytes with U+FFFD, the corruption became permanent — causing garbled CJK characters (�) in agent responses.
      Add safeUTF8SplitIndex() to back up split points to valid character boundaries. Also fix byte-level truncation in command/formatter.go and command/fs.go to use rune-aware slicing.
    * fix: add agent error logging and fix Gemini tool schema validation
      - Log agent stream errors in both SSE and WebSocket paths with bot/model context
      - Fix send tool `attachments` parameter: empty `items` schema rejected by Google Gemini API (INVALID_ARGUMENT), now specifies `{"type": "string"}`
      - Upgrade twilight-ai to d898f0b (includes raw body in API error messages)
    * chore(ci): remove agent gateway from Docker build and release pipelines
      Agent gateway has been replaced by in-process Go agent; remove the obsolete Docker image matrix entry, Bun/UPX CI steps, and agent-binary build logic from the release script.
    * fix: preserve attachment filename, metadata, and container path through persistence
      - Add `name` column to `bot_history_message_assets` (migration 0034) to persist original filenames across page refreshes.
      - Add `metadata` JSONB column (migration 0035) to store source_path, source_url, and other context alongside each asset.
      - Update SQL queries, sqlc-generated code, and all Go types (MessageAsset, AssetRef, OutboundAssetRef, FileAttachment) to carry name and metadata through the full lifecycle.
      - Extract filenames from path/URL in AttachmentsResolver before clearing raw paths; enrich streaming event metadata with name, source_path, and source_url in both the WebSocket and channel inbound ingestion paths.
      - Implement `LinkAssets` on message service and `LinkOutboundAssets` on flow resolver so WebSocket-streamed bot attachments are persisted to the correct assistant message after streaming completes.
      - Frontend: update MessageAsset type with metadata field, pass metadata through to attachment items, and reorder attachment-block.vue template so container files (identified by metadata.source_path) open in the sidebar file manager instead of triggering a download.
    * refactor(agent): decouple built-in tools from MCP, load via ToolProvider interface
      Migrate all 13 built-in tool providers from internal/mcp/providers/ to internal/agent/tools/ using the twilight-ai sdk.Tool structure. The agent now loads tools through a ToolProvider interface instead of the MCP ToolGatewayService, which is simplified to only manage external federation sources. This enables selective tool loading and removes the coupling between business tools and the MCP protocol layer.
    * refactor(flow): split monolithic resolver.go into focused modules
      Break the 1959-line resolver.go into 12 files organized by concern:
      - resolver.go: core orchestration (Resolver struct, resolve, Chat, prepareRunConfig)
      - resolver_stream.go: streaming (StreamChat, StreamChatWS, tryStoreStream)
      - resolver_trigger.go: schedule/heartbeat triggers
      - resolver_attachments.go: attachment routing, inlining, encoding
      - resolver_history.go: message loading, deduplication, token trimming
      - resolver_store.go: persistence (storeRound, storeMessages, asset linking)
      - resolver_memory.go: memory provider integration
      - resolver_model_selection.go: model selection and candidate matching
      - resolver_identity.go: display name and channel identity resolution
      - resolver_settings.go: bot settings, loop detection, inbox
      - user_header.go: YAML front-matter formatting
      - resolver_util.go: shared utilities (sanitize, normalize, dedup, UUID)
    * fix(agent): enable Anthropic extended thinking by passing ReasoningConfig to provider
      Anthropic's thinking requires WithThinking() at provider creation time, unlike OpenAI which uses per-request ReasoningEffort. The config was never wired through, so Claude models could not trigger thinking.
    * refactor(agent): extract prompts into embedded markdown templates
      Move inline prompt strings from prompt.go into separate .md files under internal/agent/prompts/, using {{key}} placeholders and a simple render engine. Remove obsolete SystemPromptParams fields (Language, MaxContextLoadTime, Channels, CurrentChannel) and their call-site usage.
    * fix: lint

5 Commits