diff --git a/AGENTS.md b/AGENTS.md
index dbc144f3..00166f62 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -6,13 +6,14 @@ Memoh is a multi-member, structured long-memory, containerized AI agent system p
 
 ## Architecture Overview
 
-The system consists of three core services:
+The system consists of four core services:
 
 | Service | Tech Stack | Port | Description |
 |---------|-----------|------|-------------|
 | **Server** (Backend) | Go + Echo | 8080 | Main service: REST API, auth, database, container management |
 | **Agent Gateway** | Bun + Elysia | 8081 | AI chat gateway: handles chat requests, tool execution, and SSE streaming |
 | **Web** (Frontend) | Vue 3 + Vite | 8082 | Management UI: visual configuration for Bots, Models, Channels, etc. |
+| **Browser Gateway** | Bun + Elysia + Playwright | 8083 | Browser automation service: headless browser actions for bots |
 
 Infrastructure dependencies:
 - **PostgreSQL** — Relational data storage
@@ -59,7 +60,7 @@ Infrastructure dependencies:
 Memoh/
 ├── cmd/                        # Go application entry points
 │   ├── agent/                  # Main backend server (main.go)
-│   ├── mcp/                    # MCP server binary (stdio transport)
+│   ├── mcp/                    # MCP server binary (stdio transport, template/, entrypoint.sh)
 │   └── memoh/                  # Unified binary wrapper (Cobra CLI)
 ├── internal/                   # Go backend core code (domain packages)
 │   ├── accounts/               # User account management (CRUD, password hashing)
@@ -68,6 +69,7 @@ Memoh/
 │   ├── bind/                   # Channel identity-to-user binding code management
 │   ├── boot/                   # Runtime configuration provider (container backend detection)
 │   ├── bots/                   # Bot management (CRUD, lifecycle)
+│   ├── browsercontexts/        # Browser context management (CRUD)
 │   ├── bun/                    # Bun runtime manager (agent gateway process lifecycle)
 │   ├── channel/                # Channel adapter system (Telegram, Discord, Feishu, Local, Email)
 │   ├── config/                 # Configuration loading and parsing (TOML)
@@ -94,6 +96,7 @@ Memoh/
 │   ├── providers/              # LLM provider management (OpenAI, Anthropic, etc.)
 │   ├── prune/                  # Text pruning utilities (truncation with head/tail)
 │   ├── schedule/               # Scheduled task service (cron)
+│   ├── searchengines/          # Search engine abstraction (reserved)
 │   ├── searchproviders/        # Search engine provider management (Brave, etc.)
 │   ├── server/                 # HTTP server wrapper (Echo setup, middleware, shutdown)
 │   ├── settings/               # Bot settings management
@@ -101,23 +104,34 @@ Memoh/
 │   ├── subagent/               # Sub-agent management (CRUD)
 │   └── version/                # Build-time version information
 ├── apps/                       # Application services
-│   └── agent/                  # Agent Gateway (Bun/Elysia)
-│       └── src/
-│           ├── index.ts        # Elysia server entry point
-│           ├── modules/        # Route modules (chat, stream, trigger)
-│           ├── middlewares/    # CORS, error handling, bearer auth
-│           ├── utils/          # SSE utilities
-│           └── models.ts       # Zod request schemas
-├── packages/                   # TypeScript monorepo
+│   ├── agent/                  # Agent Gateway (Bun/Elysia)
+│   │   └── src/
+│   │       ├── index.ts        # Elysia server entry point
+│   │       ├── modules/        # Route modules (chat, stream, trigger)
+│   │       ├── middlewares/    # CORS, error handling, bearer auth
+│   │       ├── utils/          # SSE utilities
+│   │       └── models.ts       # Zod request schemas
+│   ├── browser/                # Browser Gateway (Bun/Elysia/Playwright)
+│   │   └── src/
+│   │       ├── index.ts        # Elysia server entry point
+│   │       ├── browser.ts      # Playwright browser lifecycle
+│   │       ├── modules/        # Route modules (action, context, devices)
+│   │       ├── middlewares/    # CORS, error handling, bearer auth
+│   │       ├── types/          # TypeScript type definitions
+│   │       ├── storage.ts      # Browser context storage
+│   │       └── models.ts       # Zod request schemas
+│   └── web/                    # Main web app (@memoh/web, Vue 3)
+├── packages/                   # Shared TypeScript libraries
 │   ├── agent/                  # Core agent library (@memoh/agent)
 │   │   └── src/
 │   │       ├── agent.ts        # Agent creation and streaming logic
 │   │       ├── model.ts        # Model configuration and creation
+│   │       ├── tool-loop.ts    # Tool execution loop
+│   │       ├── sential.ts      # Sential (sentinel) logic
 │   │       ├── tools/          # Tool implementations (MCP, web, subagent, skill)
 │   │       ├── prompts/        # System/heartbeat/schedule/subagent prompts
 │   │       ├── types/          # TypeScript type definitions
 │   │       └── utils/          # Attachments, headers, filesystem utilities
-│   ├── web/                    # Main web app (@memoh/web, Vue 3)
 │   ├── ui/                     # Shared UI component library (@memoh/ui)
 │   ├── sdk/                    # TypeScript SDK (@memoh/sdk, auto-generated from OpenAPI)
 │   ├── cli/                    # CLI tool (@memoh/cli, Commander.js)
@@ -126,9 +140,9 @@ Memoh/
 ├── db/                         # Database
 │   ├── migrations/             # SQL migration files
 │   └── queries/                # SQL query files (sqlc input)
-├── conf/                       # Configuration templates (app.example.toml, app.docker.toml)
-├── devenv/                     # Dev environment (docker-compose, dev Dockerfiles, app.dev.toml, mcp-build.sh)
-├── docker/                     # Production Docker build & runtime (Dockerfiles, entrypoints, nginx.conf)
+├── conf/                       # Configuration templates (app.example.toml, app.docker.toml, app.apple.toml, app.windows.toml)
+├── devenv/                     # Dev environment (docker-compose, dev Dockerfiles, app.dev.toml, mcp-build.sh, server-entrypoint.sh)
+├── docker/                     # Production Docker (Dockerfiles, entrypoints, nginx.conf, docker-compose.yml, docker-compose.cn.yml)
 ├── docs/                       # Documentation site
 ├── scripts/                    # Utility scripts (db, release, install)
 ├── docker-compose.yml          # Docker Compose orchestration (production)
@@ -259,6 +273,7 @@ Migrations live in `db/migrations/` and follow a dual-update convention:
 | `lifecycle_events` | Container lifecycle events |
 | `schedule` | Scheduled tasks (cron) |
 | `subagents` | Sub-agent definitions |
+| `browser_contexts` | Browser context configurations (Playwright) |
 | `storage_providers` | Pluggable object storage backends |
 | `bot_storage_bindings` | Per-bot storage backend selection |
 | `bot_inbox` | Per-bot inbox (notifications, triggers) |
@@ -280,15 +295,18 @@ The main configuration file is `config.toml` (copied from `conf/app.example.toml
 - `[postgres]` — PostgreSQL connection
 - `[qdrant]` — Qdrant vector database connection
 - `[agent_gateway]` — Agent Gateway address
+- `[browser_gateway]` — Browser Gateway address
 - `[web]` — Web frontend address
 
 Configuration templates available in `conf/`:
 - `app.example.toml` — Default template
-- `app.dev.toml` — Development (connects to devenv docker-compose)
 - `app.docker.toml` — Docker deployment
 - `app.apple.toml` — macOS (Apple Virtualization backend)
 - `app.windows.toml` — Windows
 
+Development configuration in `devenv/`:
+- `app.dev.toml` — Development (connects to devenv docker-compose)
+
 ## Web Design
 
-Please refer to `./packages/web/AGENTS.md`.
+Please refer to `./apps/web/AGENTS.md`.
diff --git a/README.md b/README.md
index 9b2eb5b4..2da98322 100644
--- a/README.md
+++ b/README.md
@@ -78,6 +78,7 @@ Memoh Bot can distinguish and remember requests from multiple humans and bots, w
 - 🔧 **MCP (Model Context Protocol)**: Full MCP support (HTTP / SSE / Stdio). Built-in tools for container operations, memory search, web search, scheduling, messaging, and more. Connect external MCP servers for extensibility.
 - 🧩 **Subagents**: Create specialized sub-agents per bot with independent context and skills, enabling multi-agent collaboration.
 - 🎭 **Skills & Identity**: Define bot personality via IDENTITY.md, SOUL.md, and modular skill files that bots can enable/disable at runtime.
+- 🌐 **Browser**: Each bot can have its own headless Chromium browser (via Playwright). Navigate pages, click elements, fill forms, take screenshots (with annotated element labels), read accessibility trees, manage tabs, and more — enabling real web automation and AI-driven browsing.
 - 🔍 **Web Search**: 12 built-in search providers — Brave, Bing, Google, Tavily, DuckDuckGo, SearXNG, Serper, Sogou, Jina, Exa, Bocha, and Yandex — for web search and URL content fetching.
 - ⏰ **Scheduled Tasks**: Cron-based scheduling with max-call limits. Bots can autonomously run commands or tools at specified intervals.
 - 💓 **Heartbeat**: Periodic autonomous tasks — bots can perform routine operations (e.g., check-ins, summaries, monitoring) at configurable intervals with execution logging.
@@ -118,6 +119,7 @@ Memoh Bot can distinguish and remember requests from multiple humans and bots, w
 |-------|-------|
 | Backend | Go, Echo, sqlc, Uber FX, pgx/v5, containerd v2 |
 | Agent Gateway | Bun, Elysia |
+| Browser Gateway | Bun, Elysia, Playwright (Chromium) |
 | Frontend | Vue 3, Vite, Pinia, Tailwind CSS, Reka UI |
 | Storage | PostgreSQL, Qdrant |
 | Infra | Docker, containerd, CNI |
@@ -137,12 +139,12 @@ Memoh Bot can distinguish and remember requests from multiple humans and bots, w
 │    Auth · Bots · Channels · Memory · Containers · MCP    │
 └──────────────────────┬───────────────────────────────────┘
                        │
-           ┌───────────┼───────────┐
-           ▼           ▼           ▼
-    ┌──────────┐ ┌─────────┐ ┌──────────────────┐
-    │ PostgreSQL│ │ Qdrant  │ │  Agent Gateway   │
-    │          │ │ (Vector)│ │ (Bun/Elysia :8081)│
-    └──────────┘ └─────────┘ └────────┬──────────┘
+           ┌───────────┼───────────┬───────────┐
+           ▼           ▼           ▼           ▼
+    ┌──────────┐ ┌─────────┐ ┌──────────────────┐ ┌───────────────────┐
+    │ PostgreSQL│ │ Qdrant  │ │  Agent Gateway   │ │  Browser Gateway  │
+    │          │ │ (Vector)│ │ (Bun/Elysia :8081)│ │ (Playwright :8083) │
+    └──────────┘ └─────────┘ └────────┬──────────┘ └───────────────────┘
                                       │
                               ┌───────┼───────┐
                               ▼       ▼       ▼
diff --git a/README_CN.md b/README_CN.md
index b601dddc..48d1c264 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -78,6 +78,7 @@ Memoh Bot 能区分并记忆多人与多 bot 的请求,在任意群聊中无
 - 🔧 **MCP(模型上下文协议)**:完整 MCP 支持(HTTP / SSE / Stdio)。内置容器操作、记忆搜索、网络搜索、定时任务、消息发送等工具,可连接外部 MCP 服务器扩展。
 - 🧩 **子代理**:为每个 bot 创建专用子代理,拥有独立上下文与技能,实现多代理协作。
 - 🎭 **技能与身份**:通过 IDENTITY.md、SOUL.md 定义 bot 人格,模块化技能文件可在运行时启用/禁用。
+- 🌐 **浏览器**:每个 Bot 可拥有独立的无头 Chromium 浏览器(基于 Playwright)。支持页面导航、点击、填写表单、截图(带编号标注的交互元素)、读取无障碍树、多标签页管理等,实现真正的网页自动化与 AI 驱动浏览。
 - 🔍 **网络搜索**:内置 12 种搜索提供商 —— Brave、Bing、Google、Tavily、DuckDuckGo、SearXNG、Serper、搜狗、Jina、Exa、Bocha、Yandex,支持网页搜索与 URL 内容抓取。
 - ⏰ **定时任务**:基于 Cron 的任务调度,支持最大调用次数限制。Bot 可自主在指定时间执行命令或工具。
 - 💓 **心跳**:周期性自主任务,Bot 可按配置间隔执行例行操作(如签到、汇总、监控),并记录执行日志。
@@ -118,6 +119,7 @@ Memoh Bot 能区分并记忆多人与多 bot 的请求,在任意群聊中无
 |------|------|
 | 后端 | Go, Echo, sqlc, Uber FX, pgx/v5, containerd v2 |
 | Agent 网关 | Bun, Elysia |
+| 浏览器网关 | Bun, Elysia, Playwright (Chromium) |
 | 前端 | Vue 3, Vite, Pinia, Tailwind CSS, Reka UI |
 | 存储 | PostgreSQL, Qdrant |
 | 基础设施 | Docker, containerd, CNI |
@@ -137,12 +139,12 @@ Memoh Bot 能区分并记忆多人与多 bot 的请求,在任意群聊中无
 │    Auth · Bots · Channels · Memory · Containers · MCP    │
 └──────────────────────┬───────────────────────────────────┘
                        │
-           ┌───────────┼───────────┐
-           ▼           ▼           ▼
-    ┌──────────┐ ┌─────────┐ ┌──────────────────┐
-    │ PostgreSQL│ │ Qdrant  │ │  Agent Gateway   │
-    │          │ │ (向量库) │ │ (Bun/Elysia :8081)│
-    └──────────┘ └─────────┘ └────────┬──────────┘
+           ┌───────────┼───────────┬───────────┐
+           ▼           ▼           ▼           ▼
+    ┌──────────┐ ┌─────────┐ ┌──────────────────┐ ┌───────────────────┐
+    │ PostgreSQL│ │ Qdrant  │ │  Agent Gateway   │ │  Browser Gateway  │
+    │          │ │ (向量库) │ │ (Bun/Elysia :8081)│ │ (Playwright :8083) │
+    └──────────┘ └─────────┘ └────────┬──────────┘ └───────────────────┘
                                       │
                               ┌───────┼───────┐
                               ▼       ▼       ▼