database/CLAUDE.md
2025-12-27 16:21:09 +08:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
MCP Database Server is a WebSocket/SSE-based PostgreSQL tooling service that exposes database operations through the Model Context Protocol (MCP). It allows AI clients to interact with multiple PostgreSQL databases through a unified, authenticated interface.
### Background and Goals
**Problem Statement**: The original MCP implementation used STDIO transport, requiring each AI client to install and maintain a local copy of the codebase. This caused maintenance overhead and made it difficult to share database access across multiple clients or hosts.
**Solution**: A long-running server that:
- Runs as a daemon/container accessible from multiple AI clients
- Supports remote transports (WebSocket and SSE)
- Provides centralized authentication and audit logging
- Enables multi-database/multi-schema access through a single endpoint
- Eliminates per-client installation requirements
**Key Design Decisions** (from v1.0.0):
1. **Transport Layer**: The MCP SDK lacks server-side WebSocket support, so we implemented a custom `WebSocketServerTransport` class (a custom `SSEServerTransport` followed in v1.0.1)
2. **Multi-Schema Access**: A single configuration supports multiple PostgreSQL databases, each with its own schemas, selected via the `environment` parameter
3. **Authentication**: Token-based (Bearer) authentication by default; mTLS support is reserved for a future release
4. **Concurrency Model**: Per-client session isolation on top of per-environment connection pools shared by all sessions
5. **Code Separation**: Complete separation from original STDIO-based codebase; this is a standalone server implementation
## Build and Development Commands
### Build
```bash
npm run build
```
Compiles TypeScript to JavaScript in the `dist/` directory.
### Start Server
```bash
# Production
npm start
# Development (with hot reload)
npm run dev
```
### Generate Authentication Token
```bash
node scripts/generate-token.js
```
Generates a secure 64-character hex token for Bearer authentication.
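The script's contents are not shown here. The sketch below assumes the script draws 32 bytes from Node's `crypto.randomBytes` (as stated under Token Authentication) and hex-encodes them. An equivalent TypeScript helper could look like this; `generateToken` is a hypothetical name.

```typescript
import { randomBytes } from 'node:crypto';

// Hypothetical equivalent of scripts/generate-token.js:
// 32 random bytes (256 bits) rendered as 64 lowercase hex characters.
function generateToken(bytes = 32): string {
  return randomBytes(bytes).toString('hex');
}
```

Calling `generateToken()` once and storing the result in `MCP_AUTH_TOKEN` matches the "environment variables, not config files" recommendation further down.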
### Test Database Connection
```bash
npx tsx scripts/test-connection.ts
```
**IMPORTANT**: After making any code changes, always update `changelog.json` with version number, date, and description of changes. See "Version History and Roadmap" section for details.
## Architecture
### Core Layers
The codebase is organized into distinct layers:
1. **server.ts** - Main entry point that:
   - Loads and validates configuration
   - Initializes the UnifiedServerManager (handles both WebSocket and SSE)
   - Creates the PostgresMcp instance for database operations
   - Manages session lifecycle and graceful shutdown
2. **transport/** - Multi-transport support:
   - `unified-server.ts`: Single HTTP server handling both WebSocket and SSE transports
   - `websocket-server-transport.ts`: WebSocket client transport implementation
   - `sse-server-transport.ts`: Server-Sent Events client transport
   - Both transports share the same authentication and session management
3. **core/** - Database abstraction layer (PostgresMcp):
   - `connection-manager.ts`: Pool management for multiple database environments
   - `query-runner.ts`: SQL query execution with schema path handling
   - `transaction-manager.ts`: Transaction lifecycle (BEGIN/COMMIT/ROLLBACK)
   - `metadata-browser.ts`: Schema introspection (tables, views, functions, etc.)
   - `bulk-helpers.ts`: Batch insert operations
   - `diagnostics.ts`: Query analysis and performance diagnostics
4. **tools/** - MCP tool registration:
   - Each file (`metadata.ts`, `query.ts`, `data.ts`, `diagnostics.ts`) registers a group of MCP tools
   - Tools use zod schemas for input validation
   - Tools delegate to PostgresMcp core methods
5. **session/** - Session management:
   - Per-client session tracking with unique session IDs
   - Transaction-to-session binding (transactions are bound to the session's client)
   - Query concurrency limits per session
   - Automatic stale session cleanup
6. **config/** - Configuration system:
   - Supports JSON configuration files with environment variable resolution (`ENV:VAR_NAME` syntax)
   - Three-tier override: config file → environment variables → CLI arguments (each tier overrides the one before it)
   - Validation using zod schemas
   - Multiple database environments per server
7. **auth/** - Authentication:
   - Token-based authentication (Bearer tokens in the WebSocket/SSE handshake)
   - Verification occurs at connection time (both WebSocket upgrade and SSE endpoint)
8. **audit/** - Audit logging:
   - JSON Lines format for structured logging
   - SQL parameter redaction for security
   - Configurable output (stdout or file)
9. **health/** - Health monitoring:
   - `/health` endpoint provides server status and per-environment connection status
   - Includes active connection counts and pool statistics
10. **changelog/** - Version tracking:
    - `/changelog` endpoint exposes version history without authentication
    - Version information is automatically synced from `changelog.json`
    - Used for tracking system updates and changes
### Key Design Patterns
**Environment Isolation**: Each configured database "environment" (e.g., "drworks", "ipworkstation") has:
- Isolated connection pool
- Independent schema search paths
- Separate permission modes (readonly/readwrite/ddl)
- Per-environment query timeouts
**Session-Transaction Binding**:
- When `pg_begin_transaction` is called, a dedicated database client is bound to that session
- All subsequent queries in that session use the same client until commit/rollback
- Sessions are automatically cleaned up on disconnect or timeout
- This prevents transaction leaks across different AI clients
**Schema Path Resolution**:
- Tools accept optional `schema` parameter
- Resolution order: tool parameter → environment defaultSchema → environment searchPath
- Search path is set per-query using PostgreSQL's `SET search_path`
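The resolution order above leaves open how the chosen schema combines with the environment's `searchPath`. The sketch below assumes an explicit per-call `searchPath` (as in Scenario 3 of the Multi-Schema Access Examples) wins outright. Otherwise, the tool's `schema` or the `defaultSchema` goes first, followed by the environment `searchPath` with duplicates removed. `resolveSearchPath` and `searchPathStatement` are hypothetical names, not the project's `query-runner.ts` API.

```typescript
// Hypothetical sketch of the schema resolution described above.
interface EnvSchemaConfig {
  defaultSchema?: string;
  searchPath?: string[];
}

interface ToolSchemaArgs {
  schema?: string;       // per-call schema override
  searchPath?: string[]; // per-call custom search path
}

// Assumption: an explicit per-call searchPath wins; otherwise the primary schema
// (tool `schema`, else `defaultSchema`) is placed before the environment searchPath.
function resolveSearchPath(args: ToolSchemaArgs, env: EnvSchemaConfig): string[] {
  if (args.searchPath && args.searchPath.length > 0) return args.searchPath.slice();
  const primary = args.schema ?? env.defaultSchema;
  const rest = env.searchPath ?? [];
  const ordered = primary ? [primary, ...rest] : rest.slice();
  return Array.from(new Set(ordered)); // drop duplicates, keeping the first occurrence
}

// SET cannot take bind parameters, so each schema name is quoted as an identifier.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

function searchPathStatement(path: string[]): string {
  if (path.length === 0) return 'SET search_path TO DEFAULT';
  return `SET search_path TO ${path.map(quoteIdent).join(', ')}`;
}
```

With the `drworks` example configuration, passing `schema: 'api'` would yield `api, dbo, nurse` under this assumption.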
**Unified Transport Architecture**:
- Single HTTP server handles both WebSocket (upgrade requests) and SSE (GET /sse)
- Transport-agnostic MCP server implementation
- Both transports use the same authentication, session management, and tool registration
## Configuration
### Configuration File Structure
Configuration uses `config/database.json` (see `config/database.example.json` for template).
```json
{
  "server": {
    "listen": { "host": "0.0.0.0", "port": 7700 },
    "auth": { "type": "token", "token": "ENV:MCP_AUTH_TOKEN" },
    "allowUnauthenticatedRemote": false,
    "maxConcurrentClients": 50,
    "logLevel": "info"
  },
  "environments": {
    "drworks": {
      "type": "postgres",
      "connection": {
        "host": "localhost",
        "port": 5432,
        "database": "shcis_drworks_cpoe_pg",
        "user": "postgres",
        "password": "ENV:MCP_DRWORKS_PASSWORD",
        "ssl": { "require": true }
      },
      "defaultSchema": "dbo",
      "searchPath": ["dbo", "api", "nurse"],
      "pool": { "max": 10, "idleTimeoutMs": 30000 },
      "statementTimeoutMs": 60000,
      "slowQueryMs": 2000,
      "mode": "readwrite"
    }
  },
  "audit": {
    "enabled": true,
    "output": "stdout",
    "format": "json",
    "redactParams": true,
    "maxSqlLength": 200
  }
}
```
### Configuration Fields
**server** - Global server settings:
- `listen.host/port`: Listen address (default: 0.0.0.0:7700)
- `auth.type`: Authentication type (`token` | `mtls` | `none`)
- `auth.token`: Bearer token value (supports `ENV:` prefix)
- `allowUnauthenticatedRemote`: Allow listening on non-localhost without auth (default: false, use with caution)
- `maxConcurrentClients`: Max WebSocket connections (default: 50)
- `logLevel`: Log level (`debug` | `info` | `warn` | `error`)
**environments** - Database connection configurations:
- Each environment is an isolated connection pool with unique name
- `type`: Database type (currently only `postgres`)
- `connection`: Standard PostgreSQL connection parameters
- `defaultSchema`: Default schema when not specified in tool calls
- `searchPath`: Array of schemas for PostgreSQL search_path
- `pool.max`: Max connections in pool (default: 10)
- `pool.idleTimeoutMs`: Idle connection timeout (default: 30000)
- `statementTimeoutMs`: Query timeout (default: 60000)
- `slowQueryMs`: Slow query threshold for warnings (default: 2000)
- `mode`: Permission mode (`readonly` | `readwrite` | `ddl`)
**audit** - Audit logging configuration:
- `enabled`: Enable audit logging (default: true)
- `output`: Output destination (`stdout` or file path)
- `format`: Log format (`json` recommended)
- `redactParams`: Redact SQL parameters (default: true)
- `maxSqlLength`: Max SQL preview length (default: 200)
### Environment Variable Resolution
Environment variables can be referenced using `ENV:VAR_NAME` syntax:
```json
{
"password": "ENV:MCP_DRWORKS_PASSWORD"
}
```
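One way the `ENV:` substitution could work is a recursive walk over the parsed JSON that swaps every `ENV:`-prefixed string for the variable's value. This is a sketch, not the project's `config/` code. Failing fast on an unset variable is an assumption, and `resolveEnvRefs` is a hypothetical name.

```typescript
// Hypothetical sketch: recursively replace "ENV:VAR_NAME" strings in a parsed config.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function resolveEnvRefs(value: Json, env: Record<string, string | undefined> = process.env): Json {
  if (typeof value === 'string' && value.startsWith('ENV:')) {
    const name = value.slice('ENV:'.length);
    const resolved = env[name];
    // Assumption: a missing variable is a startup error rather than an empty string.
    if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
    return resolved;
  }
  if (Array.isArray(value)) return value.map((item) => resolveEnvRefs(item, env));
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, item] of Object.entries(value)) out[key] = resolveEnvRefs(item, env);
    return out;
  }
  return value; // numbers, booleans, null and plain strings pass through unchanged
}
```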
**Naming Convention**: `MCP_<ENVIRONMENT>_<FIELD>`
- Environment names are uppercased
- Non-alphanumeric chars become underscores
- Example: `drworks` → `MCP_DRWORKS_PASSWORD`
- Nested fields use double underscore: `ssl.ca` → `MCP_DRWORKS_SSL__CA`
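The naming rules above can be expressed as a small pure function. `envVarName` is a hypothetical helper shown only to make the convention concrete; the server may derive names differently.

```typescript
// Hypothetical helper mirroring the MCP_<ENVIRONMENT>_<FIELD> convention:
// uppercase, non-alphanumerics become "_", nested fields are joined with "__".
function envVarName(environment: string, fieldPath: string): string {
  const normalize = (s: string) => s.toUpperCase().replace(/[^A-Z0-9]/g, '_');
  const field = fieldPath.split('.').map(normalize).join('__');
  return `MCP_${normalize(environment)}_${field}`;
}
```

For example, `envVarName('drworks', 'ssl.ca')` produces `MCP_DRWORKS_SSL__CA`, and an environment named `ip-workstation` would map to the `MCP_IP_WORKSTATION_` prefix.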
Priority order (highest to lowest):
1. CLI arguments (`--auth-token`, `--listen`, `--log-level`)
2. Environment variables (`MCP_AUTH_TOKEN`, `MCP_LISTEN`, `MCP_LOG_LEVEL`)
3. Configuration file
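The precedence above amounts to layering three partial objects, where a later layer only wins when it actually supplies a value. In the sketch below, `ServerOverrides` and `mergeOverrides` are hypothetical names. Treating empty environment variables as unset is an assumption.

```typescript
// Hypothetical sketch of the precedence above: CLI > environment > config file.
interface ServerOverrides {
  authToken?: string;
  listen?: string;
  logLevel?: string;
}

const OVERRIDE_KEYS: (keyof ServerOverrides)[] = ['authToken', 'listen', 'logLevel'];

function mergeOverrides(
  file: ServerOverrides,
  env: Record<string, string | undefined>,
  cli: ServerOverrides,
): ServerOverrides {
  const fromEnv: ServerOverrides = {
    authToken: env.MCP_AUTH_TOKEN,
    listen: env.MCP_LISTEN,
    logLevel: env.MCP_LOG_LEVEL,
  };
  const result: ServerOverrides = { ...file };
  // Apply lower-priority layers first so higher-priority ones overwrite them.
  for (const layer of [fromEnv, cli]) {
    for (const key of OVERRIDE_KEYS) {
      const value = layer[key];
      if (value !== undefined && value !== '') result[key] = value; // skip unset values
    }
  }
  return result;
}
```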
### Multi-Schema Access Examples
**Scenario 1: Use default schema**
```typescript
// `client` is a connected MCP SDK Client; uses the environment's defaultSchema ("dbo")
await client.callTool({
  name: 'pg_list_tables',
  arguments: { environment: 'drworks' }
});
```
**Scenario 2: Switch to specific schema**
```typescript
// Temporarily use the "api" schema
await client.callTool({
  name: 'pg_list_tables',
  arguments: { environment: 'drworks', schema: 'api' }
});
```
**Scenario 3: Custom search path**
```typescript
// Query with custom search path priority
await client.callTool({
  name: 'pg_query',
  arguments: {
    environment: 'drworks',
    searchPath: ['nurse', 'dbo', 'api'],
    query: 'SELECT * FROM patient_info' // searched in order: nurse → dbo → api
  }
});
```
## Key Implementation Notes
### Security and Authentication
**Token Authentication** (default in v1.0.0):
- Uses Bearer tokens in HTTP Authorization header during WebSocket/SSE handshake
- Token format: 64-character hex string (256-bit random, generated via `crypto.randomBytes(32)`)
- Token verification happens at connection time: in the `verifyClient` hook for WebSocket upgrades, and on the SSE endpoint request for SSE
- Failed authentication returns HTTP 401 or WebSocket close code 1008
- Token can be configured via: CLI args → env vars → config file
**Token Transmission**:
```http
GET / HTTP/1.1
Upgrade: websocket
Authorization: Bearer <64-char-token>
```
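A minimal sketch of the handshake check, assuming the header is parsed with a simple regex. Both sides are hashed so that Node's `crypto.timingSafeEqual` always compares equal-length buffers. The constant-time comparison is a suggested hardening, not something the source states the server does. `isAuthorized` and `clientIdFor` are hypothetical names.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Hypothetical sketch: validate "Authorization: Bearer <token>" on a handshake request.
// The parameter type is the minimal shape of Node's IncomingMessage that we need.
function isAuthorized(req: { headers: { authorization?: string } }, expectedToken: string): boolean {
  const match = /^Bearer\s+(\S+)$/i.exec(req.headers.authorization ?? '');
  const presented = match?.[1];
  if (!presented) return false;
  // Hash both values so the buffers have equal length, then compare in constant time.
  const a = createHash('sha256').update(presented).digest();
  const b = createHash('sha256').update(expectedToken).digest();
  return timingSafeEqual(a, b);
}

// clientId for audit logs: only the first 8 characters of the token.
function clientIdFor(token: string): string {
  return token.slice(0, 8);
}
```

With the `ws` package, the same check could run inside the `verifyClient` option, which receives the upgrade request as `info.req`. The SSE route would call it on the incoming `GET /sse` request.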
**Security Best Practices**:
- Always use TLS in production (`wss://` for WebSocket, `https://` for SSE)
- Never log full tokens; audit logs record only the first 8 characters as `clientId`
- Rotate tokens periodically
- Use environment variables for token storage, not config files
- Enable SSL for database connections (`ssl.require: true`)
- Use `mode: "readonly"` for read-only access scenarios
**mTLS Support** (reserved for a future release):
- Mutual TLS authentication via client certificates
- Configured via `auth.type: "mtls"` with CA/cert/key paths
### Audit Logging
**Format**: JSON Lines (one JSON object per line)
```json
{
  "timestamp": "2025-12-23T10:30:45.123Z",
  "level": "audit",
  "sessionId": "550e8400-e29b-41d4-a716-446655440000",
  "clientId": "abc12345",
  "environment": "drworks",
  "tool": "pg_query",
  "sqlHash": "a3b2c1d4",
  "sqlPreview": "SELECT * FROM dbo.users WHERE id = $1 LIMIT 100",
  "params": "[REDACTED]",
  "durationMs": 234,
  "rowCount": 15,
  "status": "success"
}
```
**Privacy Protection**:
- With `redactParams: true` (the default), SQL parameters are **never logged** (always `"[REDACTED]"`)
- SQL preview is truncated to `maxSqlLength` (default: 200 chars)
- String literals in SQL are replaced with `[STRING]`, numbers with `[NUMBER]`
- Error messages are sanitized to remove potential data leaks
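The privacy rules above could be applied roughly as follows. This is a sketch, not the `audit/` module. The regexes are illustrative rather than a full SQL tokenizer (dollar-quoted strings and exponent literals are not handled), and `$n` placeholders are deliberately left intact. Deriving `sqlHash` from the first 8 hex characters of a SHA-256 digest is an assumption based only on the 8-character sample. All function names are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Mask literals, collapse whitespace, then truncate to maxSqlLength.
function sanitizeSql(sql: string, maxSqlLength = 200): string {
  const masked = sql
    .replace(/'(?:[^']|'')*'/g, '[STRING]')                     // 'text', including '' escapes
    .replace(/(^|[^\w$.])(\d+(?:\.\d+)?)\b/g, '$1[NUMBER]')     // numeric literals, not $1 placeholders
    .replace(/\s+/g, ' ')
    .trim();
  return masked.length > maxSqlLength ? masked.slice(0, maxSqlLength) : masked;
}

// Short, stable hash so slow-query warnings can be correlated with audit entries.
function sqlHash(sql: string): string {
  return createHash('sha256').update(sql).digest('hex').slice(0, 8);
}

interface AuditInput {
  sessionId: string;
  environment: string;
  tool: string;
  sql: string;
  durationMs: number;
  rowCount: number;
  status: 'success' | 'error';
}

// One JSON Lines record; raw SQL and parameters never reach the output.
function auditLine(entry: AuditInput): string {
  const { sql, ...rest } = entry;
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level: 'audit',
    ...rest,
    sqlHash: sqlHash(sql),
    sqlPreview: sanitizeSql(sql),
    params: '[REDACTED]',
  });
}
```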
**Slow Query Warnings**:
- Queries exceeding `slowQueryMs` (default: 2000ms) trigger warning logs
- Includes `sqlHash` for correlation with audit logs
- Used for performance monitoring without exposing full queries
### Concurrency and Session Management
**Architecture**:
```
WebSocket Connection → Session (UUID) → Query Queue → Connection Pool
                             ↓
              Transaction Client (if active)
```
**Concurrency Limits**:
1. **Global**: `maxConcurrentClients` (default: 50) - max WebSocket connections
2. **Per-Session**: `maxQueriesPerSession` (default: 5) - concurrent queries per client
3. **Per-Environment**: `pool.max` (default: 10) - database connections per environment
**Session Isolation**:
- Each WebSocket connection gets a unique `sessionId` (UUID v4)
- Sessions track `activeQueries` count and enforce per-session limits
- Sessions automatically time out after `sessionTimeout` (default: 1 hour)
- Stale sessions are cleaned up every 60 seconds
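The per-session bookkeeping above can be sketched as a small class. `SessionState` and `collectStaleSessions` are hypothetical names. The sketch rejects a query once the limit is reached; the diagram's "Query Queue" suggests the real server may queue it instead.

```typescript
// Hypothetical sketch of per-session accounting; defaults mirror the text above.
class SessionState {
  readonly sessionId: string;
  activeQueries = 0;
  lastActivity = Date.now();
  private readonly maxQueriesPerSession: number;

  constructor(sessionId: string, maxQueriesPerSession = 5) {
    this.sessionId = sessionId;
    this.maxQueriesPerSession = maxQueriesPerSession;
  }

  // Returns false when the per-session limit is reached (a real server might queue instead).
  tryStartQuery(): boolean {
    if (this.activeQueries >= this.maxQueriesPerSession) return false;
    this.activeQueries++;
    this.lastActivity = Date.now();
    return true;
  }

  finishQuery(): void {
    this.activeQueries = Math.max(0, this.activeQueries - 1);
    this.lastActivity = Date.now();
  }

  isStale(now: number, sessionTimeoutMs = 60 * 60 * 1000): boolean {
    return now - this.lastActivity > sessionTimeoutMs;
  }
}

// One cleanup pass, e.g. driven by setInterval(() => ..., 60_000).
function collectStaleSessions(sessions: Map<string, SessionState>, now = Date.now()): string[] {
  const stale: string[] = [];
  sessions.forEach((session, id) => {
    if (session.isStale(now)) stale.push(id);
  });
  return stale;
}
```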
**Transaction Binding**:
- `pg_begin_transaction` acquires a dedicated client from the pool
- This client is stored in `session.transactionClient` and bound to the session
- All subsequent queries in that environment use this client (not the pool)
- `pg_commit_transaction` or `pg_rollback_transaction` releases the client
- If client disconnects during transaction, automatic ROLLBACK occurs
- This prevents transaction state leaks across different AI clients
**Implementation Detail** (src/session/session-manager.ts:80-120):
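```typescript
import type { Pool, PoolClient } from 'pg';

interface TxSession { transactionClient?: PoolClient; transactionEnv?: string; }

// Transaction begins: check out a dedicated client and open the transaction on it
async function beginTransaction(session: TxSession, pool: Pool, environmentName: string) {
  const client = await pool.connect();
  await client.query('BEGIN').catch((err) => { client.release(); throw err; });
  session.transactionClient = client;
  session.transactionEnv = environmentName;
}

// Subsequent queries in the same environment route to the transaction client
function runQuery(session: TxSession, pool: Pool, env: string, sql: string, params: unknown[] = []) {
  if (session.transactionClient && session.transactionEnv === env) {
    return session.transactionClient.query(sql, params);
  }
  return pool.query(sql, params);
}

// On disconnect or timeout: roll back, then always return the client to the pool
async function rollbackOnClose(session: TxSession) {
  const client = session.transactionClient;
  if (!client) return;
  session.transactionClient = undefined;
  session.transactionEnv = undefined;
  try { await client.query('ROLLBACK'); } finally { client.release(); }
}
```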
**Transaction Safety**:
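- The TransactionManager stores transaction clients in a WeakMap keyed by the session object (a `WeakMap` key must be an object, so the session ID string itself cannot serve as the key)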
- On disconnect, the SessionManager automatically rolls back active transactions
- Never use pool.query() for operations within a transaction; always use the session-bound client
**Connection Pool Lifecycle**:
- Pools are lazily created on first use (getPool method)
- Each pool has configurable max connections, idle timeout, and statement timeout
- Graceful shutdown closes all pools via `PostgresConnectionManager.closeAll()`
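Lazy, per-environment pool creation can be sketched with a map and an injected factory. Injecting the factory keeps the example free of dependencies. `LazyPoolRegistry` and `PoolLike` are hypothetical stand-ins for `PostgresConnectionManager` and `pg.Pool`. With node-postgres, the factory would typically map `pool.max` to `max`, `pool.idleTimeoutMs` to `idleTimeoutMillis`, and `statementTimeoutMs` to `statement_timeout`.

```typescript
// Minimal stand-in for pg.Pool: only what shutdown needs.
interface PoolLike {
  end(): Promise<void>;
}

class LazyPoolRegistry<TConfig, TPool extends PoolLike> {
  private readonly pools = new Map<string, TPool>();
  private readonly configs: Record<string, TConfig>;
  private readonly createPool: (config: TConfig) => TPool;

  constructor(configs: Record<string, TConfig>, createPool: (config: TConfig) => TPool) {
    this.configs = configs;
    this.createPool = createPool;
  }

  // Creates the pool for an environment on first use, then reuses it.
  getPool(environment: string): TPool {
    let pool = this.pools.get(environment);
    if (!pool) {
      const config = this.configs[environment];
      if (config === undefined) throw new Error(`Unknown environment: ${environment}`);
      pool = this.createPool(config);
      this.pools.set(environment, pool);
    }
    return pool;
  }

  // Graceful shutdown: end every pool that was actually created.
  async closeAll(): Promise<void> {
    const pools = Array.from(this.pools.values());
    this.pools.clear();
    await Promise.all(pools.map((pool) => pool.end()));
  }

  get size(): number {
    return this.pools.size;
  }
}
```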
**MCP Tool Registration**:
- Tools are registered in `tools/index.ts` by calling registration functions from each tool category
- Each tool must have a unique name (prefixed with `pg_`)
- Tool schemas use zod for validation; the MCP SDK handles schema conversion
**Error Handling**:
- Database errors are caught and returned as MCP error responses
- The server never crashes on query errors; only fatal startup errors exit the process
- Audit logger sanitizes SQL and redacts parameters before logging
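Returning errors instead of throwing can be done with a small wrapper around each tool handler. MCP tool results carry a `content` array and an optional `isError` flag. `runTool` is a hypothetical name. The sketch passes the raw error message through, whereas the text above says real error messages are sanitized first.

```typescript
// Hypothetical wrapper: failures become MCP tool results with isError: true,
// so a bad query never propagates far enough to crash the server.
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

async function runTool(handler: () => Promise<unknown>): Promise<ToolResult> {
  try {
    const data = await handler();
    return { content: [{ type: 'text', text: JSON.stringify(data ?? null) }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: 'text', text: `Error: ${message}` }], isError: true };
  }
}
```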
## Client Configuration
### MCP Client Setup
AI clients (Claude Code, Cursor, etc.) connect via MCP client configuration:
```json
{
  "mcpServers": {
    "database": {
      "transport": "websocket",
      "endpoint": "ws://localhost:7700",
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }
}
```
**For SSE transport** (added in v1.0.1):
```json
{
  "mcpServers": {
    "database": {
      "url": "http://localhost:7700/sse",
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }
}
```
### Available MCP Tools
The server exposes 30+ PostgreSQL tools; the main ones are listed below by category:
**Metadata Tools**:
- `pg_list_environments` - List configured environments
- `pg_list_schemas` - List schemas in environment
- `pg_list_tables` - List tables in schema
- `pg_describe_table` - Get table structure (columns, types, constraints)
- `pg_list_views` - List views
- `pg_list_functions` - List functions
- `pg_list_indexes` - List indexes
- `pg_list_constraints` - List constraints
- `pg_list_triggers` - List triggers
**Query Tools**:
- `pg_query` - Execute read-only SELECT query
- `pg_explain` - Get query execution plan (EXPLAIN)
**Data Manipulation Tools**:
- `pg_insert` - Insert single row
- `pg_update` - Update rows
- `pg_delete` - Delete rows
- `pg_upsert` - Insert or update (ON CONFLICT)
- `pg_bulk_insert` - Batch insert multiple rows
**Transaction Tools**:
- `pg_begin_transaction` - Start transaction
- `pg_commit_transaction` - Commit transaction
- `pg_rollback_transaction` - Rollback transaction
**Diagnostic Tools**:
- `pg_analyze_query` - Analyze query performance
- `pg_check_connection` - Verify database connectivity
All tools require `environment` parameter to specify which database to use.
## Version History and Roadmap
### Changelog Maintenance (IMPORTANT)
**Every code change MUST be documented in the changelog**. This is a critical project requirement.
#### How to Update Changelog
When making any code changes:
1. **Update `changelog.json`** in the project root:
   - Increment the version number (semver-style `major.minor.patch`, optionally followed by a build number)
   - Use the format `major.minor.patch` or `major.minor.patch.build` (e.g., `1.0.1.03`)
   - Add/update the version entry with:
     - `version`: New version number
     - `date`: Current date (YYYY-MM-DD format)
     - `description`: Brief summary of changes (Chinese or English)
     - `changes`: Array of specific changes made
2. **Version number is auto-synced**:
   - The server automatically reads `currentVersion` from `changelog.json`
   - No need to manually update `package.json` or `server.ts`
   - The `/changelog` endpoint exposes the full version history
3. **Example changelog entry**:
   ```json
   {
     "version": "1.0.1.04",
     "date": "2024-12-25",
     "description": "Add new feature",
     "changes": [
       "Add XXX feature",
       "Fix YYY bug",
       "Optimize ZZZ performance"
     ]
   }
   ```
4. **View changelog**:
   ```bash
   # Via HTTP endpoint (no authentication required)
   curl http://localhost:7700/changelog
   # Or check the file directly
   cat changelog.json
   ```
**Remember**: Documentation is as important as code. Always update the changelog before committing!
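The version bump in step 1 can be scripted. This sketch assumes `changelog.json` has the same shape as the `/changelog` response (`currentVersion` plus a `changelog` array). It also assumes new entries are prepended (newest first). `nextBuildVersion` and `addEntry` are hypothetical helpers, not part of the project.

```typescript
// Assumed shape of changelog.json (mirrors the /changelog response).
interface ChangelogEntry {
  version: string;
  date: string; // YYYY-MM-DD
  description: string;
  changes: string[];
}

interface ChangelogFile {
  currentVersion: string;
  changelog: ChangelogEntry[];
}

// "1.0.1.03" -> "1.0.1.04": bump the last numeric part, keeping its zero padding.
function nextBuildVersion(version: string): string {
  const parts = version.split('.');
  const last = parts.pop() ?? '';
  if (!/^\d+$/.test(last)) throw new Error(`Unsupported version: ${version}`);
  const bumped = String(Number(last) + 1).padStart(last.length, '0');
  return [...parts, bumped].join('.');
}

// Prepends a new entry and updates currentVersion; the date uses UTC.
function addEntry(file: ChangelogFile, description: string, changes: string[], today = new Date()): ChangelogFile {
  const version = nextBuildVersion(file.currentVersion);
  const entry: ChangelogEntry = { version, date: today.toISOString().slice(0, 10), description, changes };
  return { currentVersion: version, changelog: [entry, ...file.changelog] };
}
```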
---
### Version History
### v1.0.0 (Initial Release)
- WebSocket transport with custom `WebSocketServerTransport` implementation
- Token-based authentication
- Multi-environment configuration with per-environment connection pools
- Multi-schema access via `defaultSchema` and `searchPath`
- Session-based transaction management
- Audit logging with SQL parameter redaction
- Health check endpoint
- Docker support with graceful shutdown
- 30+ PostgreSQL MCP tools
### v1.0.1 (2024-12-21)
- **Added SSE transport** to support clients without WebSocket (e.g., cursor-browser-extension)
- Unified server now handles both WebSocket and SSE on same port
- SSE endpoint: `GET /sse` for stream, `POST /messages` for client messages
- Backward compatible with v1.0.0 WebSocket clients
### v1.0.1.01 (2024-12-22)
- Fixed connection pool leak issue
- Fixed SSE disconnect/reconnect logic
- Improved error handling
### v1.0.1.02 (2024-12-23)
- Fixed `ssl.require: false` configuration not taking effect
- Improved SSL configuration validation logic
- Updated documentation for SSL configuration
### v1.0.1.03 (2024-12-24)
- Added `allowUnauthenticatedRemote` configuration option
- Allow explicit enabling of unauthenticated remote access in trusted networks
- Improved security validation error messages
- New `/changelog` endpoint to view version update history (no authentication required)
### Future Roadmap
- Support for additional database engines (SQL Server and MySQL adapters)
- mTLS authentication implementation
- RBAC (role-based access control) for fine-grained permissions
- Rate limiting and quota management per client
- Configuration hot reload
- Metrics and Prometheus export
## Testing Notes
The project currently has placeholder tests. When adding tests:
- Create tests under the `__tests__/` directory
- Use the connection manager's `withClient` method for database interaction in tests
- Test files should use `.test.ts` extension
- Consider testing transaction rollback behavior and session cleanup
## Deployment and Operations
### Docker Deployment
**Build Image**:
```bash
docker build -t mcp-database-server:1.0.1 .
```
**Run Container**:
```bash
docker run -d \
--name mcp-database-server \
-p 7700:7700 \
-v $(pwd)/config/database.json:/app/config/database.json:ro \
-e MCP_AUTH_TOKEN=your-token \
-e MCP_DRWORKS_PASSWORD=your-password \
mcp-database-server:1.0.1
```
**Docker Compose**:
```bash
docker compose up -d
```
### Health Check
```bash
curl http://localhost:7700/health
```
Response format:
```json
{
  "status": "ok",
  "uptime": 3600,
  "version": "1.0.0",
  "clients": 5,
  "environments": [
    {
      "name": "drworks",
      "status": "connected",
      "poolSize": 10,
      "activeConnections": 2
    }
  ],
  "timestamp": "2025-12-23T10:30:00.000Z"
}
```
Status values: `ok` (all connected) | `degraded` (some disconnected) | `error` (critical failure)
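One possible mapping from per-environment statuses to the top-level `status` field is sketched below. The exact rule for `error` is not specified above. The sketch assumes it means that no configured environment is reachable. `overallStatus` is a hypothetical name.

```typescript
// Hypothetical aggregation of per-environment status into the top-level field.
type EnvStatus = 'connected' | 'disconnected';
type HealthStatus = 'ok' | 'degraded' | 'error';

// Assumption: "error" when no environment is reachable (or none is configured),
// "degraded" when only some are, "ok" when all are.
function overallStatus(envs: { name: string; status: EnvStatus }[]): HealthStatus {
  const connected = envs.filter((e) => e.status === 'connected').length;
  if (envs.length > 0 && connected === envs.length) return 'ok';
  if (connected > 0) return 'degraded';
  return 'error';
}
```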
### Changelog Endpoint
View version history and system updates:
```bash
curl http://localhost:7700/changelog
```
Response format:
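```json
{
  "currentVersion": "1.0.1.03",
  "changelog": [
    {
      "version": "1.0.1.03",
      "date": "2024-12-24",
      "description": "Enhanced security configuration flexibility and changelog functionality",
      "changes": [
        "Added allowUnauthenticatedRemote configuration option",
        "Allow explicit enabling of unauthenticated remote access in trusted networks",
        "Improved security validation error messages",
        "Added /changelog endpoint to view version update history"
      ]
    }
  ]
}
```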
**Note**: This endpoint does not require authentication and can be accessed publicly.
### Graceful Shutdown
The server handles SIGTERM and SIGINT signals:
1. Stops accepting new connections
2. Rolls back active transactions
3. Closes all sessions
4. Closes database connection pools
5. Exits cleanly
```bash
# Docker
docker stop mcp-database-server
# Direct process
kill -TERM $(pgrep -f "node dist/src/server.js")
```
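The five shutdown steps above can be driven by a small runner that executes them once, in order. It keeps going if one step fails, so pools are still closed. `createShutdown` and the step names in the wiring comment are hypothetical. The `exit` function is injectable so the sequence can be exercised without terminating the process.

```typescript
// Hypothetical sketch: run the shutdown steps once, in order, then exit.
type Step = { name: string; run: () => Promise<void> };

function createShutdown(steps: Step[], exit: (code: number) => void = (code) => process.exit(code)) {
  let started = false;
  return async function shutdown(signal: string): Promise<void> {
    if (started) return; // a second signal must not restart the sequence
    started = true;
    console.error(`Received ${signal}, shutting down`);
    let code = 0;
    for (const step of steps) {
      try {
        await step.run();
      } catch (err) {
        code = 1; // remember the failure but continue with the remaining steps
        console.error(`Shutdown step "${step.name}" failed:`, err);
      }
    }
    exit(code);
  };
}

// Wiring (step implementations are hypothetical):
// const shutdown = createShutdown([
//   { name: 'stop accepting connections', run: stopServer },
//   { name: 'roll back transactions', run: rollbackAll },
//   { name: 'close sessions', run: closeSessions },
//   { name: 'close pools', run: closeAllPools },
// ]);
// process.once('SIGTERM', () => void shutdown('SIGTERM'));
// process.once('SIGINT', () => void shutdown('SIGINT'));
```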
### Command Line Options
```bash
node dist/src/server.js [options]
Options:
--config <path> Configuration file path (default: ./config/database.json)
--listen <host:port> Listen address (overrides config)
--auth-token <token> Auth token (overrides config)
--log-level <level> Log level (overrides config: debug/info/warn/error)
```
Environment variables override configuration file:
- `MCP_CONFIG` - Configuration file path
- `MCP_LISTEN` - Listen address (host:port)
- `MCP_AUTH_TOKEN` - Authentication token
- `MCP_LOG_LEVEL` - Log level
- `MCP_<ENV>_PASSWORD` - Database password for environment