# HTML-to-PDF Service - Production Release Design

> **Document version**: v2.0
> **Created**: 2024-12-10
> **Based on**: lessons validated in the MVP release
> **Project type**: production-grade service

---

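## 📌 1. How the Design Evolved

### 1.1 MVP Recap

**Core capabilities validated by the MVP:**
- ✅ PuppeteerSharp + Chromium runs stably on Linux/Docker
- ✅ Browser pooling works and holds up well under concurrent load
- ✅ HTML/URL to PDF conversion is feature-complete
- ✅ HTML/URL to image conversion is feature-complete (PNG/JPEG/WebP)
- ✅ Pages built with modern SPA frameworks (React/Vue/Angular) render correctly
- ✅ The callback mechanism works as expected
- ✅ Local file storage is reliable

**Limitations of the MVP:**
- ❌ The API is synchronous, so the client blocks until the conversion finishes
- ❌ Long conversions can run into HTTP connection timeouts
- ❌ Task progress cannot be queried
- ❌ There is no task management
- ❌ Tasks are lost when the service restarts
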
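### 1.2 Goals for the Production Release

**Core improvement: asynchronous task processing**

```
MVP (synchronous):
  client → send request → wait for conversion → receive PDF → done
           ⏱️ blocks for 5-30 seconds

Production release (asynchronous):
  client → send request → task ID returned immediately → done (within 200 ms)
                ↓
      background queue processing
                ↓
      query task status / callback notification
```

**Advantages of the production release:**
- ✅ The client gets a response immediately instead of waiting
- ✅ Long-running conversions are supported (not bound by HTTP timeouts)
- ✅ Task progress and status can be queried
- ✅ Task history can be queried
- ✅ Tasks can be cancelled
- ✅ Tasks can be recovered after a service restart (with optional persistence)
- ✅ Better system observability

---
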
## 🏗️ 2. System Architecture

### 2.1 Overall Architecture

```
Client layer
    Web apps / mobile apps / third-party systems
        │
        │ HTTP API
        ▼
Web API layer (gateway)
    ├─ Task submission endpoints    POST /tasks
    ├─ Task query endpoints         GET /tasks
    └─ Task management endpoints    DELETE etc.
        │
        ▼
Task orchestration layer
    TaskOrchestrator
      • Task creation and validation
      • Task state management
      • Result collection and callbacks
        │
        ▼
Task queue layer (core)
    TaskQueue
      Pending queue     →   Processing queue
            ↓                     ↓
      Completed queue       Failed queue
      Channel<ConversionTask> + ConcurrentDictionary
        │
        ▼
Background worker layer
    BackgroundWorkerService (BackgroundService)
      Worker #1   Worker #2   Worker #3   ...   Worker #N
      [busy]      [busy]      [idle]            [busy]
      • Fetch tasks from the queue
      • Call the conversion services
      • Update task state
      • Trigger callbacks
        │
        ▼
Conversion service layer
    PdfService                 ImageService
      • HTML to PDF              • HTML to Image
      • URL to PDF               • URL to Image
        │
        ▼
Resource pooling layer
    BrowserPool (pool of browser instances)
      • Concurrency control (SemaphoreSlim)
      • Instance reuse (ConcurrentBag)
      • Warm-up
      • Health checks
        │
        ▼
Chromium browser process pool
    [Process #1] [Process #2] [Process #3] ... [Process #N]
```

### 2.2 Data Flow

```
[Submit a task]
client → POST /api/tasks/pdf → create task → return { taskId, status: "pending" }
                                   ↓
                          add to the task queue
                                   ↓
[Background processing]
BackgroundWorker takes a task from the queue
    ↓
set status to "processing"
    ↓
acquire a browser instance (pooled)
    ↓
run the PDF/image conversion
    ↓
save the file (local disk / OSS)
    ↓
set status to "completed"
    ↓
send the callback notification (asynchronously)

[Fetch the result]
Option 1: client polling → GET /api/tasks/{taskId}          → returns task status and result
Option 2: callback       → POST {callbackUrl}               → pushes the task result
Option 3: file download  → GET /api/tasks/{taskId}/download → returns the PDF/image
```

---

## 📋 3. Functional Modules

### 3.1 Task Management

#### 3.1.1 Task State Machine

```
Pending
   ↓ processing starts
Processing
   ↓
 finished?
   ├─ Yes → Completed
   └─ No  → Failed
 timed out?
   └─ Yes → Timeout
 cancelled?
   └─ Yes → Cancelled
```

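The state machine can also be written down as an explicit transition table, so that illegal status updates are rejected in one place. This is only a sketch under stated assumptions: `ConversionTaskStatus` and `TaskStateMachine` are hypothetical names (the enum mirrors the document's `TaskStatus` values), and the rules that cancellation is allowed from both `Pending` and `Processing` and that a retry moves `Failed` or `Timeout` back to `Pending` are assumptions, not something the design states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum mirroring the document's task states.
public enum ConversionTaskStatus { Pending, Processing, Completed, Failed, Timeout, Cancelled }

public static class TaskStateMachine
{
    // Allowed target states per source state (assumed rules, see the lead-in).
    private static readonly Dictionary<ConversionTaskStatus, ConversionTaskStatus[]> Allowed = new()
    {
        [ConversionTaskStatus.Pending] = new[]
        {
            ConversionTaskStatus.Processing, ConversionTaskStatus.Cancelled
        },
        [ConversionTaskStatus.Processing] = new[]
        {
            ConversionTaskStatus.Completed, ConversionTaskStatus.Failed,
            ConversionTaskStatus.Timeout, ConversionTaskStatus.Cancelled
        },
        [ConversionTaskStatus.Completed] = Array.Empty<ConversionTaskStatus>(),
        [ConversionTaskStatus.Failed] = new[] { ConversionTaskStatus.Pending },   // manual retry
        [ConversionTaskStatus.Timeout] = new[] { ConversionTaskStatus.Pending },  // manual retry
        [ConversionTaskStatus.Cancelled] = Array.Empty<ConversionTaskStatus>(),
    };

    // True when moving a task from `from` to `to` is a legal transition.
    public static bool CanTransition(ConversionTaskStatus from, ConversionTaskStatus to)
        => Array.IndexOf(Allowed[from], to) >= 0;
}
```

A queue implementation could call `CanTransition` before every status update and answer `409 Conflict` when it returns false, which matches the "cannot cancel" response of the cancel endpoint.
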
#### 3.1.2 Task Data Structure

```csharp
using System;
using System.Collections.Generic;

public class ConversionTask
{
    public string TaskId { get; set; } = string.Empty;          // Unique task identifier
    public string Type { get; set; } = string.Empty;            // pdf / image
    public string Source { get; set; } = string.Empty;          // html / url
    public string SourceContent { get; set; } = string.Empty;   // HTML content or URL
    public TaskStatus Status { get; set; }                      // Task state
    public DateTime CreatedAt { get; set; }                     // Creation time
    public DateTime? StartedAt { get; set; }                    // Processing start time
    public DateTime? CompletedAt { get; set; }                  // Completion time
    public long Duration { get; set; }                          // Processing time (ms)
    public int RetryCount { get; set; }                         // Number of retries so far

    // Conversion options
    public object? Options { get; set; }                        // PDF or image options

    // Result
    public long? FileSize { get; set; }                         // File size
    public string? FilePath { get; set; }                       // Local file path
    public string? DownloadUrl { get; set; }                    // Download link
    public DateTime? ExpiresAt { get; set; }                    // Expiry time

    // Callback configuration
    public string? CallbackUrl { get; set; }                    // Callback URL
    public Dictionary<string, string>? CallbackHeaders { get; set; }
    public bool IncludeFileData { get; set; }                   // Include the file in the callback?

    // Error information
    public string? ErrorMessage { get; set; }                   // Error message
    public string? ErrorDetails { get; set; }                   // Error details

    // Extension fields
    public string? UserId { get; set; }                         // User identifier (multi-tenancy)
    public Dictionary<string, string>? Metadata { get; set; }   // Custom metadata
}

// Note: this enum shares its name with System.Threading.Tasks.TaskStatus.
// A file that imports both namespaces needs an alias or a fully qualified name.
public enum TaskStatus
{
    Pending = 0,      // Waiting to be processed
    Processing = 1,   // Being processed
    Completed = 2,    // Finished successfully
    Failed = 3,       // Failed
    Timeout = 4,      // Timed out
    Cancelled = 5     // Cancelled
}
```

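#### 3.1.3 Task Persistence Options

| Option | Strengths | Weaknesses | Best suited for |
|--------|-----------|------------|-----------------|
| **In-memory (Channel + ConcurrentDictionary)** | Fastest, simplest to implement | Tasks are lost on restart | Single instance, non-critical tasks |
| **Redis** | Cluster support, good performance | Requires an extra component | **Recommended: multi-instance deployments** |
| **SQLite** | Lightweight, file-based | No cluster support | Single instance, small scale |
| **PostgreSQL/MySQL** | Full-featured, stable | Heavyweight | Enterprise, large scale |

**Recommendation for the production release: Redis (preferred) or PostgreSQL (fallback)**

---
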
## 🔌 4. API Design

### 4.1 Task Submission Endpoints

#### Endpoint 1: Submit a PDF Conversion Task

**Request:**
```http
POST /api/tasks/pdf
Content-Type: application/json

{
  "source": {
    "type": "html",                    // html / url
    "content": "<html>...</html>"      // HTML content or URL
  },
  "options": {
    "format": "A4",
    "landscape": false,
    "printBackground": true,
    "margin": {
      "top": "10mm",
      "right": "10mm",
      "bottom": "10mm",
      "left": "10mm"
    }
  },
  "waitUntil": "networkidle2",         // Only applies to URL sources
  "timeout": 60000,                    // Conversion timeout (ms)
  "callback": {
    "url": "https://your-api.com/webhook",
    "headers": {
      "X-API-Key": "your-key"
    },
    "includeFileData": false           // Include the PDF as Base64 in the callback?
  },
  "saveLocal": true,                   // Keep a local copy?
  "metadata": {                        // Custom metadata (optional)
    "userId": "user123",
    "orderId": "order456"
  }
}
```

**Response:**
```http
HTTP/1.1 202 Accepted
Content-Type: application/json
Location: /api/tasks/{taskId}

```

#### Endpoint 2: Submit an Image Conversion Task

**Request:**
```http
POST /api/tasks/image
Content-Type: application/json

{
  "source": {
    "type": "url",
    "content": "https://www.example.com"
  },
  "options": {
    "format": "png",             // png / jpeg / webp
    "quality": 90,               // jpeg/webp only
    "width": 1920,               // Viewport width
    "height": 1080,              // Viewport height
    "fullPage": true,            // Capture the full page
    "omitBackground": false      // Transparent background
  },
  "waitUntil": "networkidle2",
  "delayAfterLoad": 2000,        // Extra wait after load (ms)
  "timeout": 60000,
  "callback": {
    "url": "https://your-api.com/webhook",
    "includeFileData": false
  },
  "saveLocal": true
}
```

**Response:** same as Endpoint 1; returns the task ID

---

### 4.2 Task Query Endpoints

#### Endpoint 3: Get Task Details

**Request:**
```http
GET /api/tasks/{taskId}
```

**Response (processing):**
```http
HTTP/1.1 200 OK
Content-Type: application/json

```

**Response (completed):**
```http
HTTP/1.1 200 OK
Content-Type: application/json

```

**Response (failed):**
```http
HTTP/1.1 200 OK
Content-Type: application/json

```

#### Endpoint 4: List Tasks

**Request:**
```http
GET /api/tasks?status=completed&type=pdf&page=1&pageSize=20&userId=user123
```

**Query parameters:**
- `status`: pending / processing / completed / failed / timeout / cancelled
- `type`: pdf / image
- `userId`: user identifier (multi-tenant scenarios)
- `startDate`: start date
- `endDate`: end date
- `page`: page number (starting at 1)
- `pageSize`: items per page (default 20, maximum 100)

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

```

---

### 4.3 Task Operation Endpoints

#### Endpoint 5: Download the Task Result

**Request:**
```http
GET /api/tasks/{taskId}/download
```

**Response (success):**
```http
HTTP/1.1 200 OK
Content-Type: application/pdf
Content-Disposition: attachment; filename="document.pdf"
Content-Length: 102400
X-Task-Id: 550e8400-e29b-41d4-a716-446655440000
X-Created-At: 2024-12-10T10:30:00Z
X-Expires-At: 2024-12-11T10:30:00Z

[PDF/image binary data]
```

**Response (task not finished):**
```http
HTTP/1.1 409 Conflict
Content-Type: application/json

```

#### Endpoint 6: Cancel a Task

**Request:**
```http
DELETE /api/tasks/{taskId}
# or
POST /api/tasks/{taskId}/cancel
```

**Response (success):**
```http
HTTP/1.1 200 OK
Content-Type: application/json

```

**Response (cannot be cancelled):**
```http
HTTP/1.1 409 Conflict
Content-Type: application/json

```

#### Endpoint 7: Retry a Failed Task

**Request:**
```http
POST /api/tasks/{taskId}/retry
```

**Response:**
```http
HTTP/1.1 202 Accepted
Content-Type: application/json

```

---

### 4.4 System Monitoring Endpoints

#### Endpoint 8: Health Check (Enhanced)

**Request:**
```http
GET /health
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

```

#### Endpoint 9: System Metrics (Prometheus Format)

**Request:**
```http
GET /metrics
```

**Response:**
```
# HELP conversion_tasks_total Total number of conversion tasks
# TYPE conversion_tasks_total counter
conversion_tasks_total{type="pdf",status="completed"} 1523
conversion_tasks_total{type="pdf",status="failed"} 23
conversion_tasks_total{type="image",status="completed"} 856

# HELP conversion_duration_seconds Conversion duration in seconds
# TYPE conversion_duration_seconds histogram
conversion_duration_seconds_bucket{type="pdf",le="1"} 120
conversion_duration_seconds_bucket{type="pdf",le="5"} 1200
conversion_duration_seconds_bucket{type="pdf",le="10"} 1480

# HELP browser_pool_instances Number of browser instances
# TYPE browser_pool_instances gauge
browser_pool_instances{state="total"} 8
browser_pool_instances{state="available"} 5
browser_pool_instances{state="in_use"} 3

# HELP task_queue_length Current task queue length
# TYPE task_queue_length gauge
task_queue_length{status="pending"} 12
task_queue_length{status="processing"} 5
```

---

## ⚙️ 5. Core Components

### 5.1 Task Queue (TaskQueue)

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// PagedResult<T>, TaskQueryOptions and TaskQueueStatistics are supporting types not shown in this section.
public interface ITaskQueue
{
    // Enqueue a task
    Task<string> EnqueueAsync(ConversionTask task);

    // Dequeue a task (consumed by workers)
    Task<ConversionTask?> DequeueAsync(CancellationToken cancellationToken);

    // Update the task state
    Task UpdateTaskAsync(string taskId, TaskStatus status,
        Action<ConversionTask>? updateAction = null);

    // Look up a single task
    Task<ConversionTask?> GetTaskAsync(string taskId);

    // Query tasks in bulk
    Task<PagedResult<ConversionTask>> QueryTasksAsync(TaskQueryOptions options);

    // Cancel a task
    Task<bool> CancelTaskAsync(string taskId);

    // Queue statistics
    TaskQueueStatistics GetStatistics();
}
```

**Implementation options:**

1. **In-memory queue (upgraded MVP)**
   - `Channel<ConversionTask>` for the producer-consumer pattern
   - `ConcurrentDictionary<string, ConversionTask>` as the task index
   - Pros: very fast, simple to implement
   - Cons: tasks are lost on restart

2. **Redis queue (recommended)**
   - List holds pending tasks (LPUSH/BRPOP)
   - Hash holds task details (HSET/HGET)
   - SortedSet holds tasks ordered by time
   - Pros: cluster support, persistence, good performance
   - Implementation: StackExchange.Redis. Note that StackExchange.Redis does not expose blocking commands such as BRPOP, so workers would need to poll with RPOP or be woken through pub/sub.

3. **Database queue**
   ```sql
   CREATE TABLE conversion_tasks (
       task_id VARCHAR(50) PRIMARY KEY,
       type VARCHAR(10),
       source_type VARCHAR(10),
       source_content TEXT,
       status INT,
       created_at TIMESTAMP,
       started_at TIMESTAMP,
       completed_at TIMESTAMP,
       ...
   );
   CREATE INDEX idx_status ON conversion_tasks(status, created_at);
   ```

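The in-memory option (option 1) could look roughly like the sketch below. It is a simplification, not the service's actual queue: `QueuedTask` is a hypothetical, trimmed-down stand-in for `ConversionTask`, the class does not implement the full `ITaskQueue` interface, and the use of a bounded channel (so a full queue can be reported to the caller) is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the document's ConversionTask, trimmed for the sketch.
public sealed class QueuedTask
{
    public string TaskId { get; init; } = Guid.NewGuid().ToString("N");
    public string Status { get; set; } = "pending";
}

public sealed class InMemoryTaskQueue
{
    private readonly Channel<QueuedTask> _channel;
    private readonly ConcurrentDictionary<string, QueuedTask> _index = new();

    public InMemoryTaskQueue(int capacity)
    {
        // Bounded channel: TryWrite fails once the queue is full instead of growing without limit.
        _channel = Channel.CreateBounded<QueuedTask>(capacity);
    }

    // Returns false when the queue is full (or the ID already exists),
    // so the API layer can reject the request.
    public bool TryEnqueue(QueuedTask task)
    {
        if (!_index.TryAdd(task.TaskId, task)) return false;
        if (_channel.Writer.TryWrite(task)) return true;
        _index.TryRemove(task.TaskId, out _);
        return false;
    }

    // Waits until a task is available or the token is cancelled.
    public async Task<QueuedTask> DequeueAsync(CancellationToken ct)
    {
        var task = await _channel.Reader.ReadAsync(ct);
        task.Status = "processing";
        return task;
    }

    // Index lookup used by the status endpoints.
    public QueuedTask? Get(string taskId)
        => _index.TryGetValue(taskId, out var task) ? task : null;
}
```

The channel carries the work while the dictionary answers status queries; keeping both in one class ensures a task is never queued without being indexed.
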
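### 5.2 Background Worker Service (BackgroundWorkerService)

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public class ConversionWorkerService : BackgroundService
{
    private readonly ITaskQueue _taskQueue;
    private readonly IPdfService _pdfService;
    private readonly IImageService _imageService;
    private readonly ILogger<ConversionWorkerService> _logger;
    private readonly int _workerCount;

    public ConversionWorkerService(ITaskQueue taskQueue, IPdfService pdfService,
        IImageService imageService, ILogger<ConversionWorkerService> logger,
        IOptions<QueueCapacityOptions> options)
    {
        _taskQueue = taskQueue;
        _pdfService = pdfService;
        _imageService = imageService;
        _logger = logger;
        _workerCount = options.Value.MaxConcurrentWorkers; // see section 7.2
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Start several workers that process tasks concurrently
        var workers = new List<Task>();

        for (int i = 0; i < _workerCount; i++)
        {
            workers.Add(ProcessTasksAsync(i, stoppingToken));
        }

        await Task.WhenAll(workers);
    }

    private async Task ProcessTasksAsync(int workerId, CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                // Take a task from the queue (waits asynchronously until one is available)
                var task = await _taskQueue.DequeueAsync(stoppingToken);

                if (task != null)
                {
                    await ProcessSingleTaskAsync(task, stoppingToken);
                }
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                break; // Normal shutdown, not an error
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Worker {WorkerId} failed while processing a task", workerId);
            }
        }
    }

    private async Task ProcessSingleTaskAsync(ConversionTask task, CancellationToken ct)
    {
        // 1. Set the status to Processing
        // 2. Call the conversion service
        // 3. Save the result
        // 4. Set the status to Completed/Failed
        // 5. Send the callback
        await Task.CompletedTask; // placeholder until the steps above are implemented
    }
}
```
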
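### 5.3 Callback Service (Enhanced)

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// CallbackPayload is a supporting type not shown in this section.
public class EnhancedCallbackService
{
    private readonly HttpClient _httpClient;
    private readonly ILogger<EnhancedCallbackService> _logger;

    public EnhancedCallbackService(HttpClient httpClient, ILogger<EnhancedCallbackService> logger)
    {
        _httpClient = httpClient;
        _logger = logger;
    }

    // Callback with retries. maxRetries counts total attempts, including the first one.
    public async Task SendCallbackWithRetryAsync(
        string callbackUrl,
        CallbackPayload payload,
        int maxRetries = 3,
        int retryDelayMs = 1000)
    {
        for (int i = 0; i < maxRetries; i++)
        {
            try
            {
                await SendCallbackAsync(callbackUrl, payload);
                return; // Success
            }
            catch (Exception ex)
            {
                _logger.LogWarning(ex, "Callback failed, attempt {Retry}/{Max}: {Url}",
                    i + 1, maxRetries, callbackUrl);

                if (i < maxRetries - 1)
                {
                    // Exponential backoff: 1s, 2s, 4s, ... (with the defaults: 1s, then 2s)
                    var delay = retryDelayMs * (int)Math.Pow(2, i);
                    await Task.Delay(delay);
                }
            }
        }

        // Every attempt failed
        _logger.LogError("Callback failed after the maximum number of attempts: {Url}", callbackUrl);
    }

    // Minimal sender (an assumption): POST the payload as JSON and treat non-2xx as failure.
    private async Task SendCallbackAsync(string callbackUrl, CallbackPayload payload)
    {
        using var response = await _httpClient.PostAsJsonAsync(callbackUrl, payload);
        response.EnsureSuccessStatusCode();
    }
}
```

---
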
## 📊 6. Database Design (If Used)

The DDL below uses MySQL syntax (inline `INDEX` clauses, `AUTO_INCREMENT`). On PostgreSQL, the indexes would become separate `CREATE INDEX` statements and the auto-increment columns would use `BIGSERIAL` or an identity column.

### 6.1 Task Table (conversion_tasks)

```sql
CREATE TABLE conversion_tasks (
    task_id VARCHAR(50) PRIMARY KEY,
    task_type VARCHAR(10) NOT NULL,        -- pdf / image
    source_type VARCHAR(10) NOT NULL,      -- html / url
    source_content TEXT NOT NULL,

    -- State
    status INT NOT NULL DEFAULT 0,         -- 0=Pending, 1=Processing, 2=Completed, 3=Failed, 4=Timeout, 5=Cancelled
    retry_count INT NOT NULL DEFAULT 0,

    -- Timing
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMP NULL,
    completed_at TIMESTAMP NULL,
    expires_at TIMESTAMP NULL,
    duration_ms BIGINT NULL,

    -- Conversion options (JSON)
    options TEXT NULL,

    -- Result
    file_size BIGINT NULL,
    file_path VARCHAR(500) NULL,
    download_url VARCHAR(500) NULL,

    -- Callback configuration (JSON)
    callback_config TEXT NULL,
    callback_attempts INT DEFAULT 0,
    callback_success BOOLEAN DEFAULT FALSE,

    -- Error information
    error_code VARCHAR(50) NULL,
    error_message TEXT NULL,
    error_details TEXT NULL,

    -- Extension fields
    user_id VARCHAR(50) NULL,
    metadata TEXT NULL,                    -- Custom metadata as JSON

    -- Indexes
    INDEX idx_status_created (status, created_at),
    INDEX idx_user_id (user_id, created_at),
    INDEX idx_expires_at (expires_at)
);
```

### 6.2 Task Log Table (conversion_logs)

```sql
CREATE TABLE conversion_logs (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    task_id VARCHAR(50) NOT NULL,
    log_level VARCHAR(10) NOT NULL,        -- Debug / Info / Warning / Error
    message TEXT NOT NULL,
    details TEXT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_task_id (task_id, created_at)
);
```

### 6.3 System Metrics Table (system_metrics)

```sql
CREATE TABLE system_metrics (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    metric_name VARCHAR(100) NOT NULL,
    metric_value DECIMAL(18,2) NOT NULL,
    metric_type VARCHAR(20) NOT NULL,      -- counter / gauge / histogram
    tags TEXT NULL,                        -- Labels as JSON
    recorded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_name_time (metric_name, recorded_at)
);
```

---

## 🔄 7. Queue Processing Strategy

### 7.1 Queue Priority

```csharp
public enum TaskPriority
{
    Low = 0,      // Ordinary tasks
    Normal = 1,   // Default priority
    High = 2,     // VIP users
    Urgent = 3    // Urgent tasks
}
```

**Implementation:**
- Use several Channels, one per priority level
- Workers take tasks from the higher-priority queues first

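The two bullets above can be sketched as follows. `PriorityTaskQueue<T>` and `Priority` are hypothetical names (the enum repeats the values of `TaskPriority`), the channels are unbounded for brevity, and the strict highest-first scan is an assumption; a real implementation may want some protection against low-priority starvation.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical enum with the same values as the document's TaskPriority.
public enum Priority { Low = 0, Normal = 1, High = 2, Urgent = 3 }

public sealed class PriorityTaskQueue<T>
{
    // One unbounded channel per priority, indexed by the priority's numeric value.
    private readonly Channel<T>[] _channels =
        Enum.GetValues<Priority>().Select(_ => Channel.CreateUnbounded<T>()).ToArray();

    public void Enqueue(T item, Priority priority)
        => _channels[(int)priority].Writer.TryWrite(item);

    public async Task<T> DequeueAsync(CancellationToken ct)
    {
        while (true)
        {
            // Scan from Urgent down to Low and take the first available item.
            for (int p = _channels.Length - 1; p >= 0; p--)
            {
                if (_channels[p].Reader.TryRead(out var item))
                    return item;
            }

            // Nothing queued: wait until any channel has data, then rescan.
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
            var waits = _channels
                .Select(c => c.Reader.WaitToReadAsync(cts.Token).AsTask())
                .ToArray();
            await Task.WhenAny(waits);
            cts.Cancel();                      // release the remaining waiters
            ct.ThrowIfCancellationRequested(); // propagate the caller's cancellation
        }
    }
}
```

Rescanning from the top after every wake-up means an urgent task that arrives while a worker is idle is always picked before older low-priority work.
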
### 7.2 Queue Capacity Control

```csharp
public class QueueCapacityOptions
{
    public int MaxQueueSize { get; set; } = 1000;         // Maximum queue length
    public int MaxConcurrentWorkers { get; set; } = 5;    // Maximum number of concurrent workers
    public int MaxTasksPerUser { get; set; } = 10;        // Maximum tasks per user
    public int RejectionThreshold { get; set; } = 900;    // Queue length at which new tasks are rejected
}
```

**When the queue is full:**
1. Return `503 Service Unavailable`
2. Write an alert-level log entry
3. Advise the client to retry later

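A minimal admission check built on these limits might look like the sketch below. `QueueAdmission` and `Admission` are hypothetical names; comparing with `>=` against the threshold and answering a per-user rejection with something like `429 Too Many Requests` are assumptions, since the design only specifies the `503` for a full queue.

```csharp
using System;

public enum Admission { Accepted, QueueFull, UserLimitReached }

public static class QueueAdmission
{
    // Rejects new work once the queue reaches the rejection threshold, or when
    // the caller already has too many unfinished tasks.
    public static Admission Check(int queueLength, int userActiveTasks,
                                  int rejectionThreshold = 900, int maxTasksPerUser = 10)
    {
        if (queueLength >= rejectionThreshold) return Admission.QueueFull;         // map to 503
        if (userActiveTasks >= maxTasksPerUser) return Admission.UserLimitReached; // e.g. 429 (assumed)
        return Admission.Accepted;
    }
}
```

Checking the global threshold first keeps the response consistent under overload: every caller sees the same `503`, regardless of their personal quota.
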
### 7.3 Timeouts and Retries

```csharp
public class TaskRetryPolicy
{
    public int MaxRetries { get; set; } = 3;              // Maximum number of retries
    public int InitialRetryDelay { get; set; } = 1000;    // Initial retry delay (ms)
    public int MaxRetryDelay { get; set; } = 60000;       // Maximum retry delay (ms)
    public double BackoffMultiplier { get; set; } = 2.0;  // Backoff multiplier

    // Error types that are worth retrying
    public string[] RetryableErrors { get; set; } = new[]
    {
        "NavigationTimeout",
        "NetworkError",
        "BrowserDisconnected"
    };
}
```

**Retry schedule:**
```
1st failure → wait 1 s → retry
2nd failure → wait 2 s → retry
3rd failure → wait 4 s → retry
4th failure → mark as failed, send the callback
```

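The retry schedule follows from the policy values as `delay = InitialRetryDelay × BackoffMultiplier^(attempt − 1)`, capped at `MaxRetryDelay`. A small sketch of that calculation, with `RetryBackoff.DelayFor` as a hypothetical helper name and the policy defaults passed as default parameters:

```csharp
using System;

public static class RetryBackoff
{
    // Delay before retry number `attempt` (1-based):
    // initial * multiplier^(attempt - 1), capped at the maximum delay.
    public static TimeSpan DelayFor(int attempt, int initialDelayMs = 1000,
                                    int maxDelayMs = 60000, double multiplier = 2.0)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        double ms = initialDelayMs * Math.Pow(multiplier, attempt - 1);
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelayMs));
    }
}
```

With the defaults, attempts 1 to 3 wait 1 s, 2 s and 4 s; the cap only matters from the 7th retry onward (64 s would exceed 60 s), which a policy with `MaxRetries = 3` never reaches.
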
---

## 📦 8. Storage Design

### 8.1 File Storage Options

| Storage | Strengths | Weaknesses | Best suited for |
|---------|-----------|------------|-----------------|
| **Local disk** | Simple, fast | No cluster support, limited capacity | Single instance, small scale |
| **NFS** | Cluster support | Mediocre performance, single point of failure | Traditional architectures |
| **Object storage (OSS/S3/MinIO)** | Highly available, effectively unlimited capacity, CDN support | Requires extra configuration | **Recommended: production** |

**Recommendation for the production release: object storage (OSS/S3)**

### 8.2 File Lifecycle

```
File created
    ↓
Hot storage (fast access)
    ↓ after 24 hours
Warm storage (normal access)
    ↓ after 7 days
Cold storage (archive)
    ↓ after 30 days
Deleted automatically
```

### 8.3 Storage Directory Layout

**Local storage:**
```
/app/files/
├── pdf/
│   ├── 2024-12-10/
│   │   ├── {taskId}.pdf
│   │   └── {taskId}.pdf
│   └── 2024-12-11/
└── image/
    ├── 2024-12-10/
    │   ├── {taskId}.png
    │   └── {taskId}.jpg
    └── 2024-12-11/
```

**Object storage:**
```
bucket-name/
├── pdf/
│   └── 2024/12/10/{taskId}.pdf
└── image/
    └── 2024/12/10/{taskId}.png
```

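Both layouts can be derived from the task type, creation date and task ID. The helper below is a sketch: `StoragePaths` is a hypothetical name, the date is assumed to be the task's UTC creation time, and `taskId` is assumed to be server-generated (a client-supplied ID would need validation against path traversal first).

```csharp
using System;
using System.Globalization;

public static class StoragePaths
{
    // Local layout: {root}/{type}/{yyyy-MM-dd}/{taskId}.{ext}
    public static string LocalPath(string root, string type, DateTime createdUtc,
                                   string taskId, string ext)
    {
        var day = createdUtc.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return $"{root.TrimEnd('/')}/{type}/{day}/{taskId}.{ext}";
    }

    // Object-storage key: {type}/{yyyy}/{MM}/{dd}/{taskId}.{ext}
    public static string ObjectKey(string type, DateTime createdUtc, string taskId, string ext)
    {
        var day = createdUtc.ToString("yyyy'/'MM'/'dd", CultureInfo.InvariantCulture);
        return $"{type}/{day}/{taskId}.{ext}";
    }
}
```

Formatting with the invariant culture keeps the folder names stable regardless of the server's regional settings.
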
---

## 🔐 9. Security Design

### 9.1 Authentication and Authorization

```csharp
public class AuthenticationOptions
{
    public string Scheme { get; set; } = "ApiKey";   // ApiKey / JWT / OAuth2
    public bool RequireAuthentication { get; set; } = true;
}
```

**API key authentication:**
```http
POST /api/tasks/pdf
Authorization: Bearer your-api-key-here
```

**JWT authentication:**
```http
POST /api/tasks/pdf
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
```

### 9.2 Rate Limiting

```csharp
using System.Collections.Generic;

public class RateLimitOptions
{
    // Per-IP limits
    public int RequestsPerMinutePerIp { get; set; } = 60;
    public int RequestsPerHourPerIp { get; set; } = 1000;

    // Per-user limits
    public int RequestsPerMinutePerUser { get; set; } = 100;
    public int RequestsPerHourPerUser { get; set; } = 5000;
    public int RequestsPerDayPerUser { get; set; } = 50000;

    // Per-API-key quotas
    public Dictionary<string, QuotaConfig> ApiKeyQuotas { get; set; } = new();
}

public class QuotaConfig
{
    public int DailyQuota { get; set; }      // Daily quota
    public int MonthlyQuota { get; set; }    // Monthly quota
    public int CurrentUsage { get; set; }    // Current usage
}
```

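One simple way to enforce a limit such as `RequestsPerMinutePerIp` is a fixed-window counter per key. The sketch below is illustrative only: `FixedWindowRateLimiter` is a hypothetical class, it keeps counters in process memory (so limits are per instance, not cluster-wide), and it never evicts idle keys. ASP.NET Core 7 and later also ship a rate-limiting middleware (`Microsoft.AspNetCore.RateLimiting`) that may be a better fit than hand-written code.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class FixedWindowRateLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, (DateTime WindowStart, int Count)> _counters = new();

    public FixedWindowRateLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Returns true if the request identified by `key` (IP, user ID or API key)
    // is allowed at time `now`.
    public bool TryAcquire(string key, DateTime now)
    {
        var entry = _counters.AddOrUpdate(
            key,
            _ => (now, 1),
            (_, current) => now - current.WindowStart >= _window
                ? (now, 1)                                    // window expired: start a new one
                : (current.WindowStart, current.Count + 1));  // same window: count the request
        return entry.Count <= _limit;
    }
}
```

Passing `now` in explicitly keeps the limiter deterministic and easy to test; in the service it would be `DateTime.UtcNow`.
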
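### 9.3 Content Security

```csharp
using System.Collections.Generic;

public class SecurityOptions
{
    // HTML content limit
    public long MaxHtmlSize { get; set; } = 10485760;   // 10 MB

    // URL allow list / block list
    public List<string> AllowedDomains { get; set; } = new();   // Allowed domains
    public List<string> BlockedDomains { get; set; } = new();   // Blocked domains
    public bool BlockPrivateNetworks { get; set; } = true;      // Block internal addresses (SSRF protection)

    // Content filtering
    public bool EnableXssFilter { get; set; } = true;
    public List<string> BlockedScripts { get; set; } = new();   // Blocked script patterns
}
```

**SSRF protection example:**
```csharp
using System;
using System.Net;
using System.Net.Sockets;

private static bool IsPrivateNetwork(string url)
{
    // Treat anything that is not an absolute http(s) URL as unsafe
    if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) ||
        (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        return true;

    var host = uri.DnsSafeHost;
    if (host.Equals("localhost", StringComparison.OrdinalIgnoreCase) ||
        host.Equals("metadata.google.internal", StringComparison.OrdinalIgnoreCase))
        return true;

    // Other host names must also be resolved and every resolved address checked (not shown)
    if (!IPAddress.TryParse(host, out var ip))
        return false;
    if (ip.IsIPv4MappedToIPv6) ip = ip.MapToIPv4();
    if (IPAddress.IsLoopback(ip) || ip.IsIPv6LinkLocal || ip.IsIPv6SiteLocal)
        return true;
    if (ip.AddressFamily != AddressFamily.InterNetwork)
        return false;

    var b = ip.GetAddressBytes();
    return b[0] == 0 || b[0] == 10 ||                    // 0.0.0.0/8, 10.0.0.0/8
           (b[0] == 172 && b[1] >= 16 && b[1] <= 31) ||  // 172.16.0.0/12
           (b[0] == 192 && b[1] == 168) ||               // 192.168.0.0/16
           (b[0] == 169 && b[1] == 254);                 // 169.254.0.0/16 (link-local, cloud metadata)
}
```

---
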
## 📈 10. Monitoring and Alerting

### 10.1 Metrics

#### Business metrics
- Task submission rate (tasks/minute)
- Task completion rate (tasks/minute)
- Task success rate (%)
- Queue length (pending tasks)
- Average wait time (seconds)
- Average processing time (seconds)

#### System metrics
- Browser pool utilization (%)
- Memory usage (MB)
- CPU usage (%)
- Disk usage (GB)
- Network I/O (MB/s)

#### Error metrics
- Failed tasks (count)
- Timed-out tasks (count)
- Failed callbacks (count)
- Browser crashes (count)

### 10.2 Alert Rules

| Metric | Condition | Severity | Suggested action |
|--------|-----------|----------|------------------|
| Queue backlog | > 100 | Warning | Add workers |
| Queue backlog | > 500 | Critical | Scale out urgently |
| Success rate | < 95% | Warning | Check the error logs |
| Success rate | < 90% | Critical | Act immediately |
| Average processing time | > 30s | Warning | Tune performance |
| Browser pool utilization | > 90% | Warning | Add browser instances |
| Memory usage | > 80% | Warning | Check for memory leaks |
| Callback failure rate | > 10% | Warning | Check the callback receiver |

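### 10.3 Monitoring Integration

**Prometheus + Grafana**
```csharp
// AddPrometheusMetrics is assumed to be a project-specific extension method.
builder.Services.AddPrometheusMetrics();

app.UseMetricServer();   // Exposes the /metrics endpoint (prometheus-net.AspNetCore)
app.UseHttpMetrics();    // Collects HTTP request metrics (prometheus-net.AspNetCore)
```

**ELK Stack (log aggregation)**
```csharp
// Illustrative only: the exact options type and property names depend on the logging
// package used (for example Serilog.Sinks.Elasticsearch or Elasticsearch.Extensions.Logging).
builder.Logging.AddElasticsearch(new ElasticsearchLoggerOptions
{
    IndexFormat = "htmltopdf-{0:yyyy.MM.dd}",
    AutoRegisterTemplate = true
});
```

---
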
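## 🚀 11. Performance Optimization

### 11.1 Caching

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public class CacheOptions
{
    public bool EnableCache { get; set; } = false;
    public int CacheDurationMinutes { get; set; } = 60;
    public long MaxCacheSizeMB { get; set; } = 1024;

    // Cache key rule: hash the source together with the options, so that the
    // same URL and options map to the same key and return the cached result.
    // Assumes the options object serializes to the same JSON every time.
    public string GenerateCacheKey(string sourceType, string sourceContent, object options)
    {
        var payload = $"{sourceType}\n{sourceContent}\n{JsonSerializer.Serialize(options)}";
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
        return Convert.ToHexString(hash);
    }
}
```

**When caching applies:**
- ✅ The same URL is converted frequently (for example, report pages)
- ✅ Static pages
- ❌ Dynamic content (different on every request)
- ❌ Personalized content
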
### 11.2 Batch Processing

```http
POST /api/tasks/batch
{
  "tasks": [
    {
      "type": "pdf",
      "source": { "type": "url", "content": "https://..." }
    },
    {
      "type": "image",
      "source": { "type": "html", "content": "..." }
    }
  ],
  "callback": {
    "url": "https://your-callback.com/batch-complete",
    "onEachComplete": false,    // Call back after each task completes?
    "onAllComplete": true       // Call back once all tasks have completed
  }
}

# Response:
{
  "batchId": "batch-uuid",
  "taskIds": ["task-1", "task-2", ...],
  "totalTasks": 10,
  "links": {
    "status": "/api/tasks/batch/batch-uuid"
  }
}
```

### 11.3 Resource Warm-up

Warm-up performed at startup:
- Pre-create browser instances
- Preload commonly used fonts
- Warm up DNS resolution
- Pre-establish the HTTP connection pool

---

## 🐳 12. Deployment Architecture

### 12.1 Single-Instance Deployment (Development/Testing)

```
┌──────────────────────┐
│   Docker Container   │
│  ┌────────────────┐  │
│  │ Web API        │  │
│  │ Task Queue     │  │
│  │ Workers (5)    │  │
│  │ Browser Pool   │  │
│  └────────────────┘  │
└──────────────────────┘
```

### 12.2 Cluster Deployment (Production)

```
              ┌───────────────┐
              │ Load Balancer │
              │  (Nginx/K8s)  │
              └───────┬───────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
  ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
  │Instance 1 │ │Instance 2 │ │Instance N │
  │  API +    │ │  API +    │ │  API +    │
  │  Workers  │ │  Workers  │ │  Workers  │
  └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
              ┌───────▼───────┐
              │     Redis     │
              │ (task queue)  │
              └───────┬───────┘
                      │
              ┌───────▼───────┐
              │   Database    │
              │(task history) │
              └───────┬───────┘
                      │
              ┌───────▼───────┐
              │      OSS      │
              │(file storage) │
              └───────────────┘
```

### 12.3 Kubernetes Deployment

```yaml
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: htmltopdf-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: htmltopdf
  template:
    metadata:
      labels:
        app: htmltopdf
    spec:
      containers:
        - name: htmltopdf
          image: htmltopdf-service:2.0
          resources:
            requests:
              memory: "1Gi"
              cpu: "1000m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: PdfService__BrowserPool__MaxInstances
              value: "10"
            # Matches PdfService:TaskQueue:Redis:ConnectionString in appsettings.json
            - name: PdfService__TaskQueue__Redis__ConnectionString
              valueFrom:
                secretKeyRef:
                  name: redis-secret
                  key: connection-string

---
# Service
apiVersion: v1
kind: Service
metadata:
  name: htmltopdf-service
spec:
  selector:
    app: htmltopdf
  ports:
    - port: 80
      targetPort: 5000
  type: LoadBalancer

---
# HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: htmltopdf-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: htmltopdf-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

---

## 🎯 13. Complete Endpoint List

### 13.1 Task Management Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/tasks/pdf` | Submit a PDF conversion task |
| POST | `/api/tasks/image` | Submit an image conversion task |
| POST | `/api/tasks/batch` | Submit tasks in bulk |
| GET | `/api/tasks/{taskId}` | Get task details |
| GET | `/api/tasks/{taskId}/status` | Get task status (lightweight) |
| GET | `/api/tasks` | List tasks (paged) |
| GET | `/api/tasks/{taskId}/download` | Download the result file |
| POST | `/api/tasks/{taskId}/retry` | Retry a failed task |
| DELETE | `/api/tasks/{taskId}` | Cancel/delete a task |

### 13.2 System Management Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check |
| GET | `/health/ready` | Readiness probe (K8s) |
| GET | `/health/live` | Liveness probe (K8s) |
| GET | `/metrics` | Prometheus metrics |
| GET | `/api/system/stats` | System statistics |
| GET | `/api/system/config` | System configuration |

### 13.3 Admin Endpoints (Optional)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/admin/dashboard` | Dashboard data |
| GET | `/admin/tasks` | Task management list |
| POST | `/admin/tasks/{taskId}/reprocess` | Reprocess a task |
| POST | `/admin/system/clear-cache` | Clear the cache |
| POST | `/admin/system/restart-workers` | Restart workers |
| GET | `/admin/logs` | Query logs |

---

## 📝 十四、配置文件设计
|
||
|
||
### 14.1 完整配置(appsettings.json)
|
||
|
||
```json
|
||
{
|
||
"Logging": {
|
||
"LogLevel": {
|
||
"Default": "Information",
|
||
"Microsoft.AspNetCore": "Warning",
      "HtmlToPdfService": "Debug"
    }
  },

  "PdfService": {
    "BrowserPool": {
      "MaxInstances": 10,
      "MinInstances": 2,
      "MaxConcurrent": 5,
      "AcquireTimeout": 30000,
      "BrowserArgs": [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu"
      ]
    },

    "TaskQueue": {
      "Type": "Redis",  // Memory / Redis / Database
      "MaxQueueSize": 1000,
      "MaxConcurrentWorkers": 5,
      "MaxTasksPerUser": 10,
      "WorkerOptions": {
        "FetchInterval": 100,             // interval between queue fetches (ms)
        "ErrorRetryDelay": 5000,          // retry delay after a worker error
        "GracefulShutdownTimeout": 30000  // graceful shutdown timeout
      },
      "Redis": {
        "ConnectionString": "localhost:6379",
        "Database": 0,
        "KeyPrefix": "htmltopdf:"
      }
    },

    "Storage": {
      "Type": "OSS",  // Local / OSS / S3 / MinIO
      "SaveLocalCopy": true,
      "LocalPath": "/app/files",
      "RetentionHours": 168,  // 7 days
      "AutoCleanup": true,
      "CleanupInterval": 3600,

      "OSS": {
        "Endpoint": "oss-cn-hangzhou.aliyuncs.com",
        "AccessKeyId": "",
        "AccessKeySecret": "",
        "BucketName": "htmltopdf-files",
        "CdnDomain": "https://cdn.example.com",
        "UseHttps": true
      }
    },

    "Callback": {
      "Enabled": true,
      "DefaultUrl": "",
      "Timeout": 30000,
      "MaxRetries": 3,           // maximum retry attempts
      "RetryDelay": 1000,        // retry delay (ms)
      "BackoffMultiplier": 2.0,  // backoff multiplier
      "IncludePdfData": false,
      "CustomHeaders": {}
    },

    "Conversion": {
      "DefaultTimeout": 60000,
      "DefaultWaitUntil": "networkidle2",
      "MaxHtmlSize": 10485760,
      "EnableCache": false,  // whether to enable result caching
      "CacheDuration": 3600  // cache lifetime (seconds)
    },

    "Security": {
      "RequireAuthentication": true,
      "AuthenticationScheme": "ApiKey",  // ApiKey / JWT
      "BlockPrivateNetworks": true,
      "AllowedDomains": [],
      "BlockedDomains": []
    },

    "RateLimit": {
      "Enabled": true,
      "RequestsPerMinutePerIp": 60,
      "RequestsPerHourPerIp": 1000,
      "RequestsPerMinutePerUser": 100,
      "RequestsPerDayPerUser": 10000
    },

    "Monitoring": {
      "EnablePrometheus": true,
      "EnableHealthChecks": true,
      "EnableDetailedMetrics": true
    }
  }
}
```
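
The `Callback` settings above (`MaxRetries`, `RetryDelay`, `BackoffMultiplier`) suggest exponential backoff between callback attempts. The configuration does not spell out the formula, so the following is only a sketch under the assumption that the wait before retry *n* is `RetryDelay × BackoffMultiplier^(n-1)`. `CallbackBackoff` is a hypothetical helper name, not part of the service. With the sample values (3, 1000, 2.0) it yields waits of 1 s, 2 s and 4 s.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: illustrates one plausible reading of the Callback retry settings.
public static class CallbackBackoff
{
    // Returns the wait before each retry: retryDelayMs * multiplier^(attempt - 1).
    public static IReadOnlyList<TimeSpan> Schedule(int maxRetries, int retryDelayMs, double multiplier)
    {
        var delays = new List<TimeSpan>(maxRetries);
        for (var attempt = 1; attempt <= maxRetries; attempt++)
        {
            var ms = retryDelayMs * Math.Pow(multiplier, attempt - 1);
            delays.Add(TimeSpan.FromMilliseconds(ms));
        }
        return delays;
    }
}
```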

---

## 🎯 15. Development Plan

### 15.1 Phase 1: Asynchronous Tasks (2-3 weeks)

| Module | Effort | Priority |
|--------|--------|----------|
| Task queue (in-memory) | 2 days | P0 |
| Background worker service | 2 days | P0 |
| Task status query endpoint | 1 day | P0 |
| Task download endpoint | 0.5 day | P0 |
| Task cancellation | 0.5 day | P1 |
| Task list query | 1 day | P1 |
| Batch task submission | 1 day | P1 |
| Redis queue implementation | 2 days | P1 |
| Unit tests | 2 days | P1 |

### 15.2 Phase 2: Security and Reliability (1-2 weeks)

| Module | Effort | Priority |
|--------|--------|----------|
| API key authentication | 1 day | P0 |
| Rate limiting | 1 day | P0 |
| Callback retry mechanism | 1 day | P0 |
| SSRF protection | 1 day | P1 |
| Content safety validation | 1 day | P1 |
| Task persistence (database) | 2 days | P1 |
| Graceful shutdown | 0.5 day | P1 |

### 15.3 Phase 3: Monitoring and Operations (1-2 weeks)

| Module | Effort | Priority |
|--------|--------|----------|
| Prometheus metrics | 1 day | P0 |
| Enhanced health checks | 0.5 day | P0 |
| Structured logging | 0.5 day | P0 |
| Alert rule configuration | 1 day | P1 |
| Performance tracing (Jaeger) | 1 day | P1 |
| Admin API | 2 days | P2 |

### 15.4 Phase 4: Feature Enhancements (1-2 weeks)

| Module | Effort | Priority |
|--------|--------|----------|
| OSS storage integration | 2 days | P1 |
| Result caching | 1 day | P1 |
| Priority queue | 1 day | P2 |
| Header/footer templates | 1 day | P2 |
| Watermarking | 1 day | P2 |
| Multi-tenant isolation | 2 days | P2 |

### 15.5 Phase 5: Admin UI (2-3 weeks, optional)

| Module | Effort | Priority |
|--------|--------|----------|
| Front-end framework setup | 2 days | P3 |
| Dashboard | 2 days | P3 |
| Task management page | 2 days | P3 |
| System configuration page | 1 day | P3 |
| User management | 2 days | P3 |
| API key management | 1 day | P3 |

---
## 📊 16. Performance Targets

### 16.1 Throughput

| Scenario | Target | Test conditions |
|----------|--------|-----------------|
| Simple HTML → PDF | 100+ QPS | A4, 1 page, no images |
| Complex HTML → PDF | 50+ QPS | A4, 10 pages, with images |
| URL → PDF | 30+ QPS | External URL, wait for load |
| HTML → image | 150+ QPS | 1920x1080, PNG |
| URL → image | 50+ QPS | External URL, full-page screenshot |

### 16.2 Latency

| Metric | Target | Notes |
|--------|--------|-------|
| Task submission response | < 200ms | Returns the task ID immediately |
| Simple task processing | < 3s | Queueing + conversion |
| Complex task processing | < 10s | Queueing + conversion |
| Task query response | < 50ms | Read from cache/database |
| File download, time to first byte | < 100ms | Local storage or CDN |

### 16.3 Resource Usage

| Resource | Single instance | Cluster (3 instances) |
|----------|-----------------|-----------------------|
| Memory | 2-4 GB | 6-12 GB |
| CPU | 2-4 cores | 6-12 cores |
| Disk (temporary) | 10-50 GB | 30-150 GB |
| Object storage | Unlimited | Unlimited |
| Network bandwidth | 100 Mbps | 300 Mbps |

### 16.4 Reliability

| Metric | Target |
|--------|--------|
| Service availability | 99.9% |
| Task success rate | > 99% |
| Data durability | 99.99% |
| Recovery time after a failure | < 5 minutes |

---
## 🔄 17. Task Lifecycle Management

### 17.1 Automatic Task Cleanup

```csharp
public class TaskCleanupPolicy
{
    // Retention period for completed tasks
    public int CompletedTaskRetentionDays { get; set; } = 7;

    // Retention period for failed tasks
    public int FailedTaskRetentionDays { get; set; } = 30;

    // Retention period for cancelled tasks
    public int CancelledTaskRetentionDays { get; set; } = 3;

    // Cleanup schedule (cron expression: every day at 02:00)
    public string CleanupSchedule { get; set; } = "0 2 * * *";
}
```
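
A cleanup job driven by this policy has to decide, per task, whether the retention window has passed. The following is a minimal sketch of that check, assuming retention is counted from the moment the task reached its final state (the policy does not say which timestamp is used). `TaskState` and `TaskCleanup` are hypothetical names, and the defaults mirror the values above.

```csharp
using System;

// Hypothetical final states; the real service may model these differently.
public enum TaskState { Completed, Failed, Cancelled }

public static class TaskCleanup
{
    // Returns true when a finished task is older than its retention window.
    // Assumption: the window starts at finishedAtUtc.
    public static bool IsExpired(
        TaskState state, DateTime finishedAtUtc, DateTime nowUtc,
        int completedDays = 7, int failedDays = 30, int cancelledDays = 3)
    {
        var retentionDays = state switch
        {
            TaskState.Completed => completedDays,
            TaskState.Failed => failedDays,
            TaskState.Cancelled => cancelledDays,
            _ => throw new ArgumentOutOfRangeException(nameof(state))
        };
        return nowUtc - finishedAtUtc > TimeSpan.FromDays(retentionDays);
    }
}
```

Failed tasks are kept longest (30 days), which leaves more time for diagnosis before the record disappears.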

### 17.2 File Expiration Policy

```csharp
public class FileExpirationPolicy
{
    // Default file expiration
    public int DefaultExpirationHours { get; set; } = 24;

    // Bounds for the expiration a client may request when submitting a task
    public int MinExpirationHours { get; set; } = 1;
    public int MaxExpirationHours { get; set; } = 168; // 7 days

    // What happens once a file expires
    public bool AutoDelete { get; set; } = true;
    public bool MoveToArchive { get; set; } = false;
}
```
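
Because clients can request their own expiration within the minimum and maximum, the service needs a rule for out-of-range requests. The sketch below assumes such requests are clamped into range rather than rejected, which the policy itself does not specify. `FileExpiration.ResolveExpiry` is a hypothetical helper, and a missing request falls back to the default.

```csharp
using System;

public static class FileExpiration
{
    // Resolves the expiry timestamp for a file.
    // Assumption: out-of-range requests are clamped to [minHours, maxHours].
    public static DateTime ResolveExpiry(
        DateTime createdAtUtc, int? requestedHours,
        int defaultHours = 24, int minHours = 1, int maxHours = 168)
    {
        var hours = Math.Clamp(requestedHours ?? defaultHours, minHours, maxHours);
        return createdAtUtc.AddHours(hours);
    }
}
```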
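
---

## 🛡️ 18. Fault Tolerance and Recovery

### 18.1 Fault Types and Handling

| Fault | Detection | Recovery |
|-------|-----------|----------|
| Browser crash | Disconnect detection | Restart the instance automatically |
| Worker failure | Heartbeat | Restart the worker automatically |
| Network timeout | Timeout detection | Retry automatically (3 times) |
| Out of memory | Health check | Restart the service, raise an alert |
| Disk full | Free-space check | Stop accepting new tasks, clean up |
| Queue backlog | Queue-length monitoring | Scale out dynamically, raise an alert |

### 18.2 Data Consistency

**Task state consistency:**
- Use optimistic locking (version numbers) to prevent conflicting concurrent updates
- Have the state machine strictly validate every state transition
- Periodically scan for zombie tasks (stuck in Processing for a long time)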
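
The first two points can be sketched together: a transition is applied only when the caller's version matches and the state machine allows the move. This is an in-memory stand-in, not the service's implementation; a database-backed version would typically express the same check as a conditional update on the version column. `ConversionStatus`, `TaskRecord`, and the transition table are assumptions. The Processing → Pending edge is included because the graceful-shutdown flow returns unfinished tasks to Pending.

```csharp
using System;
using System.Collections.Generic;

public enum ConversionStatus { Pending, Processing, Completed, Failed, Cancelled }

public sealed class TaskRecord
{
    // Assumed transition table; states without an entry are terminal.
    private static readonly Dictionary<ConversionStatus, ConversionStatus[]> Allowed = new()
    {
        [ConversionStatus.Pending] = new[] { ConversionStatus.Processing, ConversionStatus.Cancelled },
        [ConversionStatus.Processing] = new[]
        {
            ConversionStatus.Completed, ConversionStatus.Failed,
            ConversionStatus.Cancelled, ConversionStatus.Pending
        }
    };

    private readonly object _gate = new();

    public ConversionStatus Status { get; private set; } = ConversionStatus.Pending;
    public int Version { get; private set; }

    // Applies the transition only if nobody else updated the record first
    // (optimistic lock) and the state machine allows the move.
    public bool TryTransition(ConversionStatus next, int expectedVersion)
    {
        lock (_gate)
        {
            if (Version != expectedVersion) return false;                    // stale writer
            if (!Allowed.TryGetValue(Status, out var targets)) return false; // terminal state
            if (Array.IndexOf(targets, next) < 0) return false;              // illegal move
            Status = next;
            Version++;
            return true;
        }
    }
}
```

A caller that gets `false` should reload the record and decide again instead of overwriting it.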

**File consistency:**
- Atomic writes (write to a temporary file, then rename)
- Checksum verification (SHA256)
- Periodic consistency checks (task records vs. files actually present)
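
The write-then-rename pattern and the checksum can be combined in one small helper. This is a sketch under assumptions: `AtomicFileWriter` is a hypothetical name, and the rename is only atomic when the temporary file and the target sit on the same filesystem, which is why the temporary file is created next to the target.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class AtomicFileWriter
{
    // Writes content to a temp file beside the target, then renames it over the target.
    // Returns the SHA-256 of the content as lowercase hex, to be stored with the task record.
    public static string Write(string path, byte[] content)
    {
        var fullPath = Path.GetFullPath(path);
        var dir = Path.GetDirectoryName(fullPath)!;
        Directory.CreateDirectory(dir);

        var tmp = Path.Combine(dir, Path.GetFileName(fullPath) + "." + Guid.NewGuid().ToString("N") + ".tmp");
        File.WriteAllBytes(tmp, content);
        File.Move(tmp, fullPath, overwrite: true);

        return Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();
    }

    // Recomputes the checksum of the stored file and compares it with the recorded one.
    public static bool Verify(string path, string expectedSha256Hex)
    {
        var actual = Convert.ToHexString(SHA256.HashData(File.ReadAllBytes(path)));
        return string.Equals(actual, expectedSha256Hex, StringComparison.OrdinalIgnoreCase);
    }
}
```

Readers never see a half-written PDF: the target path holds either the previous file or the complete new one.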

### 18.3 Graceful Shutdown

```
Application receives the stop signal (SIGTERM)
    ↓
1. Stop accepting new tasks (return 503)
    ↓
2. Wait for queued tasks to finish
    ↓
3. Still unfinished after 30 seconds? → mark those tasks as Pending so they are recovered on the next start
    ↓
4. Clean up browser instances
    ↓
5. Close database connections
    ↓
6. Exit
```
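
Steps 2 and 3 amount to "wait for in-flight work, but only up to a deadline". The sketch below shows one way to express that with `Task.WhenAny`; `GracefulShutdown.DrainAsync` is a hypothetical helper and the task ids are illustrative. It returns the ids that did not finish in time, which the caller would then mark as Pending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class GracefulShutdown
{
    // Waits up to `timeout` for all in-flight tasks and returns the ids still running.
    public static async Task<IReadOnlyList<string>> DrainAsync(
        IReadOnlyDictionary<string, Task> inFlight, TimeSpan timeout)
    {
        var allDone = Task.WhenAll(inFlight.Values);

        // Whichever finishes first wins: all work done, or the deadline.
        await Task.WhenAny(allDone, Task.Delay(timeout));

        return inFlight
            .Where(kv => !kv.Value.IsCompleted)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```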

---

## 📚 19. Dependencies

### 19.1 Required Components

| Component | Purpose | Version |
|-----------|---------|---------|
| .NET | Runtime | 8.0+ |
| PuppeteerSharp | Browser control | 20.2.5+ |
| Chromium | Rendering engine | Downloaded automatically |

### 19.2 Optional Components

| Component | Purpose | Recommended version |
|-----------|---------|---------------------|
| Redis | Task queue | 7.0+ |
| PostgreSQL | Task persistence | 15+ |
| MinIO | Object storage | latest |
| Prometheus | Monitoring | 2.40+ |
| Grafana | Visualization | 9.0+ |
| Jaeger | Distributed tracing | 1.40+ |

---
## 🧪 20. Testing Strategy

### 20.1 Unit Tests

- Task queue operations
- State machine transitions
- Concurrency safety
- Configuration validation
- Business logic
- **Target coverage: > 80%**

### 20.2 Integration Tests

- End-to-end API flows
- Asynchronous task processing
- Callback mechanism
- File storage
- Authentication and authorization
- **Target coverage: 100% of core flows**

### 20.3 Performance Tests

**Tools:** JMeter / Gatling / k6

**Scenarios:**
```
Scenario 1: Baseline
- Concurrency: 50 users
- Duration: 10 minutes
- Task type: simple HTML → PDF
- Target: sustained 100+ QPS

Scenario 2: Stress
- Concurrency: 200 users
- Duration: 30 minutes
- Mixed task types
- Target: no crashes, success rate > 99%

Scenario 3: Soak
- Concurrency: 100 users
- Duration: 24 hours
- Target: no memory leaks, stable performance

Scenario 4: Spike
- Concurrency: 500 users
- Duration: 5 minutes
- Target: no crashes, automatic degradation
```

### 20.4 Disaster Recovery Tests

- Recovery after a database connection loss
- Recovery after a Redis connection loss
- Recovery after a browser process failure
- Task recovery after a service restart
- Recovery after a network partition

---
## 💰 21. Cost Estimate

### 21.1 Cloud Server Cost (Alibaba Cloud pricing)

**Single-instance configuration:**
- ECS: 4 cores, 8 GB (compute-optimized c7)
- System disk: 40 GB SSD
- Data disk: 100 GB SSD
- Bandwidth: 10 Mbps

**Monthly cost:** about ¥500-800

**Cluster configuration (3 instances + Redis + RDS):**
- ECS x3: 4 cores, 8 GB
- Redis: 2 cores, 4 GB
- RDS MySQL: 2 cores, 4 GB
- OSS: pay per use

**Monthly cost:** about ¥2000-3000

### 21.2 Traffic Cost

Assumptions:
- Average PDF size: 100 KB
- 100,000 conversions per day
- Monthly traffic: 100,000 × 30 × 100 KB ≈ 300 GB

**OSS traffic fee:** about ¥150/month (mainland China)

### 21.3 Total Cost (Production)

| Item | Monthly cost |
|------|--------------|
| Cloud servers (3 instances) | ¥1500 |
| Redis (2 GB) | ¥200 |
| RDS MySQL (20 GB) | ¥300 |
| OSS storage (100 GB) | ¥15 |
| OSS traffic (300 GB) | ¥150 |
| Bandwidth (30 Mbps) | ¥300 |
| **Total** | **¥2465** |

---
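## 🎓 22. Technical Debt and Optimization

### 22.1 Known Limitations

1. **Chromium is resource-hungry**
   - 200-500 MB of memory per instance
   - Mitigation: cap the number of instances, restart them periodically

2. **No advanced PDF features**
   - No built-in encryption or signing
   - Mitigation: post-process the output or use a dedicated library

3. **Fonts**
   - Some special fonts may be missing
   - Mitigation: preinstall fonts in the Docker image

### 22.2 Future Optimizations

1. **Smart scheduling**
   - Allocate resources dynamically according to task complexity
   - Use separate queues for simple and complex tasks

2. **GPU acceleration**
   - Use the GPU to speed up rendering (where available)

3. **Edge computing**
   - Deploy the service on nodes close to users
   - Reduce network latency

4. **AI assistance**
   - Predict task processing time
   - Schedule the queue intelligently

---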
## 📖 23. Comparison Summary

### 23.1 MVP vs. Production Version

| Feature | MVP | Production version |
|---------|-----|--------------------|
| **API mode** | Synchronous (blocking) | Asynchronous (returns immediately) |
| **Task management** | ❌ None | ✅ Full task system |
| **Task query** | ❌ None | ✅ Detail and list queries |
| **Task cancellation** | ❌ None | ✅ Supported |
| **Persistence** | ❌ None | ✅ Redis/database |
| **Authentication** | ❌ None | ✅ API Key/JWT |
| **Rate limiting** | ❌ None | ✅ Multi-dimensional |
| **Monitoring and alerting** | ⚠️ Basic | ✅ Prometheus/Grafana |
| **Batch processing** | ❌ None | ✅ Supported |
| **Result caching** | ❌ None | ✅ Optional |
| **Cluster deployment** | ⚠️ Supported in theory | ✅ Fully supported |
| **Admin UI** | ❌ None | ✅ Optional |
| **Intended use** | Small scale, validation | Production, large scale |

### 23.2 DinkToPdf vs. PuppeteerSharp (Production Version)

| Aspect | DinkToPdf | PuppeteerSharp (production version) |
|--------|-----------|-------------------------------------|
| Concurrency model | Forced serial | True concurrency + async queue |
| Throughput | 20-30 QPS | 100+ QPS |
| Response mode | Synchronous, blocking | Asynchronous, non-blocking |
| Task management | None | Full |
| Scalability | Vertical | Horizontal + vertical |
| Rendering quality | Good (WebKit) | Excellent (Chromium) |
| SPA support | Limited | Full |
| Resource usage | 50-100 MB | 200-500 MB |
| Deployment complexity | Low | Medium |

---
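## 🎯 24. Migrating from the MVP to the Production Version

### 24.1 Backward Compatibility

**Keep both kinds of endpoint:**

```
Synchronous endpoints (MVP-compatible):
POST /api/pdf/convert/html → returns the PDF in the response (blocks until done)
POST /api/image/convert/html → returns the image in the response (blocks until done)

Asynchronous endpoints (recommended for the production version):
POST /api/tasks/pdf → returns a task ID
POST /api/tasks/image → returns a task ID
```

**Feature switches:**
```json
{
  "Features": {
    "EnableSyncApi": true,   // enable the synchronous endpoints
    "EnableAsyncApi": true,  // enable the asynchronous endpoints
    "DefaultMode": "async"   // recommended default mode
  }
}
```

### 24.2 Incremental Migration

```
Stage 1: Both endpoint families coexist (1-2 weeks)
- New features use the asynchronous endpoints
- Existing clients keep using the synchronous endpoints

Stage 2: Guided migration (1 month)
- Synchronous endpoints return a Warning header
- Documentation is updated to recommend the asynchronous endpoints

Stage 3: Gradual deprecation (2-3 months)
- Synchronous endpoints are marked Deprecated
- A deprecation schedule is published

Stage 4: Full removal (optional)
- Only the asynchronous endpoints remain
```

---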
## 📅 25. Implementation Schedule

### Overall Timeline (8-12 weeks)

```
Week 1-3:   Phase 1 - Asynchronous tasks
Week 4-5:   Phase 2 - Security and reliability
Week 6-7:   Phase 3 - Monitoring and operations
Week 8-9:   Phase 4 - Feature enhancements
Week 10-12: Phase 5 - Admin UI (optional)
```

### Milestones

- **M1 (Week 3):** Asynchronous task system live, with basic submit/query/download
- **M2 (Week 5):** Security mechanisms in place, including authentication and rate limiting
- **M3 (Week 7):** Monitoring established, Prometheus + Grafana live
- **M4 (Week 9):** OSS integration complete, ready for large-scale production use
- **M5 (Week 12):** Admin UI live, feature-complete

---
## ✅ 26. Acceptance Criteria

### 26.1 Functional Acceptance

- [x] Asynchronous submission returns a task ID (< 200ms)
- [x] Real-time task status query
- [x] Task result download
- [x] Task cancellation
- [x] Batch task processing
- [x] Callback retry mechanism
- [x] Authentication and authorization
- [x] Rate limiting
- [x] Queue persistence (Redis)
- [x] Object storage for files (OSS/S3)

### 26.2 Performance Acceptance

- [x] Single-instance QPS > 50 (mixed load)
- [x] Cluster (3 instances) QPS > 150
- [x] Task submission response < 200ms
- [x] Simple task processing < 5s
- [x] Task query response < 50ms
- [x] Success rate > 99%
- [x] 24-hour soak test passed

### 26.3 Reliability Acceptance

- [x] Service availability > 99.9%
- [x] No data loss with persistence enabled
- [x] Automatic recovery from failures
- [x] Graceful shutdown loses no tasks
- [x] Zero-downtime rolling updates in the cluster

---
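## 🎉 27. Summary

### Core Value of the Production Version

1. **Better user experience**
   - No waiting: requests return immediately
   - Long-running tasks are supported
   - Progress can be queried

2. **System reliability**
   - Tasks are persisted and not lost
   - Automatic recovery from failures
   - Cluster deployment is supported

3. **Observability**
   - Complete monitoring metrics
   - Detailed task logs
   - Real-time alerts

4. **Scalability**
   - Horizontal scaling (more machines)
   - Vertical scaling (more resources)
   - Modular design

5. **Production readiness**
   - Authentication and authorization
   - Rate limiting
   - Security protections
   - Complete documentation

---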
## 📖 28. API Usage Examples

### 28.1 Complete Workflow Example

#### Scenario: Converting a React application page to PDF

**Step 1: Submit the task**
```bash
curl -X POST http://api.example.com/api/tasks/pdf \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "source": {
      "type": "url",
      "content": "https://your-react-app.com/dashboard"
    },
    "options": {
      "format": "A4",
      "landscape": false,
      "printBackground": true
    },
    "waitUntil": "networkidle2",
    "timeout": 60000,
    "callback": {
      "url": "https://your-app.com/webhook/pdf-complete",
      "headers": {
        "X-API-Key": "your-webhook-key"
      },
      "includeFileData": false
    },
    "saveLocal": true,
    "metadata": {
      "userId": "user123",
      "reportType": "dashboard"
    }
  }'
```

**Response:**
```json
{
  "taskId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Task created and queued for processing",
  "createdAt": "2024-12-10T10:30:00Z",
  "estimatedWaitTime": 3,
  "queuePosition": 5,
  "links": {
    "self": "/api/tasks/550e8400-e29b-41d4-a716-446655440000",
    "status": "/api/tasks/550e8400-e29b-41d4-a716-446655440000/status",
    "download": "/api/tasks/550e8400-e29b-41d4-a716-446655440000/download"
  }
}
```

**Step 2: Poll the status (optional)**
```bash
# Option 1: full task query
curl -X GET http://api.example.com/api/tasks/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer your-api-key"

# Option 2: lightweight status query
curl -X GET http://api.example.com/api/tasks/550e8400-e29b-41d4-a716-446655440000/status \
  -H "Authorization: Bearer your-api-key"
```

**Step 3: Download the result (once the task has completed)**
```bash
curl -X GET http://api.example.com/api/tasks/550e8400-e29b-41d4-a716-446655440000/download \
  -H "Authorization: Bearer your-api-key" \
  --output dashboard.pdf
```

**Step 4: Receive the callback (automatic)**
```http
POST https://your-app.com/webhook/pdf-complete
Content-Type: application/json
X-API-Key: your-webhook-key

{
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "success",
  "timestamp": "2024-12-10T10:30:10Z",
  "duration": 5000,
  "result": {
    "fileSize": 102400,
    "downloadUrl": "https://cdn.example.com/files/pdf/2024/12/10/550e8400-e29b-41d4-a716-446655440000.pdf",
    "expiresAt": "2024-12-11T10:30:10Z"
  },
  "source": {
    "type": "url",
    "content": "https://your-react-app.com/dashboard"
  }
}
```
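
On the receiving side, the webhook handler only needs a few fields from this payload. The following is a minimal sketch using `System.Text.Json`; `CallbackResult` and `CallbackParser` are hypothetical names, and the field names (`requestId`, `status`, `result.downloadUrl`, `result.expiresAt`) are taken from the sample callback above. It assumes `"success"` is the only status that indicates a usable file.

```csharp
using System;
using System.Text.Json;

// Hypothetical DTO holding the parts of the callback a receiver typically needs.
public sealed record CallbackResult(
    string TaskId, bool Succeeded, string? DownloadUrl, DateTimeOffset? ExpiresAt);

public static class CallbackParser
{
    public static CallbackResult Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        var taskId = root.GetProperty("requestId").GetString() ?? "";
        var succeeded = root.GetProperty("status").GetString() == "success";

        // "result" may be absent on failure, so read it defensively.
        string? url = null;
        DateTimeOffset? expiresAt = null;
        if (root.TryGetProperty("result", out var result))
        {
            if (result.TryGetProperty("downloadUrl", out var u)) url = u.GetString();
            if (result.TryGetProperty("expiresAt", out var e)) expiresAt = e.GetDateTimeOffset();
        }

        return new CallbackResult(taskId, succeeded, url, expiresAt);
    }
}
```

A real handler would first check the `X-API-Key` header it configured at submission time, then parse the body.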

### 28.2 Batch Task Example

```bash
curl -X POST http://api.example.com/api/tasks/batch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "tasks": [
      {
        "type": "pdf",
        "source": {
          "type": "url",
          "content": "https://example.com/page1"
        },
        "options": { "format": "A4" }
      },
      {
        "type": "pdf",
        "source": {
          "type": "url",
          "content": "https://example.com/page2"
        },
        "options": { "format": "A4" }
      },
      {
        "type": "image",
        "source": {
          "type": "url",
          "content": "https://example.com/page3"
        },
        "options": {
          "format": "png",
          "width": 1920,
          "height": 1080
        }
      }
    ],
    "callback": {
      "url": "https://your-app.com/webhook/batch-complete",
      "onEachComplete": false,
      "onAllComplete": true
    }
  }'
```

**Response:**
```json
{
  "batchId": "batch-550e8400-e29b-41d4-a716-446655440000",
  "taskIds": [
    "task-001",
    "task-002",
    "task-003"
  ],
  "totalTasks": 3,
  "status": "pending",
  "links": {
    "status": "/api/tasks/batch/batch-550e8400-e29b-41d4-a716-446655440000"
  }
}
```

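### 28.3 Client SDK Example (Pseudocode)

```csharp
// C# client example. PdfOptions and TaskResponse are DTOs assumed to mirror the API's JSON.
using System.Net.Http.Headers;
using System.Net.Http.Json;

public class HtmlToPdfClient
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;

    // The HttpClient is expected to have BaseAddress set to the service URL.
    public HtmlToPdfClient(HttpClient httpClient, string apiKey)
    {
        _httpClient = httpClient;
        _apiKey = apiKey;
        _httpClient.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", _apiKey);
    }

    public async Task<string> ConvertUrlToPdfAsync(
        string url, PdfOptions options, CancellationToken ct = default)
    {
        // 1. Submit the task
        var request = new
        {
            source = new { type = "url", content = url },
            options
        };

        var response = await _httpClient.PostAsJsonAsync("/api/tasks/pdf", request, ct);
        response.EnsureSuccessStatusCode();
        var result = await response.Content.ReadFromJsonAsync<TaskResponse>(cancellationToken: ct)
            ?? throw new InvalidOperationException("Empty response from the service");

        // 2. Poll until the task leaves the pending/processing states
        while (result.Status is "pending" or "processing")
        {
            await Task.Delay(1000, ct); // wait 1 second
            result = await GetTaskStatusAsync(result.TaskId, ct);
        }

        // 3. Check the outcome
        if (result.Status == "completed")
        {
            return result.Result.DownloadUrl;
        }

        throw new Exception($"Conversion failed: {result.Error?.Message}");
    }

    public async Task<TaskResponse> GetTaskStatusAsync(string taskId, CancellationToken ct = default)
    {
        return await _httpClient.GetFromJsonAsync<TaskResponse>($"/api/tasks/{taskId}", ct)
            ?? throw new InvalidOperationException("Empty response from the service");
    }

    public async Task<byte[]> DownloadFileAsync(string taskId, CancellationToken ct = default)
    {
        var response = await _httpClient.GetAsync($"/api/tasks/{taskId}/download", ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsByteArrayAsync(ct);
    }
}
```

---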
## 🚀 29. Deployment Guide

### 29.1 Single-Instance Docker Deployment

**Step 1: Prepare the environment**
```bash
# Make sure Docker is installed
docker --version

# Make sure Docker Compose is installed
docker-compose --version
```

**Step 2: Configure environment variables**
```bash
# Create the .env file
cat > .env << EOF
ASPNETCORE_ENVIRONMENT=Production
PdfService__BrowserPool__MaxInstances=10
PdfService__BrowserPool__MaxConcurrent=5
PdfService__TaskQueue__Type=Redis
PdfService__TaskQueue__Redis__ConnectionString=redis:6379
PdfService__Storage__Type=Local
PdfService__Storage__LocalPath=/app/files
PdfService__Callback__Enabled=true
PdfService__Security__RequireAuthentication=true
EOF
```

**Step 3: Start the service**
```bash
# Start with docker-compose
docker-compose up -d

# Follow the logs
docker-compose logs -f htmltopdf-service

# Check service status
docker-compose ps
```

**Step 4: Verify the deployment**
```bash
# Health check
curl http://localhost:5000/health

# Submit a test task
curl -X POST http://localhost:5000/api/tasks/pdf \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"source":{"type":"html","content":"<h1>Test</h1>"}}'
```

### 29.2 Kubernetes Deployment

**Step 1: Create the namespace**
```bash
kubectl create namespace htmltopdf
```

**Step 2: Create the ConfigMap**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: htmltopdf-config
  namespace: htmltopdf
data:
  appsettings.json: |
    {
      "PdfService": {
        "BrowserPool": {
          "MaxInstances": "10",
          "MaxConcurrent": "5"
        }
      }
    }
```

**Step 3: Create the Secret (API key, Redis password, etc.)**
```bash
kubectl create secret generic htmltopdf-secrets \
  --from-literal=api-key=your-api-key \
  --from-literal=redis-password=your-redis-password \
  -n htmltopdf
```

**Step 4: Deploy the application**
```bash
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml
```

**Step 5: Verify the deployment**
```bash
# Check Pod status
kubectl get pods -n htmltopdf

# Check the Service
kubectl get svc -n htmltopdf

# Follow the logs
kubectl logs -f deployment/htmltopdf-service -n htmltopdf
```

### 29.3 Redis Master-Replica Configuration

**docker-compose.yml (Redis master with one replica)**
```yaml
version: '3.8'

services:
  redis-master:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --requirepass yourpassword
    volumes:
      - redis-data:/data

  redis-replica:
    image: redis:7-alpine
    # --masterauth is required because the master is protected by --requirepass
    command: redis-server --replicaof redis-master 6379 --masterauth yourpassword --requirepass yourpassword
    depends_on:
      - redis-master

  htmltopdf-service:
    build: .
    depends_on:
      - redis-master
    environment:
      - PdfService__TaskQueue__Redis__ConnectionString=redis-master:6379,password=yourpassword

volumes:
  redis-data:
```

---
## 🔧 30. Troubleshooting Guide

### 30.1 Common Problems and Solutions

#### Problem 1: Tasks stay in Pending

**Symptoms:**
- Tasks are submitted successfully but not processed for a long time
- The queue holds a large number of Pending tasks

**Diagnosis:**
```bash
# 1. Check whether workers are running
curl http://localhost:5000/health
# See whether "taskQueue.processingTasks" is 0

# 2. Check the worker logs
docker-compose logs htmltopdf-service | grep Worker

# 3. Check the browser pool
curl http://localhost:5000/health
# See whether "browserPool.availableInstances" is 0

# 4. Check the Redis connection from the service container
#    (requires redis-cli to be installed in the service image)
docker-compose exec htmltopdf-service redis-cli -h redis ping
```

**Solutions:**
- Add workers: `PdfService__TaskQueue__MaxConcurrentWorkers=10`
- Add browser instances: `PdfService__BrowserPool__MaxInstances=20`
- Check that the Redis connection is healthy
- Restart the worker service

#### Problem 2: Tasks fail frequently

**Symptoms:**
- Task status is Failed
- Error message: `NavigationTimeout` or `BrowserDisconnected`

**Diagnosis:**
```bash
# 1. Inspect the failed task
curl http://localhost:5000/api/tasks/{taskId}

# 2. Check the browser processes
docker-compose exec htmltopdf-service ps aux | grep chrome

# 3. Check memory usage
docker stats htmltopdf-service

# 4. Check the error logs
docker-compose logs htmltopdf-service | grep -i error
```

**Solutions:**
- Increase the timeout: `timeout: 120000` (2 minutes)
- Add the browser launch argument `--disable-dev-shm-usage`
- Raise the container memory limit
- Restart browser instances periodically

#### Problem 3: Callbacks fail

**Symptoms:**
- The task completes but no callback is received
- Callback logs show connection timeouts

**Diagnosis:**
```bash
# 1. Check the callback configuration
curl http://localhost:5000/api/tasks/{taskId} | jq .callback

# 2. Test whether the callback URL is reachable
curl -X POST https://your-callback-url.com/webhook \
  -H "Content-Type: application/json" \
  -d '{"test": "data"}'

# 3. Check the callback retry logs
docker-compose logs htmltopdf-service | grep Callback
```

**Solutions:**
- Check that the callback URL is correct
- Check network connectivity (firewall, DNS)
- Increase the callback timeout
- Check that the callback receiver is running

#### Problem 4: High memory usage

**Symptoms:**
- Container memory usage > 80%
- The system responds slowly
- Browser instances crash

**Diagnosis:**
```bash
# 1. Check memory usage
docker stats htmltopdf-service

# 2. Check the number of browser instances
curl http://localhost:5000/health | jq .browserPool

# 3. Look for signs of a memory leak
docker-compose logs htmltopdf-service | grep -i memory
```

**Solutions:**
- Reduce the number of browser instances: `MaxInstances: 5`
- Enable periodic cleanup: `AutoCleanup: true`
- Raise the container memory limit
- Restart the service on a schedule (for example, nightly)

#### Problem 5: Severe queue backlog

**Symptoms:**
- Queue length > 100
- Average wait time > 30 seconds
- New submissions return 503

**Diagnosis:**
```bash
# 1. Check queue statistics
curl http://localhost:5000/health | jq .taskQueue

# 2. Check the number of workers
docker-compose logs htmltopdf-service | grep Worker

# 3. Check the task processing rate
curl http://localhost:5000/metrics | grep conversion_tasks_total
```

**Solutions:**
- Add workers: `MaxConcurrentWorkers: 10`
- Add browser instances: `MaxInstances: 20`
- Scale out: add service instances
- Optimize the task processing logic

### 30.2 Log Analysis

**Where to look:**
```bash
# Application logs
docker-compose logs htmltopdf-service

# Task processing logs
docker-compose logs htmltopdf-service | grep "Task"

# Browser pool logs
docker-compose logs htmltopdf-service | grep "BrowserPool"

# Error logs
docker-compose logs htmltopdf-service | grep -i error

# Callback logs
docker-compose logs htmltopdf-service | grep "Callback"
```

**Log level configuration:**
```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "HtmlToPdfService": "Debug"  // use Debug while troubleshooting
    }
  }
}
```

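### 30.3 Performance Tuning Checklist

- [ ] Is the number of browser instances reasonable (suggested: CPU cores × 2)?
- [ ] Are there enough workers (suggested: MaxConcurrent × 1.5)?
- [ ] Is the queue capacity sufficient (suggested: peak QPS × 60 seconds)?
- [ ] Is the memory limit reasonable (suggested: 2-4 GB per instance)?
- [ ] Are timeouts reasonable (30s for simple tasks, 120s for complex tasks)?
- [ ] Is result caching enabled (for URLs that are converted repeatedly)?
- [ ] Are expired files cleaned up regularly?
- [ ] Is connection pooling enabled (Redis, database)?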
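
The first three rules of thumb are simple arithmetic, so they can be captured in a small helper to sanity-check a configuration. This is a sketch: `Sizing.Recommend` and `PoolSizing` are hypothetical names, and rounding fractional results up is an assumption. For 4 cores, `MaxConcurrent = 5` and a peak of 50 QPS it suggests 8 browser instances, 8 workers and a queue capacity of 3000.

```csharp
using System;

// Hypothetical result type for the three sizing rules in the checklist.
public sealed record PoolSizing(int BrowserInstances, int Workers, int QueueCapacity);

public static class Sizing
{
    // browser instances = CPU cores x 2
    // workers           = MaxConcurrent x 1.5 (rounded up)
    // queue capacity    = peak QPS x 60 seconds (rounded up)
    public static PoolSizing Recommend(int cpuCores, int maxConcurrent, double peakQps)
        => new(
            BrowserInstances: cpuCores * 2,
            Workers: (int)Math.Ceiling(maxConcurrent * 1.5),
            QueueCapacity: (int)Math.Ceiling(peakQps * 60));
}
```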

---

## 📚 31. Best Practices

### 31.1 Submitting Tasks

**DO (recommended):**
```json
{
  "source": {
    "type": "url",
    "content": "https://example.com/page"
  },
  "waitUntil": "networkidle2",  // ✅ suits most SPAs
  "timeout": 60000,             // ✅ a reasonable timeout
  "callback": {
    "url": "https://your-app.com/webhook",
    "includeFileData": false    // ✅ no file data in the callback (saves bandwidth)
  },
  "saveLocal": true             // ✅ keep the file so it can be downloaded
}
```

**DON'T (not recommended):**
```json
{
  "source": {
    "type": "html",
    "content": "..."            // ❌ oversized HTML (>10MB)
  },
  "waitUntil": "load",          // ❌ not suitable for SPAs
  "timeout": 5000,              // ❌ timeout too short
  "callback": {
    "includeFileData": true     // ❌ large files sent as Base64
  }
}
```

### 31.2 Polling Strategy

**Recommended polling intervals:**
```
Pending: query every 1-2 seconds
Processing: query every 2-5 seconds
Completed/Failed: stop polling
```

**Sample code:**
```csharp
public async Task<TaskResult> WaitForCompletionAsync(
    string taskId,
    int maxWaitSeconds = 300)
{
    var startTime = DateTime.UtcNow;

    while (DateTime.UtcNow - startTime < TimeSpan.FromSeconds(maxWaitSeconds))
    {
        var task = await GetTaskAsync(taskId);

        if (task.Status == "completed" || task.Status == "failed")
        {
            return task;
        }

        // Poll less often once the task is being processed
        var interval = task.Status == "processing"
            ? TimeSpan.FromSeconds(5)
            : TimeSpan.FromSeconds(2);

        await Task.Delay(interval);
    }

    throw new TimeoutException("Timed out waiting for the task to finish");
}
```

### 31.3 Error Handling

**Client-side error handling:**
```csharp
// Requires: using System.Net;
// SubmitTaskAsync, RetryTaskAsync and ConversionException are assumed to be defined
// elsewhere; RetryTaskAsync is assumed to return the converted file.
string? taskId = null; // declared outside the try so the catch blocks can use it
try
{
    taskId = await SubmitTaskAsync(request);
    var result = await WaitForCompletionAsync(taskId);

    if (result.Status == "completed")
    {
        return await DownloadFileAsync(taskId);
    }

    // Failed: check whether the error is retryable
    if (result.Error.Retryable)
    {
        // Retry automatically
        return await RetryTaskAsync(taskId);
    }

    throw new ConversionException(result.Error.Message);
}
catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.ServiceUnavailable)
{
    // Service busy: back off, then resubmit once
    await Task.Delay(5000);
    taskId = await SubmitTaskAsync(request);
    var retried = await WaitForCompletionAsync(taskId);
    if (retried.Status != "completed")
    {
        throw new ConversionException(retried.Error.Message);
    }
    return await DownloadFileAsync(taskId);
}
catch (TimeoutException) when (taskId is not null)
{
    // Timed out while waiting: check the task once more before giving up
    var task = await GetTaskAsync(taskId);
    if (task.Status == "completed")
    {
        return await DownloadFileAsync(taskId);
    }
    throw;
}
```

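### 31.4 Security

1. **API key management**
   - ✅ Store keys in environment variables; never hard-code them
   - ✅ Rotate API keys regularly
   - ✅ Use different keys for different environments
   - ✅ Restrict API key permissions (read-only / read-write)

2. **Callback security**
   - ✅ Use HTTPS
   - ✅ Verify callback signatures
   - ✅ Set a callback timeout
   - ✅ Log callbacks

3. **Content security**
   - ✅ Validate URLs against an allowlist
   - ✅ Limit HTML size
   - ✅ Block private network addresses (SSRF protection)
   - ✅ Filter malicious scripts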
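
For "verify callback signatures", a common approach is an HMAC over the raw request body with a secret shared between the service and the receiver. The document does not define the signature scheme, so the sketch below is an assumption throughout: HMAC-SHA256, a hex-encoded signature, and the hypothetical helper name `CallbackSignature`. The comparison uses `CryptographicOperations.FixedTimeEquals` to avoid leaking information through timing.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CallbackSignature
{
    // Computes the hex-encoded HMAC-SHA256 of the body (sender side).
    public static string Sign(string body, string secret)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(body)));
    }

    // Recomputes the signature and compares it in constant time (receiver side).
    public static bool Verify(string body, string secret, string signatureHex)
    {
        byte[] provided;
        try
        {
            provided = Convert.FromHexString(signatureHex);
        }
        catch (FormatException)
        {
            return false; // not valid hex
        }

        var expected = Convert.FromHexString(Sign(body, secret));
        return CryptographicOperations.FixedTimeEquals(expected, provided);
    }
}
```

The receiver must verify against the exact bytes it received; re-serializing the JSON before hashing would change the signature.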

### 31.5 Performance Recommendations

1. **Task submission**
   - Submit multiple tasks in one batch (use `/api/tasks/batch`)
   - Avoid frequent polling (use callbacks)
   - Set reasonable timeouts

2. **Result retrieval**
   - Prefer callbacks
   - Serve file downloads through a CDN
   - Enable result caching (for identical content)

3. **System configuration**
   - Adjust the number of workers to the actual load
   - Size the browser pool sensibly
   - Enable automatic file cleanup

---
## 📋 32. Operations Checklist

### 32.1 Daily Checks

- [ ] Check service health: `GET /health`
- [ ] Check that the queue length is normal
- [ ] Check the number of failed tasks
- [ ] Check system resource usage (CPU, memory)
- [ ] Check disk space
- [ ] Review the error logs

### 32.2 Weekly Checks

- [ ] Review the task success rate trend
- [ ] Analyze average processing time
- [ ] Check the callback success rate
- [ ] Clean up expired files and tasks
- [ ] Review system alerts
- [ ] Review user feedback

### 32.3 Monthly Checks

- [ ] Performance metrics review
- [ ] Capacity planning assessment
- [ ] Security audit
- [ ] Dependency update check
- [ ] Backup verification
- [ ] Documentation update

---
## 🎓 33. Technical References

### 33.1 Related Documentation

- [PuppeteerSharp documentation](https://www.puppeteersharp.com/)
- [Chromium command-line switches](https://peter.sh/experiments/chromium-command-line-switches/)
- [Redis queue patterns](https://redis.io/docs/manual/patterns/queue/)
- [.NET BackgroundService](https://learn.microsoft.com/en-us/dotnet/core/extensions/queue-service)
- [Prometheus metric types](https://prometheus.io/docs/concepts/metric_types/)

### 33.2 Related Tools

- **API testing:** Postman, Insomnia, curl
- **Performance testing:** JMeter, Gatling, k6
- **Monitoring:** Prometheus, Grafana, Jaeger
- **Logging:** ELK Stack, Loki, Seq
- **Deployment:** Docker, Kubernetes, Helm

---
## 📝 34. Changelog Template

### Version 2.0.0 (Production Version)

**New features:**
- ✅ Asynchronous task processing
- ✅ Task status query and management
- ✅ Redis queue support
- ✅ Authentication and authorization (API Key/JWT)
- ✅ Rate limiting
- ✅ Prometheus monitoring
- ✅ Batch task processing
- ✅ Task retry mechanism

**Performance:**
- ✅ Task submission response time < 200ms
- ✅ Cluster deployment support
- ✅ Result caching

**Security:**
- ✅ SSRF protection
- ✅ Content safety validation
- ✅ Callback signature verification

---

**Document status**: ✅ Complete
**Document version**: v2.0
**Last updated**: 2024-12-10
**Next step**: Develop the production version according to this document
**Estimated time to launch**: 8-12 weeks

---

## 📞 Appendix: Contact

**Technical support:** [support email]
**Issue reports:** [GitHub Issues]
**Documentation updates:** [documentation repository URL]