Extract Data from Documents with GPT-4, PDF Vector, and PostgreSQL Export
Intermediate
This is an automation workflow for document extraction and multimodal AI, consisting of 9 nodes and built mainly from Code, OpenAi, Switch, Postgres, and PdfVector nodes. It watches a folder for new documents, parses them with PDF Vector, extracts structured data with GPT-4, stores the results in PostgreSQL, and exports them to CSV.
Prerequisites
- OpenAI API Key
- PostgreSQL database connection details (the workflow writes to two tables, invoices and documents; see the sketch below)
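The two Postgres nodes in this workflow insert into an invoices table (invoice_number, vendor, amount, date, raw_data) and a documents table (type, content, metadata, processed_at). The following Node.js sketch creates them; the column types, the id columns, and the DATABASE_URL connection string are assumptions, so adjust them to your own schema and credentials.

// Minimal sketch: create the two tables the Postgres nodes write to.
// Column names come from the workflow JSON below; the types, the id
// columns, and the connection string are assumptions.
const { Client } = require('pg');

async function createTables() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  await client.query(`
    CREATE TABLE IF NOT EXISTS invoices (
      id             SERIAL PRIMARY KEY,
      invoice_number TEXT,
      vendor         TEXT,
      amount         NUMERIC,
      date           TIMESTAMPTZ,
      raw_data       JSONB
    )`);

  await client.query(`
    CREATE TABLE IF NOT EXISTS documents (
      id           SERIAL PRIMARY KEY,
      type         TEXT,
      content      TEXT,
      metadata     JSONB,
      processed_at TIMESTAMPTZ
    )`);

  await client.end();
}

createTables().catch(console.error);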
Export Workflow
Copy the JSON configuration below and import it into n8n to use this workflow.
{
"meta": {
"instanceId": "placeholder"
},
"nodes": [
{
"id": "workflow-info",
"name": "流水线信息",
"type": "n8n-nodes-base.stickyNote",
"position": [
250,
150
],
"parameters": {
"content": "## 文档提取流水线"
},
"typeVersion": 1
},
{
"id": "file-trigger",
"name": "监控文件夹",
"type": "n8n-nodes-base.localFileTrigger",
"notes": "Triggers when new documents arrive",
"position": [
450,
300
],
"parameters": {
"path": "/documents/incoming",
"events": [
"file:created"
]
},
"typeVersion": 1
},
{
"id": "pdfvector-parse",
"name": "PDF 向量 - 解析文档",
"type": "n8n-nodes-pdfvector.pdfVector",
"notes": "Parse with LLM for better extraction",
"position": [
650,
300
],
"parameters": {
"useLlm": "always",
"resource": "document",
"operation": "parse",
"documentUrl": "={{ $json.filePath }}"
},
"typeVersion": 1
},
{
"id": "extract-data",
"name": "提取结构化数据",
"type": "n8n-nodes-base.openAi",
"position": [
850,
300
],
"parameters": {
"model": "gpt-4",
"options": {
"responseFormat": {
"type": "json_object"
}
},
"messages": {
"values": [
{
"content": "Extract the following information from this document:\n\n1. Document Type (invoice, contract, report, etc.)\n2. Date/Dates mentioned\n3. Parties involved (names, companies)\n4. Key amounts/values\n5. Important terms or conditions\n6. Reference numbers\n7. Addresses\n8. Contact information\n\nDocument content:\n{{ $json.content }}\n\nReturn as structured JSON."
}
]
}
},
"typeVersion": 1
},
{
"id": "validate-data",
"name": "验证与清理数据",
"type": "n8n-nodes-base.code",
"position": [
1050,
300
],
"parameters": {
"functionCode": "// Validate and clean extracted data\nconst extracted = JSON.parse($json.content);\nconst validated = {};\n\n// Validate document type\nvalidated.documentType = extracted.documentType || 'unknown';\n\n// Parse and validate dates\nif (extracted.date) {\n const date = new Date(extracted.date);\n validated.date = isNaN(date) ? null : date.toISOString();\n}\n\n// Clean monetary values\nif (extracted.amounts) {\n validated.amounts = extracted.amounts.map(amt => {\n const cleaned = amt.replace(/[^0-9.-]/g, '');\n return parseFloat(cleaned) || 0;\n });\n}\n\n// Validate email addresses\nif (extracted.emails) {\n validated.emails = extracted.emails.filter(email => \n /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(email)\n );\n}\n\nvalidated.raw = extracted;\nvalidated.fileName = $node['Watch Folder'].json.fileName;\nvalidated.processedAt = new Date().toISOString();\n\nreturn validated;"
},
"typeVersion": 1
},
{
"id": "route-by-type",
"name": "按文档类型路由",
"type": "n8n-nodes-base.switch",
"position": [
1250,
300
],
"parameters": {
"conditions": {
"string": [
{
"value1": "={{ $json.documentType }}",
"value2": "invoice",
"operation": "equals"
}
]
}
},
"typeVersion": 1
},
{
"id": "store-invoice",
"name": "存储发票数据",
"type": "n8n-nodes-base.postgres",
"position": [
1450,
250
],
"parameters": {
"table": "invoices",
"columns": "invoice_number,vendor,amount,date,raw_data",
"operation": "insert"
},
"typeVersion": 1
},
{
"id": "store-other",
"name": "存储其他文档",
"type": "n8n-nodes-base.postgres",
"position": [
1450,
350
],
"parameters": {
"table": "documents",
"columns": "type,content,metadata,processed_at",
"operation": "insert"
},
"typeVersion": 1
},
{
"id": "export-csv",
"name": "导出到 CSV",
"type": "n8n-nodes-base.writeBinaryFile",
"position": [
1650,
300
],
"parameters": {
"fileName": "extracted_data_{{ $now.format('yyyy-MM-dd') }}.csv",
"fileContent": "={{ $items().map(item => item.json).toCsv() }}"
},
"typeVersion": 1
}
],
"connections": {
"file-trigger": {
"main": [
[
{
"node": "pdfvector-parse",
"type": "main",
"index": 0
}
]
]
},
"store-invoice": {
"main": [
[
{
"node": "export-csv",
"type": "main",
"index": 0
}
]
]
},
"store-other": {
"main": [
[
{
"node": "export-csv",
"type": "main",
"index": 0
}
]
]
},
"validate-data": {
"main": [
[
{
"node": "route-by-type",
"type": "main",
"index": 0
}
]
]
},
"route-by-type": {
"main": [
[
{
"node": "store-invoice",
"type": "main",
"index": 0
}
],
[
{
"node": "store-other",
"type": "main",
"index": 0
}
]
]
},
"extract-data": {
"main": [
[
{
"node": "validate-data",
"type": "main",
"index": 0
}
]
]
},
"pdfvector-parse": {
"main": [
[
{
"node": "extract-data",
"type": "main",
"index": 0
}
]
]
}
}
}
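One caveat on the Export to CSV node: its fileContent expression calls a toCsv() helper on the item array, which may not exist in your n8n version, and the Write Binary File node normally expects binary data rather than a plain string. If that step fails after import, one option is to build the CSV in a Code node along the lines of the sketch below (Run Once for All Items mode; the quoting rules and field order are assumptions) and then convert the resulting string to a file before writing it.

// Sketch for an n8n Code node that builds a CSV string from all incoming
// items, as an alternative to the toCsv() expression in the Export to CSV
// node. Headers are taken from the first item's keys.
const rows = $input.all().map(item => item.json);
if (rows.length === 0) {
  return [{ json: { csv: '' } }];
}

const headers = Object.keys(rows[0]);

// Stringify nested objects and quote fields containing commas, quotes, or newlines.
const escapeField = (value) => {
  let text;
  if (value === null || value === undefined) text = '';
  else if (typeof value === 'object') text = JSON.stringify(value);
  else text = String(value);
  return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
};

const lines = [
  headers.join(','),
  ...rows.map(row => headers.map(h => escapeField(row[h])).join(',')),
];

return [{ json: { csv: lines.join('\n') } }];

The csv field can then be turned into binary data (for example with a Convert to File node) before it is handed to the file-writing step.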
FAQ
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then update the credential settings as needed.
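If you prefer to script the import rather than paste the JSON into the editor, the n8n public REST API can create workflows. The sketch below is an assumption-heavy example: it presumes the public API is enabled on your instance, that N8N_URL and N8N_API_KEY environment variables are set, and that the JSON above has been saved to document-extraction.workflow.json; check the API reference for your n8n version for the exact required fields.

// Sketch: create the workflow through the n8n public REST API (Node.js 18+).
// N8N_URL, N8N_API_KEY, and the file name are placeholders for your setup.
const fs = require('fs');

async function importWorkflow(path) {
  const workflow = JSON.parse(fs.readFileSync(path, 'utf8'));
  const response = await fetch(`${process.env.N8N_URL}/api/v1/workflows`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-N8N-API-KEY': process.env.N8N_API_KEY,
    },
    body: JSON.stringify({
      name: 'Extract Data from Documents with GPT-4, PDF Vector, and PostgreSQL Export',
      nodes: workflow.nodes,
      connections: workflow.connections,
      settings: {},
    }),
  });
  if (!response.ok) {
    throw new Error(`Import failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

importWorkflow('./document-extraction.workflow.json').then(console.log).catch(console.error);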
What scenarios is this workflow suited for?
Intermediate level - document extraction and multimodal AI scenarios, such as turning incoming invoices, contracts, and reports into structured records.
Do I have to pay for it?
The workflow itself is free to import and use. Note, however, that third-party services it relies on (such as the OpenAI API) may charge for usage.
Related Workflows
- Automate Academic Literature Reviews with GPT-4 and Multi-Database Search (13 nodes, by PDF Vector, Document Extraction)
- Extract Clinical Data from Medical Documents with PDF Vector and HIPAA Compliance (9 nodes, by PDF Vector, Document Extraction)
- Build an Academic Knowledge Graph from Research Papers with PDF Vector, GPT-4, and Neo4j (10 nodes, by PDF Vector, AI RAG)
- Academic Research Search Across Five Databases with PDF Vector and Multiple Exports (9 nodes, by PDF Vector, AI RAG)
- Automated Academic Paper Monitoring with PDF Vector, GPT-3.5, and Slack Alerts (10 nodes, by PDF Vector, Personal Productivity)
- Enterprise Contract Lifecycle Management with AI Risk Analysis (20 nodes, by PDF Vector, Document Extraction)
Workflow Information
Difficulty: Intermediate
Nodes: 9
Categories: 2
Node types: 8
Author: PDF Vector (@pdfvector) - Fully featured PDF APIs for developers: parse any PDF or Word document, extract structured data, and access millions of academic papers, all through simple APIs.
External link: View on n8n.io →