Extract Data from Documents with GPT-4, PDF Vector, and PostgreSQL Export
Intermediate
This is an automation workflow for document extraction and multimodal AI, consisting of 9 nodes and built mainly from Code, OpenAi, Switch, Postgres, and PdfVector nodes. It watches a folder for new documents, parses them with PDF Vector, extracts structured data with GPT-4, stores the results in PostgreSQL, and exports them to CSV.
Prerequisites
- OpenAI API Key
- PostgreSQL database connection details (the workflow writes to two tables, invoices and documents; see the sketch below)
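The two Postgres nodes in this workflow insert into an invoices table (invoice_number, vendor, amount, date, raw_data) and a documents table (type, content, metadata, processed_at). The following Node.js sketch creates them; the column types, the id columns, and the DATABASE_URL connection string are assumptions, so adjust them to your own schema and credentials.

// Minimal sketch: create the two tables the Postgres nodes write to.
// Column names come from the workflow JSON below; the types, the id
// columns, and the connection string are assumptions.
const { Client } = require('pg');

async function createTables() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  await client.query(`
    CREATE TABLE IF NOT EXISTS invoices (
      id             SERIAL PRIMARY KEY,
      invoice_number TEXT,
      vendor         TEXT,
      amount         NUMERIC,
      date           TIMESTAMPTZ,
      raw_data       JSONB
    )`);

  await client.query(`
    CREATE TABLE IF NOT EXISTS documents (
      id           SERIAL PRIMARY KEY,
      type         TEXT,
      content      TEXT,
      metadata     JSONB,
      processed_at TIMESTAMPTZ
    )`);

  await client.end();
}

createTables().catch(console.error);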
Export Workflow
Copy the JSON configuration below and import it into n8n to use this workflow.
{
"meta": {
"instanceId": "placeholder"
},
"nodes": [
{
"id": "workflow-info",
"name": "流水线信息",
"type": "n8n-nodes-base.stickyNote",
"position": [
250,
150
],
"parameters": {
"content": "## 文档提取流水线"
},
"typeVersion": 1
},
{
"id": "file-trigger",
"name": "监控文件夹",
"type": "n8n-nodes-base.localFileTrigger",
"notes": "Triggers when new documents arrive",
"position": [
450,
300
],
"parameters": {
"path": "/documents/incoming",
"events": [
"file:created"
]
},
"typeVersion": 1
},
{
"id": "pdfvector-parse",
"name": "PDF 向量 - 解析文档",
"type": "n8n-nodes-pdfvector.pdfVector",
"notes": "Parse with LLM for better extraction",
"position": [
650,
300
],
"parameters": {
"useLlm": "always",
"resource": "document",
"operation": "parse",
"documentUrl": "={{ $json.filePath }}"
},
"typeVersion": 1
},
{
"id": "extract-data",
"name": "提取结构化数据",
"type": "n8n-nodes-base.openAi",
"position": [
850,
300
],
"parameters": {
"model": "gpt-4",
"options": {
"responseFormat": {
"type": "json_object"
}
},
"messages": {
"values": [
{
"content": "Extract the following information from this document:\n\n1. Document Type (invoice, contract, report, etc.)\n2. Date/Dates mentioned\n3. Parties involved (names, companies)\n4. Key amounts/values\n5. Important terms or conditions\n6. Reference numbers\n7. Addresses\n8. Contact information\n\nDocument content:\n{{ $json.content }}\n\nReturn as structured JSON."
}
]
}
},
"typeVersion": 1
},
{
"id": "validate-data",
"name": "验证与清理数据",
"type": "n8n-nodes-base.code",
"position": [
1050,
300
],
"parameters": {
"functionCode": "// Validate and clean extracted data\nconst extracted = JSON.parse($json.content);\nconst validated = {};\n\n// Validate document type\nvalidated.documentType = extracted.documentType || 'unknown';\n\n// Parse and validate dates\nif (extracted.date) {\n const date = new Date(extracted.date);\n validated.date = isNaN(date) ? null : date.toISOString();\n}\n\n// Clean monetary values\nif (extracted.amounts) {\n validated.amounts = extracted.amounts.map(amt => {\n const cleaned = amt.replace(/[^0-9.-]/g, '');\n return parseFloat(cleaned) || 0;\n });\n}\n\n// Validate email addresses\nif (extracted.emails) {\n validated.emails = extracted.emails.filter(email => \n /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(email)\n );\n}\n\nvalidated.raw = extracted;\nvalidated.fileName = $node['Watch Folder'].json.fileName;\nvalidated.processedAt = new Date().toISOString();\n\nreturn validated;"
},
"typeVersion": 1
},
{
"id": "route-by-type",
"name": "按文档类型路由",
"type": "n8n-nodes-base.switch",
"position": [
1250,
300
],
"parameters": {
"conditions": {
"string": [
{
"value1": "={{ $json.documentType }}",
"value2": "invoice",
"operation": "equals"
}
]
}
},
"typeVersion": 1
},
{
"id": "store-invoice",
"name": "存储发票数据",
"type": "n8n-nodes-base.postgres",
"position": [
1450,
250
],
"parameters": {
"table": "invoices",
"columns": "invoice_number,vendor,amount,date,raw_data",
"operation": "insert"
},
"typeVersion": 1
},
{
"id": "store-other",
"name": "存储其他文档",
"type": "n8n-nodes-base.postgres",
"position": [
1450,
350
],
"parameters": {
"table": "documents",
"columns": "type,content,metadata,processed_at",
"operation": "insert"
},
"typeVersion": 1
},
{
"id": "export-csv",
"name": "导出到 CSV",
"type": "n8n-nodes-base.writeBinaryFile",
"position": [
1650,
300
],
"parameters": {
"fileName": "extracted_data_{{ $now.format('yyyy-MM-dd') }}.csv",
"fileContent": "={{ $items().map(item => item.json).toCsv() }}"
},
"typeVersion": 1
}
],
"connections": {
"file-trigger": {
"main": [
[
{
"node": "pdfvector-parse",
"type": "main",
"index": 0
}
]
]
},
"store-invoice": {
"main": [
[
{
"node": "export-csv",
"type": "main",
"index": 0
}
]
]
},
"store-other": {
"main": [
[
{
"node": "export-csv",
"type": "main",
"index": 0
}
]
]
},
"validate-data": {
"main": [
[
{
"node": "route-by-type",
"type": "main",
"index": 0
}
]
]
},
"route-by-type": {
"main": [
[
{
"node": "store-invoice",
"type": "main",
"index": 0
}
],
[
{
"node": "store-other",
"type": "main",
"index": 0
}
]
]
},
"extract-data": {
"main": [
[
{
"node": "validate-data",
"type": "main",
"index": 0
}
]
]
},
"pdfvector-parse": {
"main": [
[
{
"node": "extract-data",
"type": "main",
"index": 0
}
]
]
}
}
}
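One caveat on the Export to CSV node: its fileContent expression calls a toCsv() helper on the item array, which may not exist in your n8n version, and the Write Binary File node normally expects binary data rather than a plain string. If that step fails after import, one option is to build the CSV in a Code node along the lines of the sketch below (Run Once for All Items mode; the quoting rules and field order are assumptions) and then convert the resulting string to a file before writing it.

// Sketch for an n8n Code node that builds a CSV string from all incoming
// items, as an alternative to the toCsv() expression in the Export to CSV
// node. Headers are taken from the first item's keys.
const rows = $input.all().map(item => item.json);
if (rows.length === 0) {
  return [{ json: { csv: '' } }];
}

const headers = Object.keys(rows[0]);

// Stringify nested objects and quote fields containing commas, quotes, or newlines.
const escapeField = (value) => {
  let text;
  if (value === null || value === undefined) text = '';
  else if (typeof value === 'object') text = JSON.stringify(value);
  else text = String(value);
  return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
};

const lines = [
  headers.join(','),
  ...rows.map(row => headers.map(h => escapeField(row[h])).join(',')),
];

return [{ json: { csv: lines.join('\n') } }];

The csv field can then be turned into binary data (for example with a Convert to File node) before it is handed to the file-writing step.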
FAQ
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then update the credential settings as needed.
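If you prefer to script the import rather than paste the JSON into the editor, the n8n public REST API can create workflows. The sketch below is an assumption-heavy example: it presumes the public API is enabled on your instance, that N8N_URL and N8N_API_KEY environment variables are set, and that the JSON above has been saved to document-extraction.workflow.json; check the API reference for your n8n version for the exact required fields.

// Sketch: create the workflow through the n8n public REST API (Node.js 18+).
// N8N_URL, N8N_API_KEY, and the file name are placeholders for your setup.
const fs = require('fs');

async function importWorkflow(path) {
  const workflow = JSON.parse(fs.readFileSync(path, 'utf8'));
  const response = await fetch(`${process.env.N8N_URL}/api/v1/workflows`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-N8N-API-KEY': process.env.N8N_API_KEY,
    },
    body: JSON.stringify({
      name: 'Extract Data from Documents with GPT-4, PDF Vector, and PostgreSQL Export',
      nodes: workflow.nodes,
      connections: workflow.connections,
      settings: {},
    }),
  });
  if (!response.ok) {
    throw new Error(`Import failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

importWorkflow('./document-extraction.workflow.json').then(console.log).catch(console.error);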
What scenarios is this workflow suited for?
Intermediate level - document extraction and multimodal AI scenarios, such as turning incoming invoices, contracts, and reports into structured records.
Do I have to pay for it?
The workflow itself is free to import and use. Note, however, that third-party services it relies on (such as the OpenAI API) may charge for usage.
Related Workflows
- Automate Academic Literature Reviews with GPT-4 and Multi-Database Search (13 nodes, by PDF Vector, Document Extraction)
- Extract Clinical Data from Medical Documents with PDF Vector and HIPAA Compliance (9 nodes, by PDF Vector, Document Extraction)
- Build an Academic Knowledge Graph from Research Papers with PDF Vector, GPT-4, and Neo4j (10 nodes, by PDF Vector, AI RAG)
- Academic Research Search Across Five Databases with PDF Vector and Multiple Exports (9 nodes, by PDF Vector, AI RAG)
- Automated Academic Paper Monitoring with PDF Vector, GPT-3.5, and Slack Alerts (10 nodes, by PDF Vector, Personal Productivity)
- Enterprise Contract Lifecycle Management with AI Risk Analysis (20 nodes, by PDF Vector, Document Extraction)
Workflow Information
Difficulty: Intermediate
Nodes: 9
Categories: 2
Node types: 8
Author: PDF Vector (@pdfvector) - Fully featured PDF APIs for developers: parse any PDF or Word document, extract structured data, and access millions of academic papers, all through simple APIs.
External link: View on n8n.io →