Exporting Data Extracted from Documents with GPT-4, PDFVector, and PostgreSQL
This is an automation workflow in the Document Extraction, Multimodal AI category that contains 9 nodes. It mainly uses the Code, OpenAi, Switch, Postgres, and PdfVector nodes. It extracts data from documents with GPT-4, PDFVector, and PostgreSQL and exports the results.
- OpenAI API key
- PostgreSQL database connection details
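Beyond the connection details, the two Postgres nodes in the workflow JSON below insert into tables named `invoices` and `documents`, and each lists its target columns. The validated item produced by the Code node does not carry keys with those column names, so a mapping step is probably needed before the insert. The following is a minimal sketch for the `invoices` columns only. The function name `toInvoiceRow` and the key names `referenceNumbers` and `parties` are assumptions made for illustration, because the prompt does not fix the JSON keys that GPT-4 returns.

```javascript
// Sketch: shape a validated item into a row whose keys match the
// "columns" list of the invoices node:
// invoice_number, vendor, amount, date, raw_data.
// ASSUMPTION: `referenceNumbers` and `parties` are hypothetical key names;
// adjust them to whatever keys your prompt makes the model return.
function toInvoiceRow(validated) {
  const raw = validated.raw || {};
  const amounts = Array.isArray(validated.amounts) ? validated.amounts : [];
  return {
    invoice_number: (raw.referenceNumbers || [])[0] ?? null,
    vendor: (raw.parties || [])[0] ?? null,
    // Taking the largest amount as the invoice total is a guess, not a rule.
    amount: amounts.length ? Math.max(...amounts) : null,
    date: validated.date ?? null,
    // Keep the full extraction so nothing is lost if the mapping is wrong.
    raw_data: JSON.stringify(raw),
  };
}

const row = toInvoiceRow({
  documentType: 'invoice',
  date: '2024-03-15T00:00:00.000Z',
  amounts: [100, 1250],
  raw: { referenceNumbers: ['INV-001'], parties: ['Acme Corp'] },
});
console.log(row);
```

Asking the model for fixed key names in the prompt would make a mapping like this far less fragile than guessing at them afterwards.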
Nodes used (9)
{
"meta": {
"instanceId": "placeholder"
},
"nodes": [
{
"id": "workflow-info",
"name": "Pipeline Info",
"type": "n8n-nodes-base.stickyNote",
"position": [
250,
150
],
"parameters": {
"content": "## Document Extraction Pipeline\n\nExtracts structured data from:\n- Invoices\n- Contracts\n- Reports\n- Forms\n\nCustomize extraction rules in the AI node"
},
"typeVersion": 1
},
{
"id": "file-trigger",
"name": "Watch Folder",
"type": "n8n-nodes-base.localFileTrigger",
"notes": "Triggers when new documents arrive",
"position": [
450,
300
],
"parameters": {
"path": "/documents/incoming",
"events": [
"file:created"
]
},
"typeVersion": 1
},
{
"id": "pdfvector-parse",
"name": "PDF Vector - Parse Document",
"type": "n8n-nodes-pdfvector.pdfVector",
"notes": "Parse with LLM for better extraction",
"position": [
650,
300
],
"parameters": {
"useLlm": "always",
"resource": "document",
"operation": "parse",
"documentUrl": "={{ $json.filePath }}"
},
"typeVersion": 1
},
{
"id": "extract-data",
"name": "Extract Structured Data",
"type": "n8n-nodes-base.openAi",
"position": [
850,
300
],
"parameters": {
"model": "gpt-4",
"options": {
"responseFormat": {
"type": "json_object"
}
},
"messages": {
"values": [
{
"content": "=Extract the following information from this document:\n\n1. Document Type (invoice, contract, report, etc.)\n2. Date/Dates mentioned\n3. Parties involved (names, companies)\n4. Key amounts/values\n5. Important terms or conditions\n6. Reference numbers\n7. Addresses\n8. Contact information\n\nDocument content:\n{{ $json.content }}\n\nReturn as structured JSON."
}
]
}
},
"typeVersion": 1
},
{
"id": "validate-data",
"name": "Validate and Clean Data",
"type": "n8n-nodes-base.code",
"position": [
1050,
300
],
"parameters": {
"functionCode": "// Validate and clean extracted data\nconst extracted = JSON.parse($json.content);\nconst validated = {};\n\n// Validate document type\nvalidated.documentType = extracted.documentType || 'unknown';\n\n// Parse and validate dates\nif (extracted.date) {\n const date = new Date(extracted.date);\n validated.date = isNaN(date) ? null : date.toISOString();\n}\n\n// Clean monetary values (String() guards against numeric amounts)\nif (Array.isArray(extracted.amounts)) {\n validated.amounts = extracted.amounts.map(amt => {\n const cleaned = String(amt).replace(/[^0-9.-]/g, '');\n return parseFloat(cleaned) || 0;\n });\n}\n\n// Validate email addresses\nif (Array.isArray(extracted.emails)) {\n validated.emails = extracted.emails.filter(email => \n /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(String(email))\n );\n}\n\nvalidated.raw = extracted;\nvalidated.fileName = $node['Watch Folder'].json.fileName;\nvalidated.processedAt = new Date().toISOString();\n\nreturn validated;"
},
"typeVersion": 1
},
{
"id": "route-by-type",
"name": "Route by Document Type",
"type": "n8n-nodes-base.switch",
"position": [
1250,
300
],
"parameters": {
"conditions": {
"string": [
{
"value1": "={{ $json.documentType }}",
"value2": "invoice",
"operation": "equals"
}
]
}
},
"typeVersion": 1
},
{
"id": "store-invoice",
"name": "Store Invoice Data",
"type": "n8n-nodes-base.postgres",
"position": [
1450,
250
],
"parameters": {
"table": "invoices",
"columns": "invoice_number,vendor,amount,date,raw_data",
"operation": "insert"
},
"typeVersion": 1
},
{
"id": "store-other",
"name": "Store Other Documents",
"type": "n8n-nodes-base.postgres",
"position": [
1450,
350
],
"parameters": {
"table": "documents",
"columns": "type,content,metadata,processed_at",
"operation": "insert"
},
"typeVersion": 1
},
{
"id": "export-csv",
"name": "Export to CSV",
"type": "n8n-nodes-base.writeBinaryFile",
"position": [
1650,
300
],
"parameters": {
"fileName": "=extracted_data_{{ $now.format('yyyy-MM-dd') }}.csv",
"fileContent": "={{ $items().map(item => item.json).toCsv() }}"
},
"typeVersion": 1
}
],
"connections": {
"file-trigger": {
"main": [
[
{
"node": "pdfvector-parse",
"type": "main",
"index": 0
}
]
]
},
"store-invoice": {
"main": [
[
{
"node": "export-csv",
"type": "main",
"index": 0
}
]
]
},
"store-other": {
"main": [
[
{
"node": "export-csv",
"type": "main",
"index": 0
}
]
]
},
"validate-data": {
"main": [
[
{
"node": "route-by-type",
"type": "main",
"index": 0
}
]
]
},
"route-by-type": {
"main": [
[
{
"node": "store-invoice",
"type": "main",
"index": 0
}
],
[
{
"node": "store-other",
"type": "main",
"index": 0
}
]
]
},
"extract-data": {
"main": [
[
{
"node": "validate-data",
"type": "main",
"index": 0
}
]
]
},
"pdfvector-parse": {
"main": [
[
{
"node": "extract-data",
"type": "main",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, select "Import from JSON", paste the configuration, and then adjust the credential settings as needed.
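Before adapting the validation Code node, it can help to exercise its logic outside n8n. The following is a minimal standalone sketch of that step for Node.js, not the workflow's actual node code. The function name `validateExtracted` and its `fileName` parameter are hypothetical stand-ins for the n8n variables `$json.content` and `$node['Watch Folder'].json.fileName`, and the `try/catch` around `JSON.parse` is an addition that the original node does not have.

```javascript
// Standalone sketch of the validation step, runnable with Node.js.
// ASSUMPTION: `validateExtracted` and `fileName` are illustrative names only.
function validateExtracted(content, fileName) {
  let extracted;
  try {
    extracted = JSON.parse(content);
  } catch (err) {
    // Non-JSON model output: keep the raw text instead of throwing.
    return { documentType: 'unknown', parseError: err.message, raw: content, fileName };
  }

  const validated = { documentType: extracted.documentType || 'unknown' };

  // Normalize the date to ISO 8601, or null if it cannot be parsed.
  if (extracted.date) {
    const date = new Date(extracted.date);
    validated.date = isNaN(date) ? null : date.toISOString();
  }

  // Strip currency symbols and thousands separators; numbers pass through String().
  if (Array.isArray(extracted.amounts)) {
    validated.amounts = extracted.amounts.map((amt) => {
      const cleaned = String(amt).replace(/[^0-9.-]/g, '');
      return parseFloat(cleaned) || 0;
    });
  }

  // Keep only strings that look like email addresses.
  if (Array.isArray(extracted.emails)) {
    validated.emails = extracted.emails.filter((email) =>
      /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(String(email))
    );
  }

  validated.raw = extracted;
  validated.fileName = fileName;
  validated.processedAt = new Date().toISOString();
  return validated;
}

const sample = JSON.stringify({
  documentType: 'invoice',
  date: '2024-03-15',
  amounts: ['$1,250.00', 99.5],
  emails: ['billing@example.com', 'not-an-email'],
});
const result = validateExtracted(sample, 'sample.pdf');
console.log(result.amounts, result.emails);
```

Note that the cleaning regex treats `.` as the decimal separator, so amounts written in a comma-decimal format such as `1.250,00` would be misread and need separate handling.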
Which scenarios is this workflow suited for?
Intermediate - Document Extraction, Multimodal AI
Is it paid?
This workflow is completely free; you can import it and use it directly. Note, however, that third-party services used in the workflow (such as the OpenAI API) may require payment on your own account.
PDF Vector (@pdfvector): A fully featured PDF API for developers. Parse any PDF or Word document, extract structured data, and access millions of academic papers, all through simple APIs.