Airline Web Check-in Scraper with n8n, AI, and Vector Database Storage
This is a 14-node automation workflow in the Document Extraction and AI RAG categories. It mainly uses the Wait, HttpRequest, GoogleSheets, SplitInBatches, and ChainLlm nodes, among others. It extracts airline online check-in data using Ollama AI, Google Sheets, and a Postgres vector database.
- Credentials for the target API may be required
- Google Sheets API credentials
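The HTTP Request node in the workflow JSON builds its scrape URL by prefixing each sheet row's `WEB CHECK IN URL` value with `https://r.jina.ai/`. A minimal Python sketch of that per-row step follows; the `reader_url` helper, the sample row, and the URL validity check are assumptions for illustration and are not part of the workflow itself.

```python
from urllib.parse import urlsplit

# Prefix taken from the workflow's HTTP Request node.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(row: dict) -> str:
    """Build the scrape URL for one sheet row (hypothetical helper)."""
    target = str(row.get("WEB CHECK IN URL", "")).strip()
    parts = urlsplit(target)
    # Extra guard that the workflow does not perform:
    # reject rows without an absolute http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"row has no usable check-in URL: {target!r}")
    return READER_PREFIX + target


# Made-up sample row using the sheet's column names.
rows = [
    {
        "Airline": "Example Air",
        "WEB CHECK IN URL": "https://example.com/check-in",
        "row_number": 2,
    }
]
urls = [reader_url(row) for row in rows]
print(urls)
```

Checking the URL before the request keeps an empty sheet cell from turning into a call to the bare reader endpoint.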
Nodes used (14)
{
"id": "FLn2skSh92HNO2SS",
"meta": {
"instanceId": "dd69efaf8212c74ad206700d104739d3329588a6f3f8381a46a481f34c9cc281",
"templateCredsSetupCompleted": true
},
"name": "Airline Web Check-in Scraper with AI & Vector DB Storage using n8n",
"tags": [],
"nodes": [
{
"id": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"name": "Über Elemente iterieren",
"type": "n8n-nodes-base.splitInBatches",
"position": [
-220,
175
],
"parameters": {
"options": {}
},
"typeVersion": 3
},
{
"id": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"name": "Airline-URLs abrufen",
"type": "n8n-nodes-base.googleSheets",
"position": [
-440,
175
],
"parameters": {
"options": {},
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "7e2ca713-229f-490c-bd2e-481cf8f18184",
"name": "Chat Trigger - Start",
"type": "@n8n/n8n-nodes-langchain.chatTrigger",
"position": [
-660,
175
],
"webhookId": "6c85024c-928b-4f43-82b3-d1469283586f",
"parameters": {
"public": true,
"options": {}
},
"typeVersion": 1.1
},
{
"id": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"name": "Airline-Webseite scrapen",
"type": "n8n-nodes-base.httpRequest",
"position": [
0,
0
],
"parameters": {
"url": "=https://r.jina.ai/{{ $json['WEB CHECK IN URL'] }}",
"method": "POST",
"options": {},
"jsonHeaders": "{\n \"Cookie\": \"cookie-keyname1=cookie-value1; cookie-keyname2=cookie-value2; cookie-keyname3=cookie-value3\"\n}\n",
"sendHeaders": true,
"authentication": "genericCredentialType",
"specifyHeaders": "json",
"genericAuthType": "httpHeaderAuth"
},
"credentials": {
"httpHeaderAuth": {
"id": "KCqBydsOZHvzNKAI",
"name": "Header Auth account"
}
},
"typeVersion": 4.2
},
{
"id": "27072e20-58dc-49e2-ae6b-1053750607f9",
"name": "Informationen mit LLM extrahieren",
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"position": [
220,
0
],
"parameters": {
"text": "={{ $json.data }}",
"messages": {
"messageValues": [
{
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. 
Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
},
{
"type": "AIMessagePromptTemplate",
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. 
Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
}
]
},
"promptType": "define"
},
"typeVersion": 1.5
},
{
"id": "ba090b45-e6e8-434a-9577-51d281dd4a5b",
"name": "Chat Model",
"type": "@n8n/n8n-nodes-langchain.lmChatOllama",
"position": [
308,
220
],
"parameters": {
"options": {}
},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "d557adab-856e-460e-aa81-f929a66ca465",
"name": "Auf Antwort warten",
"type": "n8n-nodes-base.wait",
"position": [
580,
0
],
"webhookId": "b29f8fd3-b6ff-43ee-878b-17de4b411f99",
"parameters": {},
"typeVersion": 1.1
},
{
"id": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"name": "Extrahiere Informationen speichern",
"type": "n8n-nodes-base.googleSheets",
"position": [
816,
0
],
"parameters": {
"columns": {
"value": {
"row_number": "={{ $('Loop Over Items').item.json.row_number }}",
"web check in details": "={{ $json.text.removeTags().replace(/^```json|```$/g, '').trim() }}"
},
"schema": [
{
"id": "Airline",
"type": "string",
"display": true,
"required": false,
"displayName": "Airline",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "WEB CHECK IN URL",
"type": "string",
"display": true,
"removed": true,
"required": false,
"displayName": "WEB CHECK IN URL",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "web check in details",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "web check in details",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "output",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "output",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "row_number",
"type": "string",
"display": true,
"removed": false,
"readOnly": true,
"required": false,
"displayName": "row_number",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [
"row_number"
],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "update",
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "866e9eca-68ad-419e-acf0-c28141bf7727",
"name": "Embeddings erzeugen",
"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
"position": [
1036,
220
],
"parameters": {},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "56553cae-a61f-4b64-8709-06dbab314bce",
"name": "Text für Vektordatenbank vorbereiten",
"type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
"position": [
1156,
222.5
],
"parameters": {
"options": {}
},
"typeVersion": 1
},
{
"id": "82da65d6-9ecd-451a-b2f8-466795cd07a0",
"name": "Lange Texte aufteilen",
"type": "@n8n/n8n-nodes-langchain.textSplitterTokenSplitter",
"position": [
1244,
420
],
"parameters": {
"chunkSize": 10000
},
"typeVersion": 1
},
{
"id": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"name": "In Vektordatenbank speichern",
"type": "@n8n/n8n-nodes-langchain.vectorStorePGVector",
"position": [
1052,
0
],
"parameters": {
"mode": "insert",
"options": {
"collection": {
"values": {
"useCollection": true
}
}
}
},
"credentials": {
"postgres": {
"id": "4Y4qEFGqF2krfRHZ",
"name": "Postgres-test"
}
},
"typeVersion": 1
},
{
"id": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"name": "Vor nächstem Batch warten",
"type": "n8n-nodes-base.wait",
"position": [
1532,
175
],
"webhookId": "43d5c764-27a7-4b37-b879-96ebd8c84fce",
"parameters": {
"amount": 15
},
"typeVersion": 1.1
},
{
"id": "0fef87ce-bfc3-4edd-aff8-8e10a0e7489a",
"name": "Notizzettel",
"type": "n8n-nodes-base.stickyNote",
"position": [
-740,
-860
],
"parameters": {
"width": 660,
"height": 860,
"content": "\n\n### 📝 Web Check-in Details Extractor (LLM Prompt Guide)\n\n#### ✅ What is this?\n\nThis is a powerful AI prompt used inside the **\"Basic LLM Chain\"** node. It tells the AI how to **extract structured airline web check-in data** (like check-in time, baggage policy, cancellation rules) from messy airline webpages.\n\n#### 🎯 Why is it used?\n\nAirline websites often present data in unstructured formats. This LLM-based step:\n\n* Cleans the content scraped from airline URLs.\n* Extracts important travel-related info in a consistent JSON format.\n* Helps automate the enrichment of airline data stored in your Google Sheets and Vector DB.\n\n#### 🛠️ How to use it?\n\n1. **Input**: This node receives raw webpage content from the airline’s \"Web Check-in URL\".\n2. **Prompt**: It applies a fixed set of rules (in natural language) to guide the AI to convert the unstructured data into clean JSON.\n3. **Output**: The AI returns a **structured JSON** object with fields like:\n\n * `checkin_url`\n * `baggage_allowance`\n * `refund_policy`\n * `faq`\n * `additional_info`\n4. The next nodes save this output to:\n\n * Google Sheet (for visibility)\n * PGVector (for semantic search)\n\n💡 **Pro Tip:** This works best when the HTML content is readable and includes useful labels like “Check-in”, “Cancellation”, “Support”, etc.\n\n\n"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "b785e5c9-19bf-42c0-8c99-1659b1c2509b",
"connections": {
"ba090b45-e6e8-434a-9577-51d281dd4a5b": {
"ai_languageModel": [
[
{
"node": "27072e20-58dc-49e2-ae6b-1053750607f9",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"ee7a49bf-a2dc-4d12-aef0-9add291a398c": {
"main": [
[],
[
{
"node": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"type": "main",
"index": 0
}
]
]
},
"82da65d6-9ecd-451a-b2f8-466795cd07a0": {
"ai_textSplitter": [
[
{
"node": "56553cae-a61f-4b64-8709-06dbab314bce",
"type": "ai_textSplitter",
"index": 0
}
]
]
},
"d557adab-856e-460e-aa81-f929a66ca465": {
"main": [
[
{
"node": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"type": "main",
"index": 0
}
]
]
},
"a670a7c1-af95-452d-92e5-a82d5be2d0a5": {
"main": [
[
{
"node": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"type": "main",
"index": 0
}
]
]
},
"049cfbd5-bbc7-483c-964e-a32cdab1e6b8": {
"main": [
[
{
"node": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"type": "main",
"index": 0
}
]
]
},
"866e9eca-68ad-419e-acf0-c28141bf7727": {
"ai_embedding": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "ai_embedding",
"index": 0
}
]
]
},
"7e2ca713-229f-490c-bd2e-481cf8f18184": {
"main": [
[
{
"node": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"type": "main",
"index": 0
}
]
]
},
"279b24fc-e1f3-4a1c-9c70-0177b13f32d8": {
"main": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "main",
"index": 0
}
]
]
},
"27072e20-58dc-49e2-ae6b-1053750607f9": {
"main": [
[
{
"node": "d557adab-856e-460e-aa81-f929a66ca465",
"type": "main",
"index": 0
}
]
]
},
"c11c66ea-3e36-4c12-a263-109d03d8be1a": {
"main": [
[
{
"node": "27072e20-58dc-49e2-ae6b-1053750607f9",
"type": "main",
"index": 0
}
]
]
},
"7c4941f0-4dff-49d0-ac9b-901a23987686": {
"main": [
[
{
"node": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"type": "main",
"index": 0
}
]
]
},
"56553cae-a61f-4b64-8709-06dbab314bce": {
"ai_document": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "ai_document",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration and adjust the credentials as needed.
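Before importing, it can help to check the copied JSON for dangling connections and to list the credentials that need replacing. The sketch below is one way to do that in Python, assuming the export has the `nodes`, `credentials`, and `connections` shape shown in the listing. The `check_workflow` function and the tiny `sample` export are hypothetical and are not part of n8n.

```python
def check_workflow(workflow: dict) -> dict:
    """Summarise what needs attention before importing a workflow export."""
    nodes = workflow.get("nodes", [])
    # Connections may reference nodes by id (as in the listing) or by name,
    # so accept either form here.
    known = {n["id"] for n in nodes} | {n["name"] for n in nodes}

    dangling = []
    for source, outputs in workflow.get("connections", {}).items():
        if source not in known:
            dangling.append(source)
        for branches in outputs.values():      # e.g. "main", "ai_embedding"
            for branch in branches:            # one list per output index
                for link in branch:
                    if link["node"] not in known:
                        dangling.append(link["node"])

    # Every (credential type, credential name) pair the nodes refer to.
    credentials = sorted(
        {
            (kind, cred["name"])
            for n in nodes
            for kind, cred in n.get("credentials", {}).items()
        }
    )
    return {
        "node_count": len(nodes),
        "dangling": dangling,
        "credentials": credentials,
    }


# Made-up two-node export with one broken link, for illustration only.
sample = {
    "nodes": [
        {
            "id": "a",
            "name": "Get URLs",
            "credentials": {"googleApi": {"id": "1", "name": "Google Sheets- test"}},
        },
        {"id": "b", "name": "Loop"},
    ],
    "connections": {
        "a": {"main": [[{"node": "b", "type": "main", "index": 0}]]},
        "b": {"main": [[], [{"node": "c", "type": "main", "index": 0}]]},
    },
}
report = check_workflow(sample)
print(report)
```

For a real export, load the pasted text with `json.loads` and pass the result to `check_workflow`. Each credential pair it reports has to be re-created or re-mapped in your own n8n instance.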
Which scenarios is this workflow suited for?
Advanced - Document Extraction, AI RAG
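In the extraction step, the Google Sheets node strips a Markdown `json` fence from the LLM reply with an n8n expression, and the prompt asks the model to leave out empty fields. The Python sketch below shows comparable cleanup outside n8n, under stated assumptions: the `strip_fence`, `prune_empty`, and `clean_llm_output` names are hypothetical, the regex is slightly more permissive than the one in the workflow, and the HTML tag removal the workflow does with `removeTags()` is not reproduced.

```python
import json
import re

# Matches an opening fence (with or without the "json" tag) at the very
# start, or a closing fence at the very end of the text.
FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")


def strip_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence from an LLM reply."""
    return FENCE.sub("", text.strip())


def prune_empty(value):
    """Recursively drop None, empty strings, empty lists and empty dicts."""
    empty = (None, "", [], {})
    if isinstance(value, dict):
        cleaned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in empty}
    if isinstance(value, list):
        cleaned = [prune_empty(item) for item in value]
        return [item for item in cleaned if item not in empty]
    return value  # booleans and numbers (including False and 0) are kept


def clean_llm_output(text: str) -> dict:
    """Parse a possibly fenced JSON reply and enforce the no-empty-fields rule."""
    return prune_empty(json.loads(strip_fence(text)))


# Made-up reply using keys from the prompt's schema.
raw = (
    "```json\n"
    '{"web_checkin_available": true, "checkin_url": "", '
    '"faq": {"questions_answers": []}, "checkin_methods": ["Online"]}\n'
    "```"
)
result = clean_llm_output(raw)
print(result)
```

Pruning after parsing means the stored record follows the prompt's rule even when the model ignores it. `json.loads` raises an error if the reply is not valid JSON, which gives a natural place to retry or skip a row.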
Is it paid?
This workflow is completely free. Note, however, that third-party services the workflow calls (for example the r.jina.ai reader endpoint, the Google Sheets API, or wherever you host Ollama and Postgres) may incur charges.
Oneclick AI Squad
@oneclick-ai
The AI Squad Initiative is a pioneering effort to build, automate, and scale AI-powered workflows using n8n.io. Our mission is to help individuals and businesses integrate AI agents seamlessly into their daily operations, from automating tasks and enhancing productivity to creating innovative, intelligent solutions. We design modular, reusable AI workflow templates that empower creators, developers, and teams to supercharge their automation with minimal effort and maximum impact.