Daily RAG Research Paper Hub: arXiv, Gemini AI, Notion
Advanced
This is an automated workflow for Content Creation and Multimodal AI, containing 22 nodes. It is built mainly from the If, Code, Gmail, Notion, and Switch nodes.
Prerequisites
- Google account with Gmail API credentials
- Notion API key
- Credentials for any target APIs, where required
- Google Gemini API key
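Before importing, you can reproduce the arXiv request this workflow issues outside n8n. This is a minimal sketch of the date-window logic from the workflow's `submittedDate:T-1` Code node, assembled into the same query URL template the `arXiv API` HTTP Request node uses (run with Node.js):

```javascript
// Build the target submittedDate window the way the workflow's Code node does.
// It steps back 2 days, giving arXiv time to process and announce submissions.
const now = new Date();
const target = new Date(now);
target.setDate(now.getDate() - 2);

const y = target.getFullYear();
const m = String(target.getMonth() + 1).padStart(2, '0');
const d = String(target.getDate()).padStart(2, '0');

const from = `${y}${m}${d}0000`; // window start, GMT, to the minute
const to = `${y}${m}${d}2359`;   // window end

// Same query template as the workflow's "arXiv API" HTTP Request node
const url = `https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[${from}+TO+${to}]`;
console.log(url);
```

Fetching this URL (e.g. with `curl`) returns the Atom feed that the downstream Code nodes parse.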
Workflow Export
Import the JSON configuration below into n8n to use this workflow.
{
"meta": {
"instanceId": "a6011e4876c6b1225fa48dae1dbfa92e1932a633b3186bbb7bfd5c9e6ad2d878"
},
"nodes": [
{
"id": "7e9f18f1-edfe-4af6-835b-12fe16a99034",
"name": "Basic LLM チェーン",
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"position": [
272,
0
],
"parameters": {
"text": "={{ $json.data }}",
"batching": {},
"messages": {
"messageValues": [
{
"message": "You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:\n\n1. RAG Relevance and Labeling:\n - Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.\n - For each data item, add three new fields:\n - `RAG_TF`: \"T\" if related, \"F\" if not\n - `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty\n - `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty\n\n2. RAG Method Extraction:\n - Analyze the `summary` and extract the RAG method proposed in the paper.\n - Store it in the new field `RAG_NAME`.\n\n3. External Link Extraction:\n - Analyze the `summary` content for `github` or `huggingface` links.\n - If present, extract the URLs and populate the existing `github` and `huggingface` fields.\n - If not present, leave them unchanged.\n\nOutput Format: standard JSON\n\nExample:\n\nGiven a data item with the following `summary`:\n\n\"summary\":\"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. 
First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer\n"
}
]
},
"promptType": "define"
},
"typeVersion": 1.7
},
{
"id": "92d37dc1-aaaf-47ec-987a-e6d23c93e055",
"name": "Google Gemini チャットモデル",
"type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
"position": [
272,
144
],
"parameters": {
"options": {},
"modelName": "=models/gemini-2.5-flash"
},
"credentials": {
"googlePalmApi": {
"id": "ra9slZSGvLJTHQw1",
"name": "Google Gemini(PaLM) Api account"
}
},
"typeVersion": 1
},
{
"id": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
"name": "submittedDate:T-1",
"type": "n8n-nodes-base.code",
"position": [
-1664,
320
],
"parameters": {
"jsCode": "// Build the target submittedDate window for the arXiv query\n// Note: this steps back 2 days, giving arXiv time to process and announce the submissions\nconst now = new Date();\nconst yesterday = new Date(now);\nyesterday.setDate(now.getDate() - 2);\n\nconst y = yesterday.getFullYear();\nconst m = String(yesterday.getMonth() + 1).padStart(2, '0');\nconst d = String(yesterday.getDate()).padStart(2, '0');\n\nreturn [\n {\n json: {\n from: `${y}${m}${d}0000`,\n to: `${y}${m}${d}2359`\n }\n }\n];\n"
},
"typeVersion": 2
},
{
"id": "c3685631-8bbd-409a-978a-fbb3e9847115",
"name": "If",
"type": "n8n-nodes-base.if",
"position": [
-160,
16
],
"parameters": {
"options": {},
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "de0a5a7e-67dd-4dd0-8ccc-3406e17bd09c",
"operator": {
"type": "number",
"operation": "notEquals"
},
"leftValue": "={{ $json.paperCount }}",
"rightValue": 0
}
]
}
},
"typeVersion": 2.2
},
{
"id": "4dd24343-1872-472d-8d7d-4cd28a9dbabe",
"name": "スケジュールトリガー",
"type": "n8n-nodes-base.scheduleTrigger",
"position": [
-1856,
320
],
"parameters": {
"rule": {
"interval": [
{
"triggerAtHour": 6
}
]
}
},
"typeVersion": 1.2
},
{
"id": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
"name": "FEISHU",
"type": "n8n-nodes-base.switch",
"position": [
576,
720
],
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "7b804f5e-6702-4d4a-99b9-3f06f8eb20d4",
"operator": {
"type": "string",
"operation": "equals"
},
"leftValue": "={{ $json.type }}",
"rightValue": "feishu"
}
]
}
}
]
},
"options": {}
},
"typeVersion": 3.2
},
{
"id": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
"name": "FEISHU POST",
"type": "n8n-nodes-base.httpRequest",
"position": [
800,
720
],
"parameters": {
"url": "=",
"method": "POST",
"options": {},
"sendBody": true,
"bodyParameters": {
"parameters": [
{
"name": "msg_type",
"value": "={{ $json.msg_type }}"
},
{
"name": "content",
"value": "={{ $json.content }}"
}
]
}
},
"typeVersion": 4.2
},
{
"id": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
"name": "gmail",
"type": "n8n-nodes-base.switch",
"position": [
576,
544
],
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "3222832c-bbf2-46a2-abd8-2bb14095b7bf",
"operator": {
"type": "string",
"operation": "equals"
},
"leftValue": "={{ $json.type }}",
"rightValue": "gmail"
}
]
}
}
]
},
"options": {}
},
"typeVersion": 3.2
},
{
"id": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
"name": "Send a message",
"type": "n8n-nodes-base.gmail",
"position": [
800,
544
],
"webhookId": "cb0a1f30-59e0-4505-af24-db689d9c1f23",
"parameters": {
"sendTo": "xing.adam@gmail.com",
"message": "={{ $json.message }}",
"options": {},
"subject": "={{ $json.subject }}"
},
"credentials": {
"gmailOAuth2": {
"id": "WoyY5hj4D93bD2Fp",
"name": "Gmail account"
}
},
"typeVersion": 2.1
},
{
"id": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
"name": "Message a model",
"type": "@n8n/n8n-nodes-langchain.googleGemini",
"position": [
-1040,
320
],
"parameters": {
"modelId": {
"__rl": true,
"mode": "list",
"value": "models/gemini-2.5-flash-lite",
"cachedResultName": "models/gemini-2.5-flash-lite"
},
"options": {},
"messages": {
"values": [
{
"role": "model",
"content": "You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:\n\n1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary\n2. Set the daily date field `Date`: yyyy-mm-dd\n3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.\n4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.\n5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.\n\nExample: If there are papers:\n{\n \"Title\": \"2025-09-13 paper summary\",\n \"Date\": \"2025-09-13\",\n \"Number of papers\": 2,\n \"SUMMARY_CN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.\",\n \"SUMMARY_EN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency.\"\n}\n\nIf the number of papers is 0, maintain the JSON structure:\n{\n \"Title\": \"2025-09-13 paper summary\",\n \"Date\": \"2025-09-13\",\n \"Number of papers\": 0,\n \"SUMMARY_CN\": \"\",\n \"SUMMARY_EN\": \"\"\n}"
},
{
"content": "={{ $json.data }}"
}
]
},
"simplify": false
},
"credentials": {
"googlePalmApi": {
"id": "ra9slZSGvLJTHQw1",
"name": "Google Gemini(PaLM) Api account"
}
},
"typeVersion": 1
},
{
"id": "024c6399-857e-45a3-a15d-8b733e16da67",
"name": "RAG Daily Paper Summary",
"type": "n8n-nodes-base.notion",
"position": [
800,
320
],
"parameters": {
"title": "={{ $json.title }}",
"simple": false,
"options": {},
"resource": "databasePage",
"databaseId": {
"__rl": true,
"mode": "list",
"value": "26fa136d-cee4-8092-8b85-cf9e9cbc424f",
"cachedResultUrl": "https://www.notion.so/26fa136dcee480928b85cf9e9cbc424f",
"cachedResultName": "RAG Daily Paper Summary"
},
"propertiesUi": {
"propertyValues": [
{
"key": "DATE|date",
"date": "={{ $json.date }}"
},
{
"key": "Number of papers|number",
"numberValue": "={{ $json.paperCount }}"
},
{
"key": "SUMMARY_EN|rich_text",
"textContent": "={{ $json.summaryEN }}"
},
{
"key": "SUMMARY_CN|rich_text",
"textContent": "={{ $json.summaryCN }}"
}
]
}
},
"credentials": {
"notionApi": {
"id": "BNsFk38kgqvRDJpX",
"name": "Notion account"
}
},
"typeVersion": 2.2
},
{
"id": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
"name": "JSON FORMAT",
"type": "n8n-nodes-base.code",
"position": [
-688,
320
],
"parameters": {
"jsCode": "const items = $input.all();\nconst response = items[0].json;\n\ntry {\n // Extract text content from Gemini API response\n // Note: response is directly an object, not an array\n const text = response.candidates[0].content.parts[0].text;\n \n // Extract JSON content (fall back to the raw text if there is no ```json fence)\n const jsonMatch = text.match(/```json\\n([\\s\\S]*?)\\n```/);\n const jsonStr = jsonMatch ? jsonMatch[1] : text;\n \n // Parse JSON\n const data = JSON.parse(jsonStr);\n \n // The model may label the title either \"Title\" or \"Number of papers\" (a duplicate-key quirk),\n // so accept either, extracting from the original string\n const titleMatch = jsonStr.match(/\"Title\":\\s*\"([^\"]+)\"/) || jsonStr.match(/\"Number of papers\":\\s*\"([^\"]+)\"/);\n const countMatch = jsonStr.match(/\"Number of papers\":\\s*(\\d+)/);\n \n // Construct result\n items[0].json = {\n title: titleMatch ? titleMatch[1] : '',\n date: data.Date || '',\n paperCount: countMatch ? parseInt(countMatch[1], 10) : 0,\n summaryCN: data.SUMMARY_CN || '',\n summaryEN: data.SUMMARY_EN || ''\n };\n \n} catch (error) {\n items[0].json = {\n error: error.message,\n originalData: response\n };\n}\n\nreturn items;\n"
},
"typeVersion": 2
},
{
"id": "f1a331fa-d830-4656-b108-7e18e7430b04",
"name": "付箋3",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1984,
544
],
"parameters": {
"width": 736,
"height": 768,
"content": "## 1. Data Retrieval\n### arXiv API\n\nThe arXiv provides a public API that allows users to query research papers by topic or by predefined categories.\n\n[arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual)\n\n**Key Notes:**\n\n1. **Response Format**: The API returns data as a typical *Atom Response*.\n2. **Timezone & Update Frequency**: \n - The arXiv submission process operates on a 24-hour cycle. \n - Newly submitted articles become available in the API only at midnight *after* they have been processed. \n - Feeds are updated daily at midnight Eastern Standard Time (EST). \n - Therefore, a single request per day is sufficient. \n3. **Request Limits**: \n - The maximum number of results per call (`max_results`) is **30,000**, \n - Results must be retrieved in slices of at most **2,000** at a time, using the `max_results` and `start` query parameters. \n4. **Time Format**: \n - The expected format is `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`, \n - `TTTT` is provided in 24-hour time to the minute, in GMT.\n\n### Scheduled Task\n\n- **Execution Frequency**: Daily \n- **Execution Time**: 6:00 AM \n- **Time Parameter Handling (JS)**: \n According to arXiv’s update rules, the scheduled task should query the **previous day’s (T-1)** `submittedDate` data.\n\n"
},
"typeVersion": 1
},
{
"id": "ae855e91-2363-4b97-8933-761934b269fe",
"name": "arXiv API",
"type": "n8n-nodes-base.httpRequest",
"position": [
-1440,
320
],
"parameters": {
"url": "=https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[{{$json[\"from\"]}}+TO+{{$json[\"to\"]}}]",
"options": {}
},
"typeVersion": 4.2
},
{
"id": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
"name": "Message Construction",
"type": "n8n-nodes-base.code",
"position": [
-128,
528
],
"parameters": {
"jsCode": "// Get current date\nconst now = new Date();\nconst year = now.getFullYear();\nconst month = String(now.getMonth() + 1).padStart(2, '0');\nconst day = String(now.getDate()).padStart(2, '0');\nconst date = `${year}-${month}-${day}`;\n\n// Get input data\nconst inputData = $input.first().json;\n\n// Generate message content\nconst messageContent = inputData.SUMMARY_CN;\n\n// Gmail message body\nconst gmailMessage = {\n subject: inputData.title || `Daily Paper Summary - ${date}`,\n message: `<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n <title> RAG Daily Paper Summary - ${date}</title>\n <style type=\"text/css\">\n /* Gmail safe styles */\n body {\n font-family: Arial, sans-serif;\n line-height: 1.4;\n margin: 0;\n padding: 0;\n background-color: #f9f9f9;\n color: #333333;\n }\n \n table {\n border-collapse: collapse;\n mso-table-lspace: 0pt;\n mso-table-rspace: 0pt;\n }\n \n .email-wrapper {\n width: 100%;\n background-color: #f9f9f9;\n padding: 40px 20px;\n }\n \n .email-container {\n width: 100%;\n max-width: 600px;\n margin: 0 auto;\n background-color: #ffffff;\n border-radius: 8px;\n box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);\n }\n \n .header {\n background-color: #2563eb;\n padding: 24px;\n text-align: center;\n border-radius: 8px 8px 0 0;\n }\n \n .header h1 {\n margin: 0 0 8px 0;\n font-size: 24px;\n font-weight: 600;\n color: #ffffff;\n }\n \n .date {\n font-size: 14px;\n color: #ffffff;\n opacity: 0.9;\n }\n \n .stats {\n background-color: #f1f5f9;\n padding: 16px 24px;\n font-size: 14px;\n color: #64748b;\n }\n \n .content {\n padding: 32px 24px 40px 24px;\n }\n \n .section {\n margin-bottom: 24px;\n }\n \n .section-title {\n font-size: 16px;\n 
font-weight: 600;\n color: #1e293b;\n margin-bottom: 12px;\n padding-bottom: 8px;\n border-bottom: 1px solid #e2e8f0;\n }\n \n .flag {\n display: inline-block;\n width: 20px;\n height: 14px;\n margin-right: 8px;\n border-radius: 2px;\n vertical-align: middle;\n }\n \n .flag-cn {\n background-color: #de2910;\n }\n \n .flag-en {\n background-color: #012169;\n }\n \n .summary {\n font-size: 14px;\n line-height: 1.6;\n color: #475569;\n padding: 16px;\n background-color: #f8fafc;\n border-radius: 6px;\n border-left: 3px solid #2563eb;\n }\n \n .divider {\n height: 1px;\n background-color: #e2e8f0;\n margin: 20px 0;\n border: none;\n }\n \n /* Mobile responsive */\n @media screen and (max-width: 600px) {\n .email-wrapper {\n padding: 20px 10px !important;\n }\n \n .header, .stats {\n padding: 20px 16px !important;\n }\n \n .content {\n padding: 24px 16px 32px 16px !important;\n }\n \n .email-container {\n border-radius: 0;\n }\n }\n \n /* Gmail specific fixes */\n .gmail-fix {\n display: none;\n }\n \n /* Outlook specific fixes */\n .ExternalClass {\n width: 100%;\n }\n \n .ExternalClass,\n .ExternalClass p,\n .ExternalClass span,\n .ExternalClass font,\n .ExternalClass td,\n .ExternalClass div {\n line-height: 100%;\n }\n </style>\n <!--[if mso]>\n <style type=\"text/css\">\n .email-container {\n width: 600px !important;\n }\n </style>\n <![endif]-->\n</head>\n<body>\n <table role=\"presentation\" class=\"email-wrapper\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n <tr>\n <td align=\"center\">\n <table role=\"presentation\" class=\"email-container\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n <!-- Header -->\n <tr>\n <td class=\"header\">\n <h1>RAG Daily Papers</h1>\n <div class=\"date\">${inputData.Date || date}</div>\n </td>\n </tr>\n \n <!-- Stats -->\n <tr>\n <td class=\"stats\">\n <strong>${inputData[\"Number of papers\"] || inputData.paperCount || 0} papers</strong> reviewed today\n </td>\n </tr>\n \n <!-- Content -->\n <tr>\n <td 
class=\"content\">\n <!-- Chinese Section -->\n <div class=\"section\">\n <h2 class=\"section-title\">\n 🇨🇳 Chinese\n </h2>\n <div class=\"summary\">\n ${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}\n </div>\n </div>\n \n <!-- Divider -->\n <hr class=\"divider\">\n \n <!-- English Section -->\n <div class=\"section\">\n <h2 class=\"section-title\">\n 🇺🇸 English\n </h2>\n <div class=\"summary\">\n ${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}\n </div>\n </div>\n </td>\n </tr>\n </table>\n </td>\n </tr>\n </table>\n</body>\n</html>`\n};\n\n// Feishu message body\nconst feishuMessage = {\n msg_type: \"text\",\n content: {\n text: `Today ${$input.first().json.date} ${$input.first().json.paperCount} papers. ${$input.first().json.summaryEN} ${$input.first().json.summaryCN}`\n }\n};\n\n// n8n output format\nreturn [\n { json: { type: \"gmail\", ...gmailMessage } },\n { json: { type: \"feishu\", ...feishuMessage } }\n];\n"
},
"typeVersion": 2
},
{
"id": "2582c7df-9b15-4473-bc47-91cf6f7304e0",
"name": "付箋",
"type": "n8n-nodes-base.stickyNote",
"position": [
-176,
896
],
"parameters": {
"width": 1152,
"height": 576,
"content": "## 5. Message Push\n\nSet up two channels for message delivery: **EMAIL** and **IM**, and define the message format and content.\n\n### Email: Gmail\n\n**GMAIL OAuth 2.0 – Official Documentation** \n[Configure your OAuth consent screen](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/?utm_source=n8n_app&utm_medium=credential_settings&utm_campaign=create_new_credentials_modal#configure-your-oauth-consent-screen)\n\n**Steps:**\n- Enable Gmail API \n- Create OAuth consent screen \n- Create OAuth client credentials \n- Audience: Add **Test users** under Testing status \n\n**Message format**: HTML \n(Model: OpenAI GPT — used to design an HTML email template)\n\n### IM: Feishu (LARK)\n\n**Bots in groups** \n[Use bots in groups](https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups)\n"
},
"typeVersion": 1
},
{
"id": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
"name": "RAG Daily papers",
"type": "n8n-nodes-base.notion",
"position": [
800,
0
],
"parameters": {
"title": "={{ $json.title }}",
"simple": false,
"blockUi": {
"blockValues": [
{
"textContent": "={{ $json.summary }}"
}
]
},
"options": {},
"resource": "databasePage",
"databaseId": {
"__rl": true,
"mode": "list",
"value": "26ba136d-cee4-8029-ad3d-e0e8ac64993f",
"cachedResultUrl": "https://www.notion.so/26ba136dcee48029ad3de0e8ac64993f",
"cachedResultName": "RAG DAILY"
},
"propertiesUi": {
"propertyValues": [
{
"key": "published|date",
"date": "={{ $json.published }}"
},
{
"key": "summary|rich_text",
"textContent": "={{ $json.summary }}"
},
{
"key": "id|rich_text",
"textContent": "={{ $json.id }}"
},
{
"key": "html_url|url",
"urlValue": "={{ $json.html_url }}"
},
{
"key": "pdf_url|url",
"urlValue": "={{ $json.pdf_url }}"
},
{
"key": "primary_category|rich_text",
"textContent": "={{ $json.primary_category }}"
},
{
"key": "github|url",
"urlValue": "={{ $json.github }}",
"ignoreIfEmpty": true
},
{
"key": "huggingface|url",
"urlValue": "={{ $json.huggingface }}",
"ignoreIfEmpty": true
},
{
"key": "RAG_TF|rich_text",
"textContent": "={{ $json.RAG_TF }}"
},
{
"key": "RAG_REASON|rich_text",
"textContent": "={{ $json.RAG_REASON }}"
},
{
"key": "RAG_Category|rich_text",
"textContent": "={{ $json.RAG_Category }}"
},
{
"key": "RAG_NAME|rich_text",
"textContent": "={{ $json.RAG_NAME }}"
},
{
"key": "updated|date",
"date": "={{ $json.updated }}"
},
{
"key": "author|multi_select",
"multiSelectValue": "={{ $json.authors }}"
},
{
"key": "category|multi_select",
"multiSelectValue": "={{ $json.categories }}"
}
]
}
},
"credentials": {
"notionApi": {
"id": "BNsFk38kgqvRDJpX",
"name": "Notion account"
}
},
"typeVersion": 2.2
},
{
"id": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
"name": "Data Extraction",
"type": "n8n-nodes-base.code",
"position": [
112,
0
],
"parameters": {
"jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n return [{\n json: {\n error: \"XML data not found. Please ensure the input contains XML content\",\n message: \"Check the field names in the input data\",\n success: false\n }\n }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n if (!isoString) return '';\n \n try {\n const date = new Date(isoString);\n if (isNaN(date.getTime())) return '';\n \n const year = date.getFullYear();\n const month = String(date.getMonth() + 1).padStart(2, '0');\n const day = String(date.getDate()).padStart(2, '0');\n const hours = String(date.getUTCHours()).padStart(2, '0');\n const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n \n return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n } catch (error) {\n return '';\n }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n const match = xml.match(regex);\n return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n // Fixed link extraction to fit actual XML format\n // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n const patterns = [\n new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n const authors = [];\n \n for (const block of authorBlocks) {\n const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n if (nameMatch && nameMatch[1].trim()) {\n authors.push(nameMatch[1].trim());\n }\n }\n \n return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n const categories = [];\n const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n let match;\n \n while ((match = regex.exec(entryXml)) !== null) {\n if (match[1]) {\n categories.push(match[1]);\n }\n }\n \n return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n // Handle namespace-prefixed primary category extraction\n const patterns = [\n /primary_category[^>]*term=\"([^\"]*)\"/i,\n /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// New: extract arxiv comment\nfunction extractArxivComment(entryXml) {\n const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n return commentMatch ? 
commentMatch[1].trim() : '';\n}\n\ntry {\n // Extract all entry blocks\n const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n const entries = [];\n let match;\n \n while ((match = entryRegex.exec(xmlData)) !== null) {\n entries.push(match[1]);\n }\n \n if (entries.length === 0) {\n return [{\n json: {\n error: \"No <entry> elements found\",\n message: \"Please check if the XML data format is correct\",\n success: false\n }\n }];\n }\n\n // Process each entry\n const processedData = [];\n let processedCount = 0;\n\n for (let i = 0; i < entries.length; i++) {\n const entryXml = entries[i];\n \n try {\n const item = {\n id: extractTagContent(entryXml, 'id'),\n updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n published: formatDateTime(extractTagContent(entryXml, 'published')),\n title: extractTagContent(entryXml, 'title'),\n summary: extractTagContent(entryXml, 'summary'),\n authors: extractAuthors(entryXml), // field name changed to authors, returns array\n html_url: extractLink(entryXml, 'text/html'),\n pdf_url: extractLink(entryXml, 'application/pdf'),\n primary_category: extractPrimaryCategory(entryXml),\n categories: extractCategories(entryXml), // field name changed to categories\n arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n github: '',\n huggingface: ''\n };\n\n // Validate required fields\n if (item.id && item.title) {\n processedData.push(item);\n processedCount++;\n }\n \n } catch (error) {\n console.log(`Error processing entry ${i+1}: ${error.message}`);\n // Continue processing next entry\n }\n }\n\n // Return processed results\n return [{\n json: {\n success: true,\n message: `Successfully processed ${processedCount} entries`,\n data: processedData,\n processing_time: new Date().toISOString()\n }\n }];\n\n} catch (error) {\n // Error handling\n return [{\n json: {\n error: \"An error occurred during processing\",\n message: error.message,\n success: false\n }\n }];\n}\n"
},
"typeVersion": 2
},
{
"id": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
"name": "JSON Format",
"type": "n8n-nodes-base.code",
"position": [
592,
0
],
"parameters": {
"jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n return [{\n json: {\n error: \"XML data not found. Please ensure the input contains XML content\",\n message: \"Check the field names in the input data\",\n success: false\n }\n }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n if (!isoString) return '';\n \n try {\n const date = new Date(isoString);\n if (isNaN(date.getTime())) return '';\n \n const year = date.getFullYear();\n const month = String(date.getMonth() + 1).padStart(2, '0');\n const day = String(date.getDate()).padStart(2, '0');\n const hours = String(date.getUTCHours()).padStart(2, '0');\n const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n \n return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n } catch (error) {\n return '';\n }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n const match = xml.match(regex);\n return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n // Fixed link extraction to fit actual XML format\n // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n const patterns = [\n new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n const authors = [];\n \n for (const block of authorBlocks) {\n const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n if (nameMatch && nameMatch[1].trim()) {\n authors.push(nameMatch[1].trim());\n }\n }\n \n return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n const categories = [];\n const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n let match;\n \n while ((match = regex.exec(entryXml)) !== null) {\n if (match[1]) {\n categories.push(match[1]);\n }\n }\n \n return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n // Handle namespace-prefixed primary category extraction\n const patterns = [\n /primary_category[^>]*term=\"([^\"]*)\"/i,\n /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// New: extract arxiv comment\nfunction extractArxivComment(entryXml) {\n const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n return commentMatch ? 
commentMatch[1].trim() : '';\n}\n\ntry {\n // Extract all entry blocks\n const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n const entries = [];\n let match;\n \n while ((match = entryRegex.exec(xmlData)) !== null) {\n entries.push(match[1]);\n }\n \n if (entries.length === 0) {\n return [{\n json: {\n error: \"No <entry> elements found\",\n message: \"Please check if the XML data format is correct\",\n success: false\n }\n }];\n }\n\n // Process each entry\n const processedData = [];\n let processedCount = 0;\n\n for (let i = 0; i < entries.length; i++) {\n const entryXml = entries[i];\n \n try {\n const item = {\n id: extractTagContent(entryXml, 'id'),\n updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n published: formatDateTime(extractTagContent(entryXml, 'published')),\n title: extractTagContent(entryXml, 'title'),\n summary: extractTagContent(entryXml, 'summary'),\n authors: extractAuthors(entryXml), // field name changed to authors, returns array\n html_url: extractLink(entryXml, 'text/html'),\n pdf_url: extractLink(entryXml, 'application/pdf'),\n primary_category: extractPrimaryCategory(entryXml),\n categories: extractCategories(entryXml), // field name changed to categories\n arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n github: '',\n huggingface: ''\n };\n\n // Validate required fields\n if (item.id && item.title) {\n processedData.push(item);\n processedCount++;\n }\n \n } catch (error) {\n console.log(`Error processing entry ${i+1}: ${error.message}`);\n // Continue processing next entry\n }\n }\n\n // Return processed results\n return [{\n json: {\n success: true,\n message: `Successfully processed ${processedCount} entries`,\n data: processedData,\n processing_time: new Date().toISOString()\n }\n }];\n\n} catch (error) {\n // Error handling\n return [{\n json: {\n error: \"An error occurred during processing\",\n message: error.message,\n success: false\n }\n }];\n}\n"
},
"typeVersion": 2
},
{
"id": "8fbefc67-e9f7-4597-b935-d5f5895cf93c",
"name": "付箋1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-160,
-224
],
"parameters": {
"width": 656,
"height": 192,
"content": "## 3. Data Processing\n\nAnalyze and summarize paper data using AI, then standardize output as JSON.\n\n### Single Paper Basic Information Analysis and Enhancement \n### Daily Paper Summary and Multilingual Translation"
},
"typeVersion": 1
},
{
"id": "884f2c40-4628-4376-a040-709e2db34c48",
"name": "付箋2",
"type": "n8n-nodes-base.stickyNote",
"position": [
1024,
16
],
"parameters": {
"width": 624,
"height": 368,
"content": "## 4. Data Storage: Notion Database\n\n- Create a corresponding database in Notion with the same predefined field names. \n- In Notion, create an integration under **Integrations** and grant access to the database. Obtain the corresponding **Secret Key**. \n- Use the Notion **\"Create a database page\"** node to configure the field mapping and store the data. \n\n**Notes** \n- **\"Create a database page\"** only adds new entries; data will not be updated. \n- The `updated` and `published` timestamps of arXiv papers are in **UTC**. \n- Notion **single-select** and **multi-select** fields only accept arrays. They do not automatically parse comma-separated strings. You need to format them as proper arrays. \n- Notion does not accept `null` values, which causes a **400 error**. \n"
},
"typeVersion": 1
},
{
"id": "4991129d-9406-4c52-bd8f-87e2721c4a6f",
"name": "付箋4",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1088,
544
],
"parameters": {
"width": 624,
"height": 912,
"content": "## 2. **Data Extraction**\n\n### Data Cleaning Rules (Convert to Standard JSON)\n\n1. **Remove Header** \n - Keep only the `<entry></entry>` blocks representing paper items.\n\n2. **Single Item** \n - Each `<entry></entry>` represents a single item.\n\n3. **Field Processing Rules** \n - `<id></id>` ➡️ `id` \n Extract content. \n Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` → `http://arxiv.org/abs/2409.06062v1` \n - `<updated></updated>` ➡️ `updated` \n Convert timestamp to `yyyy-mm-dd hh:mm:ss` \n - `<published></published>` ➡️ `published` \n Convert timestamp to `yyyy-mm-dd hh:mm:ss` \n - `<title></title>` ➡️ `title` \n Extract text content \n - `<summary></summary>` ➡️ `summary` \n Keep text, remove line breaks \n - `<author></author>` ➡️ `author` \n Combine all authors into an array \n Example: `[ \"Ernest Pusateri\", \"Anmol Walia\" ]` (for Notion multi-select field) \n - `<arxiv:comment></arxiv:comment>` ➡️ Ignore / discard \n - `<link type=\"text/html\">` ➡️ `html_url` \n Extract URL \n - `<link type=\"application/pdf\">` ➡️ `pdf_url` \n Extract URL \n - `<arxiv:primary_category term=\"cs.CL\">` ➡️ `primary_category` \n Extract `term` value \n - `<category>` ➡️ `category` \n Merge all `<category>` values into an array \n Example: `[ \"eess.AS\", \"cs.SD\" ]` (for Notion multi-select field) \n\n4. **Add Empty Fields** \n - `github` \n - `huggingface`\n"
},
"typeVersion": 1
}
],
"pinData": {},
"connections": {
"c3685631-8bbd-409a-978a-fbb3e9847115": {
"main": [
[
{
"node": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
"type": "main",
"index": 0
}
]
]
},
"9151ab18-379f-4d3b-8ca2-cf65c547e78d": {
"main": [
[
{
"node": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
"type": "main",
"index": 0
}
]
]
},
"a38b1b58-a6f6-4c6b-ba6e-f153980a220d": {
"main": [
[
{
"node": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
"type": "main",
"index": 0
}
]
]
},
"ae855e91-2363-4b97-8933-761934b269fe": {
"main": [
[
{
"node": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
"type": "main",
"index": 0
}
]
]
},
"3282f989-a9a4-4d4f-aaf0-097fc0d72e0d": {
"main": [
[
{
"node": "024c6399-857e-45a3-a15d-8b733e16da67",
"type": "main",
"index": 0
},
{
"node": "c3685631-8bbd-409a-978a-fbb3e9847115",
"type": "main",
"index": 0
},
{
"node": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
"type": "main",
"index": 0
}
]
]
},
"ae2d8994-7a52-4f7b-81fd-61c0538ba380": {
"main": [
[
{
"node": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
"type": "main",
"index": 0
}
]
]
},
"Basic LLM Chain": {
"main": [
[
{
"node": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
"type": "main",
"index": 0
}
]
]
},
"5d897d4d-968b-4336-bbee-d1d3b4dcae06": {
"main": [
[
{
"node": "Basic LLM Chain",
"type": "main",
"index": 0
}
]
]
},
"3df82b76-e9c8-4b0b-a552-428f2fc12c97": {
"main": [
[
{
"node": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
"type": "main",
"index": 0
}
]
]
},
"Schedule Trigger": {
"main": [
[
{
"node": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
"type": "main",
"index": 0
}
]
]
},
"aaa67776-c308-443e-98f6-e1fe7035cbb5": {
"main": [
[
{
"node": "ae855e91-2363-4b97-8933-761934b269fe",
"type": "main",
"index": 0
}
]
]
},
"6f3df3be-a376-42e9-b0be-32c4fba5a8e2": {
"main": [
[
{
"node": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
"type": "main",
"index": 0
},
{
"node": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
"type": "main",
"index": 0
}
]
]
},
"Google Gemini Chat Model": {
"ai_languageModel": [
[
{
"node": "Basic LLM Chain",
"type": "ai_languageModel",
"index": 0
}
]
]
}
}
}
FAQ
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and update the credentials as needed.
What scenarios is this workflow suited for?
Advanced - Content Creation, Multimodal AI
Is it paid?
The workflow itself is completely free. However, third-party services it uses (such as the Google Gemini API) may incur separate charges.
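The Notion sticky note in the workflow above warns that multi-select fields only accept arrays (not comma-separated strings), that `null` values cause a 400 error, and that arXiv timestamps are UTC. A minimal sketch of how those rules can be applied before the "Create a database page" node follows; the helper names `toNotionSafe` and `formatDateTimeUTC` are hypothetical illustrations, not part of the exported workflow.

```javascript
// Hypothetical helper (not part of the exported workflow): prepare one
// cleaned paper item before mapping it into Notion's
// "Create a database page" node.
function toNotionSafe(item) {
  const safe = {};
  for (const [key, value] of Object.entries(item)) {
    // Notion rejects null property values with a 400 error, so drop them.
    if (value === null || value === undefined) continue;
    safe[key] = value;
  }
  // Multi-select fields (authors, categories) must be arrays; split
  // defensively if a comma-separated string slipped through.
  for (const field of ['authors', 'categories']) {
    if (typeof safe[field] === 'string') {
      safe[field] = safe[field].split(',').map((s) => s.trim()).filter(Boolean);
    }
  }
  return safe;
}

// arXiv `updated` / `published` timestamps are UTC; format them as
// "yyyy-mm-dd hh:mm:ss" as the extraction rules require, without
// shifting into the local timezone.
function formatDateTimeUTC(isoString) {
  const d = new Date(isoString);
  const pad = (n) => String(n).padStart(2, '0');
  return (
    `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())} ` +
    `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`
  );
}
```

In an n8n Code node, logic like this would typically run on each item returned by the extraction step, just before the Notion field mapping.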