Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls
This is an AI automation workflow containing 23 nodes. It mainly uses Set, Filter, Summarize, FormTrigger, and ConvertToFile nodes, with an optional AI text classifier, to generate an AI-ready llms.txt file from a Screaming Frog website crawl.
- OpenAI API key (used by the optional Text Classifier node)
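
To make the goal concrete, the file the workflow assembles can be sketched in plain JavaScript. This only illustrates the row and file format described in the workflow's own notes and expressions; it is not the n8n implementation. The function names `toLlmsTxtRow` and `buildLlmsTxt` and the sample data are hypothetical.

```javascript
// Hypothetical helper: format one page as an llms.txt row.
// Follows the "- [title](link): description" format from the workflow notes;
// the description part is omitted when the page has no meta description.
function toLlmsTxtRow(page) {
  const suffix = page.description ? ': ' + page.description : '';
  return `- [${page.title}](${page.url})${suffix}`;
}

// Hypothetical helper: assemble the whole file from the two form inputs
// (site name and short description) and the list of pages.
function buildLlmsTxt(siteName, siteDescription, pages) {
  const rows = pages.map(toLlmsTxtRow).join('\n');
  return `# ${siteName}\n> ${siteDescription}\n\n${rows}`;
}

// Sample data, invented for illustration only.
const samplePages = [
  { title: 'Home', url: 'https://example.com/', description: 'Welcome page' },
  { title: 'Blog', url: 'https://example.com/blog', description: '' },
];

console.log(buildLlmsTxt('Example Site', 'A demo website.', samplePages));
```

The second sample page has an empty description, so its row ends right after the link.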
Nodes used (23)
{
"id": "",
"meta": {
"instanceId": "",
"templateCredsSetupCompleted": true
},
"name": "Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls",
"tags": [],
"nodes": [
{
"id": "ca701618-b2d5-48ee-a503-d3513d018a65",
"name": "Nota adhesiva",
"type": "n8n-nodes-base.stickyNote",
"position": [
360,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Form - Screaming Frog internal_html.csv upload \n\nThis form node is used to trigger the workflow. \n\nIt contains **three input fields**: \n- Name of the website \n- Short description of the website \n- **Screaming Frog** export containing the internal URLs \n\n\n\nIt is recommended to use the **internal_html.csv** export, but **internal_all.csv** will also work, as the workflow includes a filter to process only indexable URLs.\n"
},
"typeVersion": 1
},
{
"id": "bc040ca0-f38d-4458-a60c-17f71dbfd1ea",
"name": "Nota adhesiva1",
"type": "n8n-nodes-base.stickyNote",
"position": [
780,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Extract data from Screaming Frog file\n\nThis node extracts data from the **CSV file** provided by the user. \n\nIt produces an output that is **easily usable** in the following nodes. \n\n⚠️ **Caution:** \nIf the uploaded file is **not** the expected Screaming Frog export, the workflow will still proceed but will likely **fail in the next steps** due to missing required fields. \n\n"
},
"typeVersion": 1
},
{
"id": "f71a7d10-847d-48e7-8820-ec0c1e7ea055",
"name": "Nota adhesiva2",
"type": "n8n-nodes-base.stickyNote",
"position": [
1200,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Set Useful Fields \n\nThis node sets **7 key fields** from the Screaming Frog export: \n\n- `url` → from the **\"Address\"** column \n- `title` → from the **\"Title 1\"** column \n- `description` → from the **\"Meta Description 1\"** column \n- `status` → from the **\"Status Code\"** column \n- `indexability` → from the **\"Indexability\"** column \n- `content_type` → from the **\"Content Type\"** column \n- `word_count` → from the **\"Word Count\"** column \n\n\n**Multi-language compatibility** \nIf you're using Screaming Frog in **French, Italian, German, or Spanish**, the column names will be different. \nHowever, the workflow is designed to handle this, so it will **still work correctly**! 🥳\n"
},
"typeVersion": 1
},
{
"id": "6f6546b8-adeb-4998-ae19-d93525337eb7",
"name": "Establecer campos útiles",
"type": "n8n-nodes-base.set",
"position": [
1340,
60
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "0e7d4a06-83fc-4834-93fe-2e758cbe2307",
"name": "url",
"type": "string",
"value": "={{ $json.Address || $json.Adresse || $json.Dirección || $json.Indirizzo }}"
},
{
"id": "c82f4d4c-9d0b-4c7d-9647-5d0240b58643",
"name": "title",
"type": "string",
"value": "={{ $json['Title 1'] || $json['Titolo 1'] || $json['Titolo 1'] || $json['Título 1'] || $json['Titel 1'] }}"
},
{
"id": "abea81db-ce3b-4ac1-bd21-09ccfffb567a",
"name": "description",
"type": "string",
"value": "={{ $json['Meta Description 1'] || $json['Meta description 1'] }}"
},
{
"id": "2ca75d74-70f8-400b-b862-9da186135915",
"name": "statut",
"type": "string",
"value": "={{ $json['Status Code'] || $json['Code HTTP'] || $json['Status-Code'] || $json['Código de respuesta'] || $json['Codice di stato']}}"
},
{
"id": "754d3202-38b0-4d79-ba24-8078b3244307",
"name": "indexability",
"type": "string",
"value": "={{ $json.Indexability || $json.Indexabilité || $json.Indicizzabilità || $json.Indexabilidad || $json.Indexierbarkeit}}"
},
{
"id": "8bc6583d-bb34-4d22-b310-fe79bb8ac85d",
"name": "content_type",
"type": "string",
"value": "={{ $json['Content Type'] || $json['Type de contenu'] || $json['Tipo di contenuto'] || $json['Tipo de contenido'] || $json['Inhaltstyp']}}"
},
{
"id": "c874ba1a-769e-43d3-9555-8c9914ca9b76",
"name": "word_count",
"type": "string",
"value": "={{ $json['Word Count'] || $json['Nombre de mots'] || $json['Conteggio delle parole'] || $json['Conteggio delle parole'] || $json['Recuento de palabras'] || $json['Wortanzahl'] }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4",
"name": "Clasificador de texto",
"type": "@n8n/n8n-nodes-langchain.textClassifier",
"disabled": true,
"position": [
2260,
60
],
"parameters": {
"options": {},
"inputText": "=url : {{ $json.url }}\ntitle : {{ $json.title }}\ndescription : {{ $json.description }}\nwords count : {{ $json.word_count }}",
"categories": {
"categories": [
{
"category": "useful_content",
"description": "Pages that are likely to contain high-quality content, making them suitable for inclusion in a file that aids content discovery for an LLM. "
},
{
"category": "other_content",
"description": "Pages that should not be included (e.g., pagination, or low-value content)."
}
]
}
},
"typeVersion": 1
},
{
"id": "74a4e378-4228-4142-92ca-e541efde2b15",
"name": "OpenAI Chat Model",
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"position": [
2180,
240
],
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4o-mini"
},
"options": {}
},
"credentials": {
"openAiApi": {
"id": "",
"name": "OpenAi Connection"
}
},
"typeVersion": 1.2
},
{
"id": "63dc6cfe-bc73-43b5-8c7d-4f5fd6501d3b",
"name": "Sin operación, no hacer nada",
"type": "n8n-nodes-base.noOp",
"position": [
2580,
200
],
"parameters": {},
"typeVersion": 1
},
{
"id": "cb555b99-9e63-4b6b-a1fc-512b5467d666",
"name": "Nota adhesiva3",
"type": "n8n-nodes-base.stickyNote",
"position": [
1620,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Filter URLs \n\nThis **filter node** is used to keep only the URLs that meet the following conditions: \n- `status` = **200** \n- `indexability` = **indexable** \n- `content_type` contains **text/html** \n\n\nThese filters are even **more useful** if the uploaded file is an **internal_all.csv** instead of an **internal_html.csv**. \n\n### **Tips:** \nYou can **add more filters** to refine the URLs included in your `llms.txt` file. \n\n💡 **Examples:** \n- **Filter by word count** → Ensure pages contain **enough text content**. \n- **Filter by URL path** → Keep only **specific folders or categories** in the `llms.txt` file. \n- **Filter by meta description** → Exclude URLs **without a meta description**, as this field will be used in the `llms.txt` file to describe each piece of content. \n"
},
"typeVersion": 1
},
{
"id": "e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf",
"name": "Filtrar URLs",
"type": "n8n-nodes-base.filter",
"position": [
1740,
60
],
"parameters": {
"options": {},
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "cef4feaa-1c46-45b1-92b7-f5c2051b1dc5",
"operator": {
"type": "number",
"operation": "equals"
},
"leftValue": "={{ Number($json.statut) }}",
"rightValue": 200
},
{
"id": "bb821656-9740-4da4-8aa9-f65ad098c470",
"operator": {
"type": "boolean",
"operation": "true",
"singleValue": true
},
"leftValue": "={{ [\"Indexable\", \"Indicizzabile\", \"Indexierbar\"].includes($json.indexability) }}",
"rightValue": "={{ \"Indexable\" || \"Indicizzabile\" }}"
},
{
"id": "5c93ddb8-8091-406a-bc04-fa14e8b73fb9",
"operator": {
"type": "string",
"operation": "contains"
},
"leftValue": "={{ $json.content_type }}",
"rightValue": "text/html"
}
]
}
},
"typeVersion": 2.2
},
{
"id": "b98f19a8-afd3-4d26-8063-dee3ee75055f",
"name": "Nota adhesiva4",
"type": "n8n-nodes-base.stickyNote",
"position": [
2040,
-800
],
"parameters": {
"color": 2,
"width": 740,
"height": 1160,
"content": "## Text Classifier\n\n🚫 **This node is deactivated by default** in the template. \n\nYou can **enable it** if you want to add a more **\"intelligent\" 🤓 filter** to refine the URLs included in the `llms.txt` file, helping LLMs discover and prioritize valuable content.\n\n### How It Works:\nThis node has **two outputs**: \n- **`useful_content`** → Pages that are **likely to contain high-quality content**, making them suitable for inclusion in a file that **aids content discovery for an LLM**. \n- **`other_content`** → Pages that should **not** be included (e.g., pagination or low-value content). \n\n\nYou can **modify the description** in the node to fine-tune the classification according to your needs. \n\n### Input Fields:\n- **url** → `{{ $json.url }}` \n- **title** → `{{ $json.title }}` \n- **description** → `{{ $json.description }}` \n- **word_count** → `{{ $json.word_count }}` \n\n### Why use an LLM? \nA **language model (LLM)** can **analyze** the **URL, title, and description** to identify pages that **most likely contain meaningful and relevant content**. \nThis allows it to **prioritize valuable pages** and structure the data for **better content discovery and training purposes**. \n\n### **For large websites** \nIf you have a **very large website**, consider using a **Loop Over Items** node to make the workflow **more robust** and ensure all pages are processed. \nAlso, using a **Loop Over Items** node make it **easier** to handle: \n- **Timeouts** \n- **API quotas** \n- **Other scalability issues**\n\n### Tokens usage\nFinally, keep in mind that **more pages mean more tokens and more billed LLM API calls**.\n\n\n\n\n\n\n\n"
},
"typeVersion": 1
},
{
"id": "63e3ea7a-cec3-442c-9812-771def0a9949",
"name": "Nota adhesiva5",
"type": "n8n-nodes-base.stickyNote",
"position": [
2840,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Set Field - llms.txt Row\n\nThis node **sets** the row format for the `llms.txt` file. \n\n### Row Structure:\nEach row follows this format: \n\n- `- [title](link): description` \n\nIf the URL **has no description** (from the **Meta Description** in the Screaming Frog export), the row will be: \n\n- `- [title](link)` \n"
},
"typeVersion": 1
},
{
"id": "78f58220-feb5-4044-b994-39a0e4f1e9e4",
"name": "Nota adhesiva6",
"type": "n8n-nodes-base.stickyNote",
"position": [
3260,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Summarize - Concatenate\n\nThis node concatenates all the output from the previous node, ensuring each row is on a separate line."
},
"typeVersion": 1
},
{
"id": "7a119633-7cd3-4de5-a1cd-7f708e1abf4a",
"name": "Nota adhesiva7",
"type": "n8n-nodes-base.stickyNote",
"position": [
3680,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Set Fields - llms.txt Content\n\nThis node sets the content of the `llms.txt` file using:\n\n- The **website title** provided in the form (first node).\n- The **website description** provided in the form (first node).\n- The output from the previous node, which includes all the URLs, their titles, and their descriptions that will appear in the `llms.txt` file.\n"
},
"typeVersion": 1
},
{
"id": "554f6858-68e8-4b35-a6c4-21bed6832323",
"name": "Nota adhesiva8",
"type": "n8n-nodes-base.stickyNote",
"position": [
4100,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Generate llms.txt file\n\nThis node **creates** the `llms.txt` file, which can be **downloaded directly** within n8n. \n"
},
"typeVersion": 1
},
{
"id": "24bdefba-e2f2-41f0-93e7-9f8d2fc11f43",
"name": "Nota adhesiva9",
"type": "n8n-nodes-base.stickyNote",
"position": [
4520,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## upload file anywhere\n\nInstead of downloading the file directly from the n8n workflow, you can **replace this node node** with a Drive node (e.g., **Google Drive** or **OneDrive**) to upload the `llms.txt` file to a folder of your choice. \n \n**Name the file properly** (e.g., include the website name) to make it easier to find and distinguish between files when working on multiple websites. \n"
},
"typeVersion": 1
},
{
"id": "a3be51e3-810c-40a7-a996-98a3d383c2b9",
"name": "Resumir - Concatenar",
"type": "n8n-nodes-base.summarize",
"position": [
3380,
40
],
"parameters": {
"options": {},
"fieldsToSummarize": {
"values": [
{
"field": "llmTxtRow",
"separateBy": "\n",
"aggregation": "concatenate"
}
]
}
},
"typeVersion": 1.1
},
{
"id": "8d3a892a-3d11-4d8a-8ec6-84f8f3af1183",
"name": "Establecer campos - Contenido de llms.txt",
"type": "n8n-nodes-base.set",
"position": [
3820,
40
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "97062a99-e944-4e1e-89b1-62cf9e3462dd",
"name": "llmsTxtFile",
"type": "string",
"value": "=# {{ $('Form - Screaming frog internal_html.csv upload').item.json['What is the name of your website?'] }}\n> {{ $('Form - Screaming frog internal_html.csv upload').item.json['Can you provide a short description of your website? (in the language of the website)'] }}\n\n{{ $json.concatenated_llmTxtRow }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "bc2a692a-47ea-4bf1-a102-e607fd544158",
"name": "subir archivo a cualquier lugar",
"type": "n8n-nodes-base.noOp",
"position": [
4640,
40
],
"parameters": {},
"typeVersion": 1
},
{
"id": "404510a2-35b2-44cf-9d02-eb0abcf4e9b3",
"name": "Establecer campo - Fila de llms.txt",
"type": "n8n-nodes-base.set",
"position": [
2960,
40
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "95e75caa-8110-476b-9cb1-73c15361fa56",
"name": "llmTxtRow",
"type": "string",
"value": "=- [{{ $json.title }}]({{ $json.url }}){{ $json.description ? ': ' + $json.description : '' }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "f54d51f2-17bc-4c58-b177-0e77e16f7b72",
"name": "Nota adhesiva10",
"type": "n8n-nodes-base.stickyNote",
"position": [
-420,
-1020
],
"parameters": {
"color": 5,
"width": 700,
"height": 1380,
"content": "# Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls \n\nThis workflow helps you generate an **llms.txt** file (if you're unfamiliar with it, check out [this article](https://towardsdatascience.com/llms-txt-414d5121bcb3/)) using a **Screaming Frog export**. \n\n[Screaming Frog](https://www.screamingfrog.co.uk/seo-spider/) is a well-known website crawler. \nYou can easily crawl a website. Then, export the **\"internal_html\"** section in CSV format. \n\n## How It Works: \n\nA **form** allows you to enter: \n- The **name of the website** \n- A **short description** \n- The **internal_html.csv** file from your Screaming Frog export \n\n\nOnce the form is submitted, the **workflow is triggered automatically**, and you can **download the llms.txt file directly from n8n**. \n\n## Downloading the File\nSince the last node in this workflow is **\"Convert to File\"**, you will need to **download the file directly from the n8n UI**. \nHowever, you can easily **add a node** (e.g., Google Drive, OneDrive) to automatically upload the file **wherever you want**. \n\n## AI-Powered Filtering (Optional): \nThis workflow includes a **text classifier node**, which is **deactivated by default**. \n- You can **activate it** to apply a more **intelligent filter** to select URLs for the `llms.txt` file. \n- Consider modifying the **description** in the classifier node to specify the type of URLs you want to include. \n\n## How to Use This Workflow \n\n1. **Crawl the website** you want to generate an `llms.txt` file for using **Screaming Frog**. \n2. **Export the \"internal_html\"** section in CSV format. \n  \n3. In **n8n**, click **\"Test Workflow\"**, fill in the form, and **upload** the `internal_html.csv` file. \n4. Once the workflow is complete, go to the **\"Export to File\"** node and **download the output**. \n\n**That's it! 
You now have your llms.txt file!** \n\n\n\n**Recommended Usage:** \nUse this workflow **directly in the n8n UI by clicking** 'Test Workflow' and uploading the file in the form."
},
"typeVersion": 1
},
{
"id": "e33104af-802a-43f2-b26d-f368f7de2fd7",
"name": "Formulario - Carga de Screaming Frog internal_html.csv",
"type": "n8n-nodes-base.formTrigger",
"position": [
460,
60
],
"webhookId": "8791f39a-3d81-405c-b177-0a733ebf74cb",
"parameters": {
"options": {
"buttonLabel": "Get the llms.txt file"
},
"formTitle": "llms.txt Generator - From Screaming Frog export",
"formFields": {
"values": [
{
"fieldLabel": "What is the name of your website?",
"placeholder": "Example : The best website ever",
"requiredField": true
},
{
"fieldLabel": "Can you provide a short description of your website? (in the language of the website)",
"placeholder": "Example : This is the best website ever because all the content is engaging and valuable.",
"requiredField": true
},
{
"fieldType": "file",
"fieldLabel": "screaming_frog_export",
"multipleFiles": false,
"requiredField": true,
"acceptFileTypes": ".csv"
}
]
},
"responseMode": "lastNode",
"formDescription": "Generate a simple llms.txt file from a Screaming Frog Export\nIt is recommended to use the internal_html.csv export, although internal_all.csv will also work.\n\nFill in the fields in this form.Just fill in the fields in this form 😄"
},
"typeVersion": 2.2
},
{
"id": "f6b17fdd-a098-411e-8d53-3f6e638cc3ba",
"name": "Extraer datos del archivo de Screaming Frog",
"type": "n8n-nodes-base.extractFromFile",
"position": [
900,
60
],
"parameters": {
"options": {},
"operation": "xls",
"binaryPropertyName": "screaming_frog_export"
},
"typeVersion": 1
},
{
"id": "6bbd8d1f-3322-4c6d-af08-c842386239ce",
"name": "Generar archivo llms.txt",
"type": "n8n-nodes-base.convertToFile",
"position": [
4220,
40
],
"parameters": {
"options": {
"encoding": "utf8",
"fileName": "llms.txt"
},
"operation": "toText",
"sourceProperty": "llmsTxtFile"
},
"typeVersion": 1.1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "",
"connections": {
"e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf": {
"main": [
[
{
"node": "1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4",
"type": "main",
"index": 0
}
]
]
},
"1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4": {
"main": [
[
{
"node": "404510a2-35b2-44cf-9d02-eb0abcf4e9b3",
"type": "main",
"index": 0
}
],
[
{
"node": "63dc6cfe-bc73-43b5-8c7d-4f5fd6501d3b",
"type": "main",
"index": 0
}
]
]
},
"74a4e378-4228-4142-92ca-e541efde2b15": {
"ai_languageModel": [
[
{
"node": "1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"6f6546b8-adeb-4998-ae19-d93525337eb7": {
"main": [
[
{
"node": "e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf",
"type": "main",
"index": 0
}
]
]
},
"6bbd8d1f-3322-4c6d-af08-c842386239ce": {
"main": [
[]
]
},
"a3be51e3-810c-40a7-a996-98a3d383c2b9": {
"main": [
[
{
"node": "8d3a892a-3d11-4d8a-8ec6-84f8f3af1183",
"type": "main",
"index": 0
}
]
]
},
"404510a2-35b2-44cf-9d02-eb0abcf4e9b3": {
"main": [
[
{
"node": "a3be51e3-810c-40a7-a996-98a3d383c2b9",
"type": "main",
"index": 0
}
]
]
},
"8d3a892a-3d11-4d8a-8ec6-84f8f3af1183": {
"main": [
[
{
"node": "6bbd8d1f-3322-4c6d-af08-c842386239ce",
"type": "main",
"index": 0
}
]
]
},
"f6b17fdd-a098-411e-8d53-3f6e638cc3ba": {
"main": [
[
{
"node": "6f6546b8-adeb-4998-ae19-d93525337eb7",
"type": "main",
"index": 0
}
]
]
},
"e33104af-802a-43f2-b26d-f368f7de2fd7": {
"main": [
[
{
"node": "f6b17fdd-a098-411e-8d53-3f6e638cc3ba",
"type": "main",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, select "Import from JSON", paste the configuration, and then adjust the credential settings as needed.
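
Before uploading an export, it may help to see the selection logic in isolation. The sketch below re-implements, in plain JavaScript, a subset of the column fallbacks from the Set node and the three conditions from the Filter node, so you can reason about which rows would survive. `normalizeRow` and `keepRow` are hypothetical names, the sample rows are invented, and only some of the localized column names from the workflow are shown.

```javascript
// Hypothetical helper: map a raw Screaming Frog row (English or localized
// column names) to the field names used inside the workflow.
function normalizeRow(row) {
  return {
    url: row['Address'] || row['Adresse'] || row['Dirección'] || row['Indirizzo'],
    title: row['Title 1'] || row['Titolo 1'] || row['Título 1'] || row['Titel 1'],
    description: row['Meta Description 1'] || row['Meta description 1'],
    statut: row['Status Code'] || row['Code HTTP'] || row['Status-Code'],
    indexability: row['Indexability'] || row['Indexabilité'] || row['Indexierbarkeit'],
    content_type: row['Content Type'] || row['Type de contenu'] || row['Inhaltstyp'],
  };
}

// Hypothetical helper: the three filter conditions combined with AND.
function keepRow(page) {
  const indexableLabels = ['Indexable', 'Indicizzabile', 'Indexierbar'];
  return (
    Number(page.statut) === 200 &&
    indexableLabels.includes(page.indexability) &&
    String(page.content_type || '').includes('text/html')
  );
}

// Sample rows, invented for illustration only.
const sampleRows = [
  { 'Address': 'https://example.com/', 'Title 1': 'Home', 'Status Code': '200',
    'Indexability': 'Indexable', 'Content Type': 'text/html; charset=UTF-8' },
  { 'Address': 'https://example.com/old', 'Title 1': 'Old', 'Status Code': '301',
    'Indexability': 'Non-Indexable', 'Content Type': 'text/html; charset=UTF-8' },
  { 'Adresse': 'https://example.com/fr', 'Title 1': 'Accueil', 'Code HTTP': '200',
    'Indexabilité': 'Indexable', 'Type de contenu': 'text/html' },
];

const keptPages = sampleRows.map(normalizeRow).filter(keepRow);
console.log(keptPages.map((page) => page.url));
```

The redirected page is dropped because its status code is not 200, while the row using French column names is kept through the fallback chain.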
What scenarios is this workflow suited for?
Advanced - Artificial Intelligence
Is it paid?
This workflow is completely free; you can import it and use it directly. Note, however, that third-party services used in the workflow (such as the OpenAI API) may require payment on your own account.
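
Since the workflow's notes warn that more pages mean more billed LLM API calls when the optional classifier is enabled, they suggest processing large sites with a Loop Over Items node. The batching idea itself can be sketched in plain JavaScript. `chunkItems` and the batch size of 2 are hypothetical and only illustrate the splitting, not how n8n implements it.

```javascript
// Hypothetical helper: split a list of items into batches of at most batchSize.
function chunkItems(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Sample URLs, invented for illustration only.
const sampleUrls = [
  'https://example.com/a',
  'https://example.com/b',
  'https://example.com/c',
  'https://example.com/d',
  'https://example.com/e',
];

const urlBatches = chunkItems(sampleUrls, 2);
console.log(urlBatches.length); // 3 batches: two of size 2, one of size 1
```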
Dataki
@dataki
I am passionate about transforming complex processes into seamless automations with n8n. My expertise spans creating ETL pipelines, sales automations, and data & AI-driven workflows. As an avid problem solver, I thrive on optimizing workflows to drive efficiency and innovation.