Automate News Article Scraping with ScrapegraphAI and Store in Google Sheets

Rating: 4.5 (10 reviews)
Author: vinci-king-01

Intermediate

This is a Market Research and AI Summarization automation workflow containing 8 nodes. It mainly uses the Code, GoogleSheets, ScheduleTrigger, and ScrapegraphAi nodes to automatically extract news articles with ScrapegraphAI and store them in Google Sheets.

Prerequisites

• Google Sheets API credentials
• ScrapeGraphAI API key (the scraper node in the JSON below expects `scrapegraphAIApi` credentials)

Nodes used (8)

Category

Market Research

AI Summarization

Workflow preview

Automated News Collection Trigger

AI-Powered News Article Scraper

Google Sheets News Storage

News Data Formatting and Processing
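
The preview lists the nodes, but the order in which they run comes from the `connections` object in the exported JSON further down, which links nodes by id. The sketch below assumes a trimmed copy of that object and uses the step titles from the sticky notes as labels. The `executionOrder` helper is hypothetical and not part of n8n.

```javascript
// Node ids copied from the export; labels taken from the sticky-note step titles.
const nodes = {
  "37df323b-5c75-495f-ba19-b8642c02d96f": "Automated News Collection Trigger",
  "efd61ca5-e248-4027-b705-6d9c5dabe820": "AI-Powered News Article Scraper",
  "6d11ae64-e2f8-47ed-854a-c749881ce72c": "News Data Formatting and Processing",
  "976d9123-7585-4700-9972-5b2838571a44": "Google Sheets News Storage",
};

// Trimmed copy of the export's `connections` object.
const connections = {
  "efd61ca5-e248-4027-b705-6d9c5dabe820": {
    main: [[{ node: "6d11ae64-e2f8-47ed-854a-c749881ce72c", type: "main", index: 0 }]],
  },
  "37df323b-5c75-495f-ba19-b8642c02d96f": {
    main: [[{ node: "efd61ca5-e248-4027-b705-6d9c5dabe820", type: "main", index: 0 }]],
  },
  "6d11ae64-e2f8-47ed-854a-c749881ce72c": {
    main: [[{ node: "976d9123-7585-4700-9972-5b2838571a44", type: "main", index: 0 }]],
  },
};

// Hypothetical helper: start at the node nothing points to, then follow
// the first "main" output of each node until the chain ends.
function executionOrder(nodes, connections) {
  const targets = new Set(
    Object.values(connections).flatMap((c) => c.main.flat().map((link) => link.node))
  );
  let current = Object.keys(nodes).find((id) => !targets.has(id));
  const order = [];
  const seen = new Set(); // guards against accidental cycles
  while (current && !seen.has(current)) {
    seen.add(current);
    order.push(nodes[current]);
    const next = connections[current]?.main?.[0]?.[0];
    current = next ? next.node : undefined;
  }
  return order;
}

const order = executionOrder(nodes, connections);
console.log(order.join(" -> "));
```

Under these assumptions the chain is trigger, scraper, formatting, storage, which matches the Step 1 to Step 4 numbering in the sticky notes.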

Export workflow

Copy the following JSON configuration into n8n to import and use this workflow

{
  "id": "MIllJmbqayQrZM1F",
  "meta": {
    "instanceId": "521567c5f495f323b77849c4cfd0c9f4f2396c986e324e0e66c8425b6f124744",
    "templateCredsSetupCompleted": true
  },
  "name": "Automate News Article Scraping with ScrapegraphAI and Store in Google Sheets",
  "tags": [],
  "nodes": [
    {
      "id": "37df323b-5c75-495f-ba19-b8642c02d96f",
      "name": "Automated News Collection Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        700,
        820
      ],
      "parameters": {
        "rule": {
          "interval": [
            {}
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "efd61ca5-e248-4027-b705-6d9c5dabe820",
      "name": "AI-Powered News Article Scraper",
      "type": "n8n-nodes-scrapegraphai.scrapegraphAi",
      "position": [
        1380,
        820
      ],
      "parameters": {
        "userPrompt": "Extract all the articles from this site. Use the following schema for response {   \"request_id\": \"5a9de102-8a43-4e89-8aae-397c9ca80a9b\",   \"status\": \"completed\",   \"website_url\": \"https://www.bbc.com/\",   \"user_prompt\": \"Extract all the articles from this site.\",   \"title\": \"'My friend died right in front of me' - Student describes moment air force jet crashed into school\",   \"url\": \"https://www.bbc.com/news/articles/cglzw8y5wy5o\",   \"category\": \"Asia\" }",
        "websiteUrl": "https://www.bbc.com/"
      },
      "credentials": {
        "scrapegraphAIApi": {
          "id": "",
          "name": ""
        }
      },
      "typeVersion": 1
    },
    {
      "id": "976d9123-7585-4700-9972-5b2838571a44",
      "name": "Google Sheets News Storage",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        2980,
        820
      ],
      "parameters": {
        "columns": {
          "value": {},
          "schema": [
            {
              "id": "title",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "title",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "url",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "url",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "category",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "category",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "autoMapInputData",
          "matchingColumns": []
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "name",
          "value": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "url",
          "value": ""
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "id": "",
          "name": ""
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "6d11ae64-e2f8-47ed-854a-c749881ce72c",
      "name": "News Data Formatting and Processing",
      "type": "n8n-nodes-base.code",
      "notes": "Hey this is where \nyou \nformat results ",
      "position": [
        2140,
        820
      ],
      "parameters": {
        "jsCode": "// Get the input data\nconst inputData = $input.all()[0].json;\n\n// Extract articles array\nconst articles = inputData.result.articles;\n\n// Map each article and return only title, url, category\nreturn articles.map(article => ({\n  json: {\n    title: article.title,\n    url: article.url,\n    category: article.category\n  }\n}));"
      },
      "notesInFlow": true,
      "typeVersion": 2
    },
    {
      "id": "ca78baaf-0480-490d-aa9a-3663ca93f5d0",
      "name": "Sticky Note 1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1180,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 2: AI-Powered News Article Scraper 🤖\n\nThis is the core node which uses ScrapeGraphAI to intelligently extract news articles from any website.\n\n## How to Use\n- Configure the target news website URL\n- Use natural language to describe what data to extract\n- The AI will automatically parse and structure the results\n\n## Configuration\n- **Website URL**: Target news website (e.g., BBC, CNN, Reuters)\n- **User Prompt**: Natural language instructions for data extraction\n- **API Credentials**: ScrapeGraphAI API key required\n\n## Example\n- **Website**: BBC News homepage\n- **Instruction**: \"Extract all article titles, URLs, and categories\"\n\n⚠️ **Note**: This is a community node requiring self-hosting"
      },
      "typeVersion": 1
    },
    {
      "id": "51a1337b-6a50-43a5-8d6f-8345bc771c7b",
      "name": "Sticky Note 2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1920,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 3: News Data Formatting and Processing 🧱\n\nThis node transforms and structures the scraped news data for optimal Google Sheets compatibility.\n\n## What it does\n- Extracts articles array from ScrapeGraphAI response\n- Maps each article to standardized format\n- Ensures data consistency and structure\n- Prepares clean data for spreadsheet storage\n\n## Data Structure\n- **title**: Article headline and title\n- **url**: Direct link to the article\n- **category**: Article category or section\n\n## Customization\n- Modify the JavaScript code to extract additional fields\n- Add data validation and cleaning logic\n- Implement error handling for malformed data"
      },
      "typeVersion": 1
    },
    {
      "id": "2e8cde8e-f534-4f37-a1f9-bcf0fe0b09f9",
      "name": "Sticky Note 3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        460,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 1: Automated News Collection Trigger ⏱️\n\nThis trigger automatically invokes the workflow at specified intervals to collect fresh news content.\n\n## Configuration Options\n- **Frequency**: Daily, hourly, or custom intervals\n- **Time Zone**: Configure for your business hours\n- **Execution Time**: Choose optimal times for news collection\n\n## Best Practices\n- Set appropriate intervals to respect rate limits\n- Consider news website update frequencies\n- Monitor execution logs for any issues\n- Adjust frequency based on your monitoring needs"
      },
      "typeVersion": 1
    },
    {
      "id": "5606537c-a531-490a-b4ff-6d0dc5e642b4",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2680,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 4: Google Sheets News Storage 📊\n\nThis node securely stores the processed news article data in your Google Sheets for analysis and tracking.\n\n## What it does\n- Connects to your Google Sheets account via OAuth2\n- Appends new article data as rows\n- Maintains historical data for trend analysis\n- Provides structured data for business intelligence\n\n## Configuration\n- **Spreadsheet**: Select or create target Google Sheets document\n- **Sheet Name**: Configure worksheet (default: Sheet1)\n- **Operation**: Append mode for continuous data collection\n- **Column Mapping**: Automatic mapping of title, url, category fields\n\n## Data Management\n- Each execution adds new article entries\n- Historical data preserved for analysis\n- Easy export and sharing capabilities\n- Built-in Google Sheets analytics and filtering"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "c2fee060-f99e-48aa-a280-ac5492715fd9",
  "connections": {
    "efd61ca5-e248-4027-b705-6d9c5dabe820": {
      "main": [
        [
          {
            "node": "6d11ae64-e2f8-47ed-854a-c749881ce72c",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "37df323b-5c75-495f-ba19-b8642c02d96f": {
      "main": [
        [
          {
            "node": "efd61ca5-e248-4027-b705-6d9c5dabe820",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "6d11ae64-e2f8-47ed-854a-c749881ce72c": {
      "main": [
        [
          {
            "node": "976d9123-7585-4700-9972-5b2838571a44",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
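
The Step 3 sticky note suggests adding data validation and error handling to the Code node. The sketch below shows one way that might look, assuming the scraper response has the `result.articles` shape that the original `jsCode` reads. The `formatArticles` function name, the de-duplication by URL, and the sample data are illustrative assumptions, not part of the template.

```javascript
// Sketch of a more defensive version of the Code node logic.
// Assumption: the scraper returns { result: { articles: [...] } },
// which is the shape the original jsCode expects.
function formatArticles(inputData) {
  const articles = inputData?.result?.articles;
  // Return no items instead of throwing when the response is malformed.
  if (!Array.isArray(articles)) {
    return [];
  }

  const seenUrls = new Set();
  const items = [];
  for (const article of articles) {
    const title = typeof article?.title === "string" ? article.title.trim() : "";
    const url = typeof article?.url === "string" ? article.url.trim() : "";
    // Skip entries without a title or URL, and skip repeated URLs.
    if (!title || !url || seenUrls.has(url)) continue;
    seenUrls.add(url);
    // Same three fields as the original node; category falls back to "".
    items.push({ json: { title, url, category: article.category ?? "" } });
  }
  return items;
}

// Stand-alone demo with made-up sample data.
const sample = {
  result: {
    articles: [
      { title: " Example headline ", url: "https://example.com/a", category: "World" },
      { title: "Duplicate", url: "https://example.com/a", category: "World" },
      { title: "", url: "https://example.com/b" },
      { title: "No category", url: "https://example.com/c" },
    ],
  },
};
const formatted = formatArticles(sample);
console.log(formatted);
```

Inside the n8n Code node, the demo lines would be dropped and the node would likely end with something like `return formatArticles($input.first().json);`, since the function already returns items in the `{ json: ... }` form the original code produces.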

Frequently asked questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then update the credential settings as needed.
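
In the export above, both credential blocks have empty `id` values and the Google Sheets `documentId` value is an empty string, so these fields need to be filled in after import. The sketch below lists such placeholders, assuming the workflow JSON has already been parsed into an object. The `findUnconfigured` helper and the trimmed `workflow` object are hypothetical illustrations.

```javascript
// Trimmed stand-in for the parsed workflow export (only the fields checked below).
const workflow = {
  nodes: [
    {
      name: "AI-Powered News Article Scraper",
      parameters: { websiteUrl: "https://www.bbc.com/" },
      credentials: { scrapegraphAIApi: { id: "", name: "" } },
    },
    {
      name: "News Data Formatting and Processing",
      parameters: {},
    },
    {
      name: "Google Sheets News Storage",
      parameters: { documentId: { __rl: true, mode: "url", value: "" } },
      credentials: { googleSheetsOAuth2Api: { id: "", name: "" } },
    },
  ],
};

// Hypothetical helper: report credentials without an id and empty documentId values.
function findUnconfigured(workflow) {
  const todo = [];
  for (const node of workflow.nodes) {
    for (const [type, cred] of Object.entries(node.credentials ?? {})) {
      if (!cred.id) todo.push(`${node.name}: credential "${type}" is not set`);
    }
    const doc = node.parameters?.documentId;
    if (doc && !doc.value) todo.push(`${node.name}: documentId is empty`);
  }
  return todo;
}

const todo = findUnconfigured(workflow);
todo.forEach((line) => console.log(line));
```

The schedule trigger's `interval` entry is also an empty object in the export, so the run frequency is worth reviewing in the n8n editor as well.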

What scenarios is this workflow suited for?

Intermediate - Market Research, AI Summarization

Is it paid?

This workflow is completely free; you can import it and use it directly. Note, however, that third-party services used in the workflow (such as the ScrapeGraphAI API) may require payment on your own account.