Automatisches Scrapen von Nachrichtenartikeln mit ScrapegraphAI und Speicherung in Google Sheets

Fortgeschritten

Dies ist ein Market Research, AI Summarization-Bereich Automatisierungsworkflow mit 8 Nodes. Hauptsächlich werden Code, GoogleSheets, ScheduleTrigger, ScrapegraphAi und andere Nodes verwendet. Automatisiertes Scrapen von Nachrichtenartikeln mit ScrapegraphAI und Speicherung in Google Sheets

Voraussetzungen
  • Google Sheets API-Anmeldedaten
Workflow-Vorschau
Visualisierung der Node-Verbindungen, mit Zoom und Pan
Workflow exportieren
Kopieren Sie die folgende JSON-Konfiguration und importieren Sie sie in n8n
{
  "id": "MIllJmbqayQrZM1F",
  "meta": {
    "instanceId": "521567c5f495f323b77849c4cfd0c9f4f2396c986e324e0e66c8425b6f124744",
    "templateCredsSetupCompleted": true
  },
  "name": "Automate News Article Scraping with ScrapegraphAI and Store in Google Sheets",
  "tags": [],
  "nodes": [
    {
      "id": "37df323b-5c75-495f-ba19-b8642c02d96f",
      "name": "Automatisierter Nachrichten-Sammel-Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        700,
        820
      ],
      "parameters": {
        "rule": {
          "interval": [
            {}
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "efd61ca5-e248-4027-b705-6d9c5dabe820",
      "name": "KI-gestützter Nachrichtenartikel-Scraper",
      "type": "n8n-nodes-scrapegraphai.scrapegraphAi",
      "position": [
        1380,
        820
      ],
      "parameters": {
        "userPrompt": "Extract all the articles from this site. Use the following schema for response {   \"request_id\": \"5a9de102-8a43-4e89-8aae-397c9ca80a9b\",   \"status\": \"completed\",   \"website_url\": \"https://www.bbc.com/\",   \"user_prompt\": \"Extract all the articles from this site.\",   \"title\": \"'My friend died right in front of me' - Student describes moment air force jet crashed into school\",   \"url\": \"https://www.bbc.com/news/articles/cglzw8y5wy5o\",   \"category\": \"Asia\" }",
        "websiteUrl": "https://www.bbc.com/"
      },
      "credentials": {
        "scrapegraphAIApi": {
          "id": "",
          "name": ""
        }
      },
      "typeVersion": 1
    },
    {
      "id": "976d9123-7585-4700-9972-5b2838571a44",
      "name": "Google Sheets Nachrichtenspeicher",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        2980,
        820
      ],
      "parameters": {
        "columns": {
          "value": {},
          "schema": [
            {
              "id": "title",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "title",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "url",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "url",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "category",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "category",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "autoMapInputData",
          "matchingColumns": []
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "name",
          "value": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "url",
          "value": ""
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "id": "",
          "name": ""
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "6d11ae64-e2f8-47ed-854a-c749881ce72c",
      "name": "Nachrichtendaten-Formatierung und -Verarbeitung",
      "type": "n8n-nodes-base.code",
      "notes": "Hey this is where \nyou \nformat results ",
      "position": [
        2140,
        820
      ],
      "parameters": {
        "jsCode": "// Get the input data\nconst inputData = $input.all()[0].json;\n\n// Extract articles array\nconst articles = inputData.result.articles;\n\n// Map each article and return only title, url, category\nreturn articles.map(article => ({\n  json: {\n    title: article.title,\n    url: article.url,\n    category: article.category\n  }\n}));"
      },
      "notesInFlow": true,
      "typeVersion": 2
    },
    {
      "id": "ca78baaf-0480-490d-aa9a-3663ca93f5d0",
      "name": "Haftnotiz1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1180,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 2: AI-Powered News Article Scraper 🤖\n\nThis is the core node which uses ScrapeGraphAI to intelligently extract news articles from any website.\n\n## How to Use\n- Configure the target news website URL\n- Use natural language to describe what data to extract\n- The AI will automatically parse and structure the results\n\n## Configuration\n- **Website URL**: Target news website (e.g., BBC, CNN, Reuters)\n- **User Prompt**: Natural language instructions for data extraction\n- **API Credentials**: ScrapeGraphAI API key required\n\n## Example\n- **Website**: BBC News homepage\n- **Instruction**: \"Extract all article titles, URLs, and categories\"\n\n⚠️ **Note**: This is a community node requiring self-hosting"
      },
      "typeVersion": 1
    },
    {
      "id": "51a1337b-6a50-43a5-8d6f-8345bc771c7b",
      "name": "Haftnotiz2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1920,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 3: News Data Formatting and Processing 🧱\n\nThis node transforms and structures the scraped news data for optimal Google Sheets compatibility.\n\n## What it does\n- Extracts articles array from ScrapeGraphAI response\n- Maps each article to standardized format\n- Ensures data consistency and structure\n- Prepares clean data for spreadsheet storage\n\n## Data Structure\n- **title**: Article headline and title\n- **url**: Direct link to the article\n- **category**: Article category or section\n\n## Customization\n- Modify the JavaScript code to extract additional fields\n- Add data validation and cleaning logic\n- Implement error handling for malformed data"
      },
      "typeVersion": 1
    },
    {
      "id": "2e8cde8e-f534-4f37-a1f9-bcf0fe0b09f9",
      "name": "Haftnotiz3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        460,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 1: Automated News Collection Trigger ⏱️\n\nThis trigger automatically invokes the workflow at specified intervals to collect fresh news content.\n\n## Configuration Options\n- **Frequency**: Daily, hourly, or custom intervals\n- **Time Zone**: Configure for your business hours\n- **Execution Time**: Choose optimal times for news collection\n\n## Best Practices\n- Set appropriate intervals to respect rate limits\n- Consider news website update frequencies\n- Monitor execution logs for any issues\n- Adjust frequency based on your monitoring needs"
      },
      "typeVersion": 1
    },
    {
      "id": "5606537c-a531-490a-b4ff-6d0dc5e642b4",
      "name": "Haftnotiz",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2680,
        460
      ],
      "parameters": {
        "color": 5,
        "width": 574.9363634768473,
        "height": 530.4701664623029,
        "content": "# Step 4: Google Sheets News Storage 📊\n\nThis node securely stores the processed news article data in your Google Sheets for analysis and tracking.\n\n## What it does\n- Connects to your Google Sheets account via OAuth2\n- Appends new article data as rows\n- Maintains historical data for trend analysis\n- Provides structured data for business intelligence\n\n## Configuration\n- **Spreadsheet**: Select or create target Google Sheets document\n- **Sheet Name**: Configure worksheet (default: Sheet1)\n- **Operation**: Append mode for continuous data collection\n- **Column Mapping**: Automatic mapping of title, url, category fields\n\n## Data Management\n- Each execution adds new article entries\n- Historical data preserved for analysis\n- Easy export and sharing capabilities\n- Built-in Google Sheets analytics and filtering"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "c2fee060-f99e-48aa-a280-ac5492715fd9",
  "connections": {
    "efd61ca5-e248-4027-b705-6d9c5dabe820": {
      "main": [
        [
          {
            "node": "6d11ae64-e2f8-47ed-854a-c749881ce72c",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "37df323b-5c75-495f-ba19-b8642c02d96f": {
      "main": [
        [
          {
            "node": "efd61ca5-e248-4027-b705-6d9c5dabe820",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "6d11ae64-e2f8-47ed-854a-c749881ce72c": {
      "main": [
        [
          {
            "node": "976d9123-7585-4700-9972-5b2838571a44",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
Häufig gestellte Fragen

Wie verwende ich diesen Workflow?

Kopieren Sie den obigen JSON-Code, erstellen Sie einen neuen Workflow in Ihrer n8n-Instanz und wählen Sie "Aus JSON importieren". Fügen Sie die Konfiguration ein und passen Sie die Anmeldedaten nach Bedarf an.

Für welche Szenarien ist dieser Workflow geeignet?

Fortgeschritten - Marktforschung, KI-Zusammenfassung

Ist es kostenpflichtig?

Dieser Workflow ist völlig kostenlos. Beachten Sie jedoch, dass Drittanbieterdienste (wie OpenAI API), die im Workflow verwendet werden, möglicherweise kostenpflichtig sind.

Workflow-Informationen
Schwierigkeitsgrad
Fortgeschritten
Anzahl der Nodes8
Kategorie2
Node-Typen5
Schwierigkeitsbeschreibung

Für erfahrene Benutzer, mittelkomplexe Workflows mit 6-15 Nodes

Externe Links
Auf n8n.io ansehen

Diesen Workflow teilen

Kategorien

Kategorien: 34