Automate News Article Scraping with ScrapegraphAI and Store in Google Sheets
This is an automation workflow with 8 nodes for market research and AI summarization. It mainly uses the Code, Google Sheets, Schedule Trigger, and ScrapegraphAI nodes to scrape news articles automatically with ScrapegraphAI and store them in Google Sheets; a readable version of the Code node's formatting logic is shown just before the JSON export below.
Requirements:
- Google Sheets API credentials
- ScrapegraphAI API key
Nodes used (8): Schedule Trigger, ScrapegraphAI, Code, Google Sheets, and 4 sticky notes
Category: Market Research, AI Summarization
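For reference, this is the data-shaping logic inside the workflow's Code node ("Nachrichtendaten-Formatierung und -Verarbeitung"), shown unescaped and with added comments. It is the same JavaScript that appears as the escaped jsCode string in the JSON export below, and it assumes the ScrapegraphAI node returns its articles under result.articles, as that node's prompt describes.

// Data-shaping step from the Code node (unescaped copy of the jsCode in the export below).
// It expects a single input item from the ScrapegraphAI node whose json.result.articles
// is an array of objects with at least title, url, and category.

// Get the input data (the single item produced by the ScrapegraphAI node)
const inputData = $input.all()[0].json;

// Extract the articles array from the structured response
const articles = inputData.result.articles;

// Map each article to one n8n item, keeping only title, url, and category
return articles.map(article => ({
  json: {
    title: article.title,
    url: article.url,
    category: article.category
  }
}));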
{
"id": "MIllJmbqayQrZM1F",
"meta": {
"instanceId": "521567c5f495f323b77849c4cfd0c9f4f2396c986e324e0e66c8425b6f124744",
"templateCredsSetupCompleted": true
},
"name": "Automate News Article Scraping with ScrapegraphAI and Store in Google Sheets",
"tags": [],
"nodes": [
{
"id": "37df323b-5c75-495f-ba19-b8642c02d96f",
"name": "Automatisierter Nachrichten-Sammel-Trigger",
"type": "n8n-nodes-base.scheduleTrigger",
"position": [
700,
820
],
"parameters": {
"rule": {
"interval": [
{}
]
}
},
"typeVersion": 1.2
},
{
"id": "efd61ca5-e248-4027-b705-6d9c5dabe820",
"name": "KI-gestützter Nachrichtenartikel-Scraper",
"type": "n8n-nodes-scrapegraphai.scrapegraphAi",
"position": [
1380,
820
],
"parameters": {
"userPrompt": "Extract all the articles from this site. Use the following schema for response { \"request_id\": \"5a9de102-8a43-4e89-8aae-397c9ca80a9b\", \"status\": \"completed\", \"website_url\": \"https://www.bbc.com/\", \"user_prompt\": \"Extract all the articles from this site.\", \"title\": \"'My friend died right in front of me' - Student describes moment air force jet crashed into school\", \"url\": \"https://www.bbc.com/news/articles/cglzw8y5wy5o\", \"category\": \"Asia\" }",
"websiteUrl": "https://www.bbc.com/"
},
"credentials": {
"scrapegraphAIApi": {
"id": "",
"name": ""
}
},
"typeVersion": 1
},
{
"id": "976d9123-7585-4700-9972-5b2838571a44",
"name": "Google Sheets Nachrichtenspeicher",
"type": "n8n-nodes-base.googleSheets",
"position": [
2980,
820
],
"parameters": {
"columns": {
"value": {},
"schema": [
{
"id": "title",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "title",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "url",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "url",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "category",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "category",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "autoMapInputData",
"matchingColumns": []
},
"options": {},
"operation": "append",
"sheetName": {
"__rl": true,
"mode": "name",
"value": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "url",
"value": ""
}
},
"credentials": {
"googleSheetsOAuth2Api": {
"id": "",
"name": ""
}
},
"typeVersion": 4.5
},
{
"id": "6d11ae64-e2f8-47ed-854a-c749881ce72c",
"name": "Nachrichtendaten-Formatierung und -Verarbeitung",
"type": "n8n-nodes-base.code",
"notes": "Hey this is where \nyou \nformat results ",
"position": [
2140,
820
],
"parameters": {
"jsCode": "// Get the input data\nconst inputData = $input.all()[0].json;\n\n// Extract articles array\nconst articles = inputData.result.articles;\n\n// Map each article and return only title, url, category\nreturn articles.map(article => ({\n json: {\n title: article.title,\n url: article.url,\n category: article.category\n }\n}));"
},
"notesInFlow": true,
"typeVersion": 2
},
{
"id": "ca78baaf-0480-490d-aa9a-3663ca93f5d0",
"name": "Haftnotiz1",
"type": "n8n-nodes-base.stickyNote",
"position": [
1180,
460
],
"parameters": {
"color": 5,
"width": 574.9363634768473,
"height": 530.4701664623029,
"content": "# Step 2: AI-Powered News Article Scraper 🤖\n\nThis is the core node which uses ScrapeGraphAI to intelligently extract news articles from any website.\n\n## How to Use\n- Configure the target news website URL\n- Use natural language to describe what data to extract\n- The AI will automatically parse and structure the results\n\n## Configuration\n- **Website URL**: Target news website (e.g., BBC, CNN, Reuters)\n- **User Prompt**: Natural language instructions for data extraction\n- **API Credentials**: ScrapeGraphAI API key required\n\n## Example\n- **Website**: BBC News homepage\n- **Instruction**: \"Extract all article titles, URLs, and categories\"\n\n⚠️ **Note**: This is a community node requiring self-hosting"
},
"typeVersion": 1
},
{
"id": "51a1337b-6a50-43a5-8d6f-8345bc771c7b",
"name": "Haftnotiz2",
"type": "n8n-nodes-base.stickyNote",
"position": [
1920,
460
],
"parameters": {
"color": 5,
"width": 574.9363634768473,
"height": 530.4701664623029,
"content": "# Step 3: News Data Formatting and Processing 🧱\n\nThis node transforms and structures the scraped news data for optimal Google Sheets compatibility.\n\n## What it does\n- Extracts articles array from ScrapeGraphAI response\n- Maps each article to standardized format\n- Ensures data consistency and structure\n- Prepares clean data for spreadsheet storage\n\n## Data Structure\n- **title**: Article headline and title\n- **url**: Direct link to the article\n- **category**: Article category or section\n\n## Customization\n- Modify the JavaScript code to extract additional fields\n- Add data validation and cleaning logic\n- Implement error handling for malformed data"
},
"typeVersion": 1
},
{
"id": "2e8cde8e-f534-4f37-a1f9-bcf0fe0b09f9",
"name": "Haftnotiz3",
"type": "n8n-nodes-base.stickyNote",
"position": [
460,
460
],
"parameters": {
"color": 5,
"width": 574.9363634768473,
"height": 530.4701664623029,
"content": "# Step 1: Automated News Collection Trigger ⏱️\n\nThis trigger automatically invokes the workflow at specified intervals to collect fresh news content.\n\n## Configuration Options\n- **Frequency**: Daily, hourly, or custom intervals\n- **Time Zone**: Configure for your business hours\n- **Execution Time**: Choose optimal times for news collection\n\n## Best Practices\n- Set appropriate intervals to respect rate limits\n- Consider news website update frequencies\n- Monitor execution logs for any issues\n- Adjust frequency based on your monitoring needs"
},
"typeVersion": 1
},
{
"id": "5606537c-a531-490a-b4ff-6d0dc5e642b4",
"name": "Haftnotiz",
"type": "n8n-nodes-base.stickyNote",
"position": [
2680,
460
],
"parameters": {
"color": 5,
"width": 574.9363634768473,
"height": 530.4701664623029,
"content": "# Step 4: Google Sheets News Storage 📊\n\nThis node securely stores the processed news article data in your Google Sheets for analysis and tracking.\n\n## What it does\n- Connects to your Google Sheets account via OAuth2\n- Appends new article data as rows\n- Maintains historical data for trend analysis\n- Provides structured data for business intelligence\n\n## Configuration\n- **Spreadsheet**: Select or create target Google Sheets document\n- **Sheet Name**: Configure worksheet (default: Sheet1)\n- **Operation**: Append mode for continuous data collection\n- **Column Mapping**: Automatic mapping of title, url, category fields\n\n## Data Management\n- Each execution adds new article entries\n- Historical data preserved for analysis\n- Easy export and sharing capabilities\n- Built-in Google Sheets analytics and filtering"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "c2fee060-f99e-48aa-a280-ac5492715fd9",
"connections": {
"efd61ca5-e248-4027-b705-6d9c5dabe820": {
"main": [
[
{
"node": "6d11ae64-e2f8-47ed-854a-c749881ce72c",
"type": "main",
"index": 0
}
]
]
},
"37df323b-5c75-495f-ba19-b8642c02d96f": {
"main": [
[
{
"node": "efd61ca5-e248-4027-b705-6d9c5dabe820",
"type": "main",
"index": 0
}
]
]
},
"6d11ae64-e2f8-47ed-854a-c749881ce72c": {
"main": [
[
{
"node": "976d9123-7585-4700-9972-5b2838571a44",
"type": "main",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration and adjust the credentials as needed.
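For a scripted alternative, here is a minimal sketch only, assuming a self-hosted n8n instance with the public REST API enabled, a valid API key, and the JSON above saved as workflow.json (a placeholder filename): the export can also be created via the API. Depending on your n8n version, the API may reject read-only fields such as id, active, tags, or pinData, so the sketch sends only the fields needed to create the workflow.

// Minimal sketch: create the workflow via n8n's public REST API (Node.js 18+, built-in fetch).
// N8N_BASE_URL and ./workflow.json are placeholders; adjust them for your instance.
const fs = require('fs');

const N8N_BASE_URL = 'http://localhost:5678';
const N8N_API_KEY = process.env.N8N_API_KEY; // your n8n API key, created in the instance settings

async function importWorkflow() {
  const exported = JSON.parse(fs.readFileSync('./workflow.json', 'utf8'));

  // Send only the fields required to create a workflow; the API tends to
  // reject extra or read-only properties from a full export.
  const body = {
    name: exported.name,
    nodes: exported.nodes,
    connections: exported.connections,
    settings: exported.settings,
  };

  const res = await fetch(`${N8N_BASE_URL}/api/v1/workflows`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-N8N-API-KEY': N8N_API_KEY,
    },
    body: JSON.stringify(body),
  });

  if (!res.ok) {
    throw new Error(`Import failed: ${res.status} ${await res.text()}`);
  }

  const created = await res.json();
  console.log('Imported workflow with id:', created.id);
}

importWorkflow().catch(console.error);

Whichever way you import it, you still need to attach your own Google Sheets and ScrapegraphAI credentials in the editor, point the Google Sheets node at a real document (documentId is empty in this export), and set an interval on the Schedule Trigger, which is also left empty.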
Which scenarios is this workflow suitable for?
Advanced - Market Research, AI Summarization
Does it cost anything?
This workflow itself is completely free. Note, however, that third-party services used in the workflow (such as the ScrapegraphAI API) may charge for usage.
vinci-king-01 (@vinci-king-01)