Live-Automate Scraping Y Combinator Startups with Apify & Google Sheets
This is a 9-node automation workflow in the Lead Generation and Multimodal AI categories. It uses two Apify nodes, one Google Sheets node, and one Manual Trigger node; the remaining five nodes are sticky notes that document each step. The workflow scrapes Y Combinator startups with Apify and logs them to Google Sheets.
- Google Sheets API credentials
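The sticky notes in the workflow require the target sheet to already contain eleven case-sensitive column headers. The following is a minimal Python sketch for checking a header row before the first run. `missing_columns` and `example_header` are hypothetical names used for illustration, and the header row is assumed to have been copied by hand from the sheet rather than fetched through the Google Sheets API.

```python
# Column headers the workflow's Google Sheets node maps to (case-sensitive).
REQUIRED_COLUMNS = [
    "Company", "Location", "Website", "LinkedIn", "Founded", "Description",
    "Industry Tags", "Founder 1 Name", "Founder 1 LinkedIn",
    "Founder 2 Name", "Founder 2 LinkedIn",
]


def missing_columns(header_row):
    """Return the required headers absent from header_row (exact match)."""
    present = set(header_row)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical header row: "location" is lowercase and one column is absent.
example_header = [
    "Company", "location", "Website", "LinkedIn", "Founded", "Description",
    "Industry Tags", "Founder 1 Name", "Founder 1 LinkedIn", "Founder 2 Name",
]

print(missing_columns(example_header))
```

Because the match is exact, a header such as `location` is reported as missing even though a similar column exists.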
{
"id": "f0l6j5GkLScFOfqK",
"meta": {
"instanceId": "1a54c41d9050a8f1fa6f74ca858828ad9fb97b9fafa3e9760e576171c531a787",
"templateCredsSetupCompleted": true
},
"name": "Live-Automate Scraping Y Combinator Startups with Apify & Google Sheets",
"tags": [],
"nodes": [
{
"id": "4d88b9f9-6909-47c8-91a5-c27ebc97de49",
"name": "Actor ausführen",
"type": "@apify/n8n-nodes-apify.apify",
"position": [
1632,
1632
],
"parameters": {
"actorId": {
"__rl": true,
"mode": "list",
"value": "XXsXDaNQLjoF4lgmU",
"cachedResultUrl": "https://console.apify.com/actors/XXsXDaNQLjoF4lgmU/input",
"cachedResultName": "Y Combinator Directory Scraper | Fast & Reliable | $4.5 / 1K (fatihtahta/y-combinator-directory-scraper)"
},
"customBody": "{\n \"maxCompanies\": 5,\n \"startUrls\": \"{https://www.ycombinator.com/companies?industry=Fintech&regions=America%20%2F%20Canada&team_size=%5B%221%22%2C%2225%22%5D}\",\n \"proxyConfiguration\": {\n \"useApifyProxy\": true\n }\n}"
},
"credentials": {
"apifyApi": {
"id": "8decwrzbYTySCGCT",
"name": "Apify account 4"
}
},
"typeVersion": 1
},
{
"id": "e524c759-a193-42b6-9553-683656413431",
"name": "Datensatzelemente abrufen",
"type": "@apify/n8n-nodes-apify.apify",
"position": [
2432,
1968
],
"parameters": {
"resource": "Datasets",
"datasetId": "={{ $json.defaultDatasetId }}"
},
"credentials": {
"apifyApi": {
"id": "8decwrzbYTySCGCT",
"name": "Apify account 4"
}
},
"typeVersion": 1
},
{
"id": "4eea9bab-911c-4480-9073-831b8ac46571",
"name": "Haftnotiz",
"type": "n8n-nodes-base.stickyNote",
"position": [
608,
1744
],
"parameters": {
"width": 528,
"height": 336,
"content": "### **Step 1 – Manual Trigger**\n\n- The workflow begins with a **Manual Trigger node**, allowing you to start the process on demand. \n- This approach ensures full control over when company data from **Y Combinator** is scraped and logged. \n"
},
"typeVersion": 1
},
{
"id": "b5814a97-7dd1-4488-8af3-6bf0af555d51",
"name": "Workflow starten",
"type": "n8n-nodes-base.manualTrigger",
"position": [
816,
1936
],
"parameters": {},
"typeVersion": 1
},
{
"id": "3eacc0a3-ca74-4405-ad0e-a25b9b4b964e",
"name": "Haftnotiz1",
"type": "n8n-nodes-base.stickyNote",
"position": [
1392,
1424
],
"parameters": {
"color": 3,
"width": 592,
"height": 368,
"content": "### **Step 2 – Apify Actor (Scrape Company Data)**\n\n- This step uses an **Apify Actor node** to scrape details of companies listed on **Y Combinator**. \n- You need to provide the **URL of the Y Combinator search page** with your desired filters applied (e.g., industry, location, funding stage). \n- The actor then extracts structured company data, including names, descriptions, websites, and other available details, preparing it for downstream logging and processing.\n"
},
"typeVersion": 1
},
{
"id": "d67e5ff1-ff84-4196-9a76-cc59215e4061",
"name": "Haftnotiz2",
"type": "n8n-nodes-base.stickyNote",
"position": [
2176,
1760
],
"parameters": {
"color": 4,
"width": 592,
"height": 368,
"content": "### **Step 3 – Apify Get Dataset Items**\n\n- This step uses the **Apify Get Dataset Items node** to fetch the actual company data generated by the Apify Actor in the previous step. \n- The node requires the **Dataset ID** returned by the Apify Actor to retrieve structured results. \n- The output includes detailed company information (e.g., name, description, website, location, sector), which is then prepared for logging into Google Sheets.\n"
},
"typeVersion": 1
},
{
"id": "04149226-1821-419d-b7c6-f2288de0f4cc",
"name": "Haftnotiz3",
"type": "n8n-nodes-base.stickyNote",
"position": [
3040,
1104
],
"parameters": {
"color": 5,
"width": 640,
"height": 720,
"content": "### **Step 4 – Add or Update Row in Google Sheet**\n\n- This step uses the **Google Sheets (Add or Update Row) node** to log the company data into a connected Google Sheet. \n- You must **select the target Google Document and specific Sheet** where the data will be stored. \n- Ensure the following columns are already created in the sheet (**case-sensitive**): \n - Company \n - Location \n - Website \n - LinkedIn \n - Founded \n - Description \n - Industry Tags \n - Founder 1 Name \n - Founder 1 LinkedIn \n - Founder 2 Name \n - Founder 2 LinkedIn \n\n- The node will automatically add new rows or update existing entries, keeping the sheet clean and up to date with the latest scraped company details.\n"
},
"typeVersion": 1
},
{
"id": "e0cff6ae-ea8b-47c6-8cc1-884459e8224e",
"name": "Daten zu Google Sheet hinzufügen",
"type": "n8n-nodes-base.googleSheets",
"position": [
3312,
1616
],
"parameters": {
"columns": {
"value": {
"Company": "={{ $json.company_name }}",
"Founded": "={{ $json.year_founded }}",
"Website": "={{ $json.website }}",
"LinkedIn": "={{ $json.company_linkedin }}",
"Location": "={{ $json.company_location }}",
"Description": "={{ $json.long_description }}",
"Industry Tags": "={{ $json['tags/0'] }} {{ $json['tags/1'] }} {{ $json['tags/2'] }} {{ $json['tags/3'] }}",
"Founder 1 Name": "={{ $json['founders/0/name'] }}",
"Founder 2 Name": "={{ $json['founders/1/name'] }}",
"Founder 1 LinkedIn": "={{ $json['founders/0/linkedin'] }}",
"Founder 2 LinkedIn": "={{ $json['founders/1/linkedin'] }}"
},
"schema": [
{
"id": "Company",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "Company",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Location",
"type": "string",
"display": true,
"required": false,
"displayName": "Location",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Website",
"type": "string",
"display": true,
"required": false,
"displayName": "Website",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "LinkedIn",
"type": "string",
"display": true,
"required": false,
"displayName": "LinkedIn",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founded",
"type": "string",
"display": true,
"required": false,
"displayName": "Founded",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Description",
"type": "string",
"display": true,
"required": false,
"displayName": "Description",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Industry Tags",
"type": "string",
"display": true,
"required": false,
"displayName": "Industry Tags",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 1 Name",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 1 Name",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 1 LinkedIn",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 1 LinkedIn",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 2 Name",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 2 Name",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 2 LinkedIn",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 2 LinkedIn",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [
"Company"
],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "appendOrUpdate",
"sheetName": {
"__rl": true,
"mode": "list",
"value": "gid=0",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit#gid=0",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit?usp=drivesdk",
"cachedResultName": "YCom Apify Scrapped "
}
},
"credentials": {
"googleSheetsOAuth2Api": {
"id": "dZG6jp43p2oX45HG",
"name": "Google Sheets account 4-Smit"
}
},
"typeVersion": 4.7
},
{
"id": "c8f614e2-2aa5-4f4a-8be9-090fb24bf616",
"name": "Haftnotiz4",
"type": "n8n-nodes-base.stickyNote",
"position": [
368,
944
],
"parameters": {
"color": 3,
"width": 768,
"height": 672,
"content": "### **Step 0 – Prerequisites**\n\nBefore running the workflow, ensure the following configurations are complete:\n\n- **Apify Setup:**\n - Connect your Apify account in n8n. \n - Select the **Y Combinator Directory Scraper** actor. \n - Paste the Y Combinator search URL (with filters applied) into the `startUrls` parameter. \n - Adjust the `maxCompanies` parameter to control the number of companies scraped per run. \n\n- **Google Sheets Setup:**\n - Connect your Google account using **OAuth2 credentials** with both **Google Sheets** and **Google Drive** features enabled. \n - Ensure the target Google Sheet is created in advance with the following column headers (**case-sensitive**): \n - Company \n - Location \n - Website \n - LinkedIn \n - Founded \n - Description \n - Industry Tags \n - Founder 1 Name \n - Founder 1 LinkedIn \n - Founder 2 Name \n - Founder 2 LinkedIn \n\n- **n8n Configuration:**\n - Confirm that both Apify and Google integrations are properly authenticated and available in your workflow.\n"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "36ae4ec1-b59a-49a4-b4e6-0f80bd2111f3",
"connections": {
"4d88b9f9-6909-47c8-91a5-c27ebc97de49": {
"main": [
[
{
"node": "e524c759-a193-42b6-9553-683656413431",
"type": "main",
"index": 0
}
]
]
},
"b5814a97-7dd1-4488-8af3-6bf0af555d51": {
"main": [
[
{
"node": "4d88b9f9-6909-47c8-91a5-c27ebc97de49",
"type": "main",
"index": 0
}
]
]
},
"e524c759-a193-42b6-9553-683656413431": {
"main": [
[
{
"node": "e0cff6ae-ea8b-47c6-8cc1-884459e8224e",
"type": "main",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration, then replace the Apify and Google Sheets credentials, the target spreadsheet, and the `startUrls` value with your own.
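After import, the value most likely to need changing is the filtered directory URL inside the Apify node's `customBody`. The sketch below shows one way to build such a URL with correctly percent-encoded query values. It assumes the directory accepts the `industry`, `regions`, and `team_size` query parameters seen in the workflow's own URL; `build_search_url` is a hypothetical helper, not part of n8n or Apify. How the actor expects the URL to be wrapped in `startUrls` is not confirmed here, so keep the form the workflow already uses.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ycombinator.com/companies"


def build_search_url(industry, regions, team_size):
    """Build a filtered directory URL with every query value percent-encoded."""
    params = {
        "industry": industry,
        "regions": regions,
        # The workflow's URL carries team_size as a compact JSON array.
        "team_size": json.dumps(team_size, separators=(",", ":")),
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url("Fintech", "America / Canada", ["1", "25"])
print(url)
```

Building the query string programmatically avoids hand-editing escapes such as `%20%2F%20`, which are easy to break when changing filters.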
Which scenarios is this workflow suited for?
Advanced - Lead Generation, Multimodal AI
Does it cost anything?
The workflow itself is free. The third-party services it calls may charge, however: the Apify actor used here is listed as "$4.5 / 1K" in its name.
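As a rough guide to third-party cost, the Apify actor's name in the workflow JSON includes "$4.5 / 1K". The sketch below estimates the actor cost of one run under the assumption that this price applies linearly per scraped company. Actual Apify billing (platform usage, proxy traffic, plan limits) may differ, and `estimated_cost_usd` is a hypothetical helper.

```python
PRICE_PER_1K_USD = 4.5  # assumption: taken from the actor's listed name


def estimated_cost_usd(num_companies, price_per_1k=PRICE_PER_1K_USD):
    """Rough actor cost, assuming linear billing per scraped company."""
    return num_companies / 1000 * price_per_1k


# The workflow's customBody sets maxCompanies to 5.
print(f"{estimated_cost_usd(5):.4f} USD")
```

Raising `maxCompanies` scales this estimate proportionally under the same assumption.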
Intuz
@intuz
Workflow automation can take over routine activities and save money as well as hours of time. As a boutique tech consulting company, Intuz helps businesses with custom AI/ML, AI workflow automation, and software development. Automate your business workflows for: Sales, Marketing, Accounting, Finance, Operations, E-Commerce, Customer Support, Admin & Backoffice, Logistics & Supply Chain