Quick Start Guide
Get up and running with the Dataproc MCP Server in just 5 minutes!
Prerequisites
- Node.js 18+ (available from nodejs.org)
- A Google Cloud project with the Dataproc API enabled
- Authentication: a service account key or the gcloud CLI
5-Minute Setup
Step 1: Install the Package
# Install globally for easy access
npm install -g @dipseth/dataproc-mcp-server
# Or install locally in your project
npm install @dipseth/dataproc-mcp-server
Step 2: Quick Setup
# Run the interactive setup
dataproc-mcp --setup
# This will create:
# - config/server.json (server configuration)
# - config/default-params.json (default parameters)
# - profiles/ (cluster profile directory)
Step 3: Configure Authentication
For detailed authentication setup, refer to the Authentication Implementation Guide.
Step 4: Configure Your Project
Edit config/default-params.json:
{
  "defaultEnvironment": "development",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "development",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    }
  ]
}
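To make the relationship between `parameters` and `environments` concrete, here is a minimal sketch of how a file like this could be resolved for one environment. It assumes (this guide does not confirm it) that an environment's value overrides a parameter's `defaultValue` and that a missing `required` parameter is an error; the server's actual resolution logic may differ. `resolveParams` and the interfaces are hypothetical.

```typescript
// Hypothetical types mirroring the shape of config/default-params.json above.
interface ParamSpec {
  name: string;
  type: string;
  required?: boolean;
  defaultValue?: string;
}

interface EnvironmentBlock {
  environment: string;
  parameters: Record<string, string>;
}

interface DefaultParams {
  defaultEnvironment: string;
  parameters: ParamSpec[];
  environments: EnvironmentBlock[];
}

// Assumed precedence: environment value, then defaultValue.
// A required parameter with neither value throws.
function resolveParams(
  config: DefaultParams,
  envName: string = config.defaultEnvironment,
): Record<string, string> {
  const env = config.environments.find((e) => e.environment === envName);
  const envValues = env ? env.parameters : {};
  const resolved: Record<string, string> = {};
  for (const spec of config.parameters) {
    const value = envValues[spec.name] ?? spec.defaultValue;
    if (value === undefined) {
      if (spec.required) {
        throw new Error(`Missing required parameter "${spec.name}" for environment "${envName}"`);
      }
      continue; // optional and unset: leave it out
    }
    resolved[spec.name] = value;
  }
  return resolved;
}

// The example file from Step 4, as a parsed object.
const example: DefaultParams = {
  defaultEnvironment: "development",
  parameters: [
    { name: "projectId", type: "string", required: true },
    { name: "region", type: "string", required: true, defaultValue: "us-central1" },
  ],
  environments: [
    { environment: "development", parameters: { projectId: "your-project-id", region: "us-central1" } },
  ],
};

console.log(resolveParams(example)); // → { projectId: 'your-project-id', region: 'us-central1' }
```

Under these assumptions, asking for an environment that is not listed (for example `"staging"`) still picks up `region` from its `defaultValue` but fails on `projectId`, which has no default.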
Step 5 (Optional): Enable Semantic Search
For enhanced natural language queries:
# Start the Qdrant vector database (host port 6334 maps to Qdrant's HTTP port 6333)
docker run -p 6334:6333 qdrant/qdrant
# Verify Qdrant is running
curl http://localhost:6334/healthz
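If you'd rather check Qdrant from a start-up script than with `curl`, the sketch below probes Qdrant's `/healthz` endpoint, assuming Qdrant's HTTP API is reachable on host port 6334 as in the `docker run` command above. The `isQdrantUp` name and the `FetchLike` type are hypothetical. The fetch function is passed in so the check can be exercised without a live server; in Node.js 18+ you can supply the built-in global `fetch`.

```typescript
// Minimal shapes so the sketch does not depend on DOM or Node.js typings.
type HealthResponse = { ok: boolean; status: number };
type FetchLike = (url: string) => Promise<HealthResponse>;

// Resolves to true if Qdrant answers its health endpoint with a 2xx status.
// Assumes the HTTP API is published on host port 6334 (docker -p 6334:6333).
async function isQdrantUp(
  fetchFn: FetchLike,
  baseUrl: string = "http://localhost:6334",
): Promise<boolean> {
  try {
    const res = await fetchFn(`${baseUrl}/healthz`);
    return res.ok;
  } catch {
    // Connection refused, DNS failure, etc.: treat as "not running".
    return false;
  }
}

// Usage in Node.js 18+ with the global fetch:
// isQdrantUp((url) => fetch(url)).then((up) =>
//   console.log(up ? "Qdrant is running" : "Qdrant is not reachable"));
```

Returning `false` instead of throwing keeps the check suitable for an "is the optional feature available?" decision, which matches Qdrant being optional here.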
Benefits of Semantic Search:
- Natural language cluster queries, such as "show me clusters with pip packages"
- Intelligent data extraction and filtering
- Enhanced search capabilities with confidence scoring
Note: Semantic search is optional; all core functionality works without Qdrant.
Step 6: Start the Server
# Start the MCP server
dataproc-mcp
# Or run directly with Node.js
node /path/to/dataproc-mcp/build/index.js
Claude.ai Web App Integration
NEW: Full Claude.ai compatibility is now available!
For Claude.ai web app integration, see our dedicated guide:
- Complete Claude.ai Integration Guide - Detailed setup with troubleshooting
Key Features:
- All 22 MCP tools available in Claude.ai
- HTTPS tunneling with Cloudflare
- OAuth authentication with GitHub
- Secure WebSocket connections
MCP Client Integration
Claude Desktop
Add to your Claude Desktop configuration:
File (macOS): ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": [
        "@dipseth/dataproc-mcp-server@latest"
      ],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      }
    }
  }
}
Roo (VS Code)
Add to your Roo MCP settings:
File: .roo/mcp.json
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": [
        "@dipseth/dataproc-mcp-server@latest"
      ],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      },
      "alwaysAllow": []
    }
  }
}
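Both client files use the same `mcpServers` map, so pasting the Dataproc entry over an existing file can silently remove servers you already configured. Below is a minimal sketch of a non-destructive merge. `mergeMcpServer` and the interfaces are hypothetical and only model the fields shown in the two examples above (`command`, `args`, `env`, and Roo's `alwaysAllow`); any other keys are passed through untouched. In practice you would parse the JSON file, pass the object through the function, and write the result back.

```typescript
// Fields used in the Claude Desktop and Roo examples; unknown keys are preserved.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
  alwaysAllow?: string[];
  [key: string]: unknown;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Returns a new config with `name` added (or replaced), leaving every other
// server and top-level key intact. The input object is not mutated.
function mergeMcpServer(config: McpConfig, name: string, entry: McpServerEntry): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

const dataprocEntry: McpServerEntry = {
  command: "npx",
  args: ["@dipseth/dataproc-mcp-server@latest"],
  env: {
    LOG_LEVEL: "info",
    DATAPROC_CONFIG_PATH: "/path/to/your/config/server.json",
  },
};

// An existing file that already defines another (hypothetical) server.
const existing: McpConfig = { mcpServers: { other: { command: "node", args: ["other.js"] } } };
const updated = mergeMcpServer(existing, "dataproc", dataprocEntry);
console.log(JSON.stringify(updated, null, 2));
```

For Roo, add `alwaysAllow: []` to `dataprocEntry` before merging to match the Roo example.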
First Commands
Once connected, try these commands in your MCP client:
List Available Tools
What Dataproc tools are available?
Create a Simple Cluster
Create a small Dataproc cluster named "test-cluster" in my project
List Clusters
Show me all my Dataproc clusters
Submit a Spark Job
Submit a Spark job to process data from gs://my-bucket/data.csv
Cancel a Running Job
Cancel the job with ID "my-long-running-job-12345"
Monitor Job Status
Check the status of job "my-job-67890"
Try Semantic Search (if Qdrant is enabled)
Show me clusters with machine learning packages installed
Find clusters using high-memory configurations
Example Cluster Profile
Create a custom cluster profile in profiles/my-cluster.yaml:
my-project-dev-cluster:
  region: us-central1
  tags:
    - development
    - testing
  labels:
    environment: dev
    team: data-engineering
  cluster_config:
    master_config:
      num_instances: 1
      machine_type_uri: n1-standard-4
      disk_config:
        boot_disk_type: pd-standard
        boot_disk_size_gb: 100
    worker_config:
      num_instances: 2
      machine_type_uri: n1-standard-4
      disk_config:
        boot_disk_type: pd-standard
        boot_disk_size_gb: 100
      is_preemptible: true  # Cost savings for dev
    software_config:
      image_version: 2.1-debian11
      optional_components:
        - JUPYTER
      properties:
        dataproc:dataproc.allow.zero.workers: "true"
    lifecycle_config:
      idle_delete_ttl:
        seconds: 1800  # 30 minutes
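Before creating a cluster from a profile, it can help to estimate what it will request against your regional Compute Engine quota. The sketch below assumes the profile has already been parsed from YAML into an object (with a YAML library of your choice) and relies on `n1-standard-N` machine types having N vCPUs; other machine families are rejected rather than guessed. `estimateResources` and the interfaces are hypothetical and not part of the server.

```typescript
// Subset of the profile structure above, after YAML parsing.
interface InstanceGroup {
  num_instances: number;
  machine_type_uri: string;
  disk_config?: { boot_disk_type?: string; boot_disk_size_gb?: number };
}

interface ClusterProfile {
  region: string;
  cluster_config: {
    master_config: InstanceGroup;
    worker_config?: InstanceGroup;
  };
}

// Handles only n1-standard-N (N vCPUs each). Accepts a short name or a full
// URI by looking at the last path segment.
function vcpusPerInstance(machineType: string): number {
  const shortName = machineType.split("/").pop() ?? machineType;
  const match = /^n1-standard-(\d+)$/.exec(shortName);
  if (!match) {
    throw new Error(`Unsupported machine type in this sketch: ${machineType}`);
  }
  return Number(match[1]);
}

// Sums vCPUs and boot disk size over the master and primary worker groups.
// Unspecified disk sizes count as 0, so bootDiskGb is a lower bound.
function estimateResources(profile: ClusterProfile): { vcpus: number; bootDiskGb: number } {
  const groups = [profile.cluster_config.master_config, profile.cluster_config.worker_config]
    .filter((g): g is InstanceGroup => g !== undefined);
  let vcpus = 0;
  let bootDiskGb = 0;
  for (const g of groups) {
    vcpus += g.num_instances * vcpusPerInstance(g.machine_type_uri);
    bootDiskGb += g.num_instances * (g.disk_config?.boot_disk_size_gb ?? 0);
  }
  return { vcpus, bootDiskGb };
}

// The relevant parts of my-project-dev-cluster.
const devCluster: ClusterProfile = {
  region: "us-central1",
  cluster_config: {
    master_config: { num_instances: 1, machine_type_uri: "n1-standard-4", disk_config: { boot_disk_size_gb: 100 } },
    worker_config: { num_instances: 2, machine_type_uri: "n1-standard-4", disk_config: { boot_disk_size_gb: 100 } },
  },
};

console.log(estimateResources(devCluster)); // → { vcpus: 12, bootDiskGb: 300 }
```

Throwing on unknown machine types is deliberate: a silent zero would understate the request and defeat the point of a quota check.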
Verification
Test Your Setup
# Check if the server starts correctly
dataproc-mcp --test
# Verify authentication
dataproc-mcp --verify-auth
# List available profiles
dataproc-mcp --list-profiles
Health Check
# Run comprehensive health check
npm run pre-flight # If installed from source
# Or basic connectivity test
curl -X POST http://localhost:3000/health # If running as HTTP server
Troubleshooting
Common Issues
Authentication Errors
# Check your credentials
gcloud auth list
gcloud config list project
# Verify service account permissions
gcloud projects get-iam-policy YOUR_PROJECT_ID
Permission Errors
# Enable required APIs
gcloud services enable dataproc.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable storage.googleapis.com
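To see which of the three APIs above are still missing, you can compare them with the output of `gcloud services list --enabled --format="value(config.name)"`, which should print one service name per line. The helper below is a hypothetical sketch that takes that text as input, so it runs without calling `gcloud`; the sample output is made up for illustration.

```typescript
// APIs this guide asks you to enable.
const REQUIRED_APIS = ["dataproc.googleapis.com", "compute.googleapis.com", "storage.googleapis.com"];

// Takes the text printed by
//   gcloud services list --enabled --format="value(config.name)"
// and returns the required APIs that do not appear in it.
function missingApis(enabledOutput: string, required: string[] = REQUIRED_APIS): string[] {
  const enabled = enabledOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return required.filter((api) => enabled.indexOf(api) === -1);
}

// Hypothetical output from a project where Dataproc is not yet enabled.
const sampleOutput = "bigquery.googleapis.com\ncompute.googleapis.com\nstorage.googleapis.com\n";
const missing = missingApis(sampleOutput);
if (missing.length > 0) {
  // gcloud services enable accepts several service names at once.
  console.log(`gcloud services enable ${missing.join(" ")}`); // prints: gcloud services enable dataproc.googleapis.com
}
```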
Connection Issues
# Check network connectivity
ping google.com
# Verify firewall rules
gcloud compute firewall-rules list
Getting Help
- Check the logs: Look for error messages in the console output
- Verify configuration: Ensure all required fields are filled
- Test authentication: run gcloud auth application-default print-access-token
- Check permissions: verify that your service account has the Dataproc Admin role (roles/dataproc.admin)
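To check the permissions point without reading the whole policy by eye, you can inspect the JSON from `gcloud projects get-iam-policy YOUR_PROJECT_ID --format=json`, which lists `bindings`, each with a `role` and its `members`. The sketch below is hypothetical: it only looks at direct project-level bindings, ignores IAM conditions and roles inherited from folders or the organization, and checks `roles/dataproc.admin` (the Dataproc Admin role) by default. The service account and policy values are illustrative.

```typescript
// The relevant part of `gcloud projects get-iam-policy PROJECT --format=json`.
interface IamBinding {
  role: string;
  members: string[];
  condition?: unknown; // present on conditional bindings; ignored here
}

interface IamPolicy {
  bindings?: IamBinding[];
}

// True if `member` (e.g. "serviceAccount:name@project.iam.gserviceaccount.com")
// is directly bound to `role` in this project-level policy.
function hasRole(policy: IamPolicy, member: string, role: string = "roles/dataproc.admin"): boolean {
  return (policy.bindings ?? []).some((b) => b.role === role && b.members.indexOf(member) !== -1);
}

// Hypothetical policy, e.g. obtained with JSON.parse(gcloudOutput).
const policy: IamPolicy = {
  bindings: [
    { role: "roles/dataproc.admin", members: ["serviceAccount:dataproc-mcp@my-project.iam.gserviceaccount.com"] },
    { role: "roles/viewer", members: ["user:alice@example.com"] },
  ],
};

console.log(hasRole(policy, "serviceAccount:dataproc-mcp@my-project.iam.gserviceaccount.com")); // true
console.log(hasRole(policy, "user:alice@example.com")); // false
```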
Next Steps
Learn More
- API Reference - Complete tool documentation
- Configuration Examples - Real-world setups
- Security Guide - Best practices
- Testing Guide - Testing and debugging information
Advanced Features
- Multi-environment setup for dev/staging/production
- Custom cluster profiles for different workloads
- Automated job scheduling with cron-like syntax
- Performance monitoring and alerting
- Cost optimization with preemptible instances
Community
- GitHub Issues - Bug reports and feature requests
- Community Support - Community Q&A
- Contributing Guide - How to contribute
You're Ready!
Your Dataproc MCP Server is now configured and ready to use. Start by creating your first cluster and exploring the available tools through your MCP client.
Happy data processing!
Need help? Check our testing guide or open an issue.