🔧 Dataproc MCP Server Documentation

Production-ready Model Context Protocol server for Google Cloud Dataproc operations.

Quick Start Guide 🚀

Get up and running with the Dataproc MCP Server in just 5 minutes!

Prerequisites

Before you start, make sure you have:

- Node.js and npm installed
- A Google Cloud project with the Dataproc, Compute Engine, and Cloud Storage APIs enabled
- The gcloud CLI installed and authenticated with credentials that can manage Dataproc

🎯 5-Minute Setup

Step 1: Install the Package

# Install globally for easy access
npm install -g @dataproc/mcp-server

# Or install locally in your project
npm install @dataproc/mcp-server
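
Either way, you can confirm what npm installed before moving on (the global form is shown; drop -g for a local install):

# Confirm the package is installed and check which version you got
npm list -g @dataproc/mcp-server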

Step 2: Quick Setup

# Run the interactive setup
dataproc-mcp --setup

# This will create:
# - config/server.json (server configuration)
# - config/default-params.json (default parameters)
# - profiles/ (cluster profile directory)

Step 3: Configure Authentication

For detailed authentication setup, refer to the Authentication Implementation Guide.
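
If you just need working credentials on your own machine, Application Default Credentials through gcloud are typically enough to get started (the guide above covers service accounts and production setups):

# Create Application Default Credentials for local development
gcloud auth application-default login

# Point gcloud at the project you plan to use
gcloud config set project your-project-id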

Step 4: Configure Your Project

Edit config/default-params.json:

{
  "defaultEnvironment": "development",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "development",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    }
  ]
}
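
A quick way to catch a stray comma or missing quote before starting the server is to parse the file with Node, which is already installed alongside npm:

# Fail loudly if default-params.json is not valid JSON
node -e "JSON.parse(require('fs').readFileSync('config/default-params.json', 'utf8')); console.log('default-params.json is valid JSON')"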

Step 5: Enable Semantic Search (Optional)

For enhanced natural language queries, run a local Qdrant vector database:

# Install and start Qdrant vector database
docker run -p 6334:6333 qdrant/qdrant

# Verify Qdrant is responding (the root endpoint returns version info)
curl http://localhost:6334/
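
If you want the indexed data to survive container restarts, a common variation is to run Qdrant detached with a named volume mounted at its storage path (same port mapping as above):

# Run Qdrant in the background with persistent storage
docker run -d --name qdrant -p 6334:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant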

Benefits of Semantic Search:

- Query clusters and jobs in natural language (e.g. "clusters with machine learning packages installed")
- Find configurations by characteristics such as machine type or memory profile

Note: This is completely optional - all core functionality works without Qdrant.

Step 6: Start the Server

# Start the MCP server
dataproc-mcp

# Or run directly with Node.js
node /path/to/dataproc-mcp/build/index.js
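
The environment variables used in the client configurations below also work when launching the server directly, which is handy for troubleshooting (the debug log level shown here is an assumption; the client examples use info):

# Start with verbose logging and an explicit config path
LOG_LEVEL=debug DATAPROC_CONFIG_PATH=./config/server.json dataproc-mcp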

๐ŸŒ Claude.ai Web App Integration

NEW: Full Claude.ai compatibility is now available!

For Claude.ai web app integration, see our dedicated guides:

Key Features:

🔧 MCP Client Integration

Claude Desktop

Add to your Claude Desktop configuration:

File (macOS): ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": [
        "@dipseth/dataproc-mcp-server@latest"
      ],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      }
    }
  }
}
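
Both client configurations launch the server through npx, so it is worth confirming the published package resolves from your machine before restarting the client:

# Check that the package is reachable on the npm registry and see its latest version
npm view @dipseth/dataproc-mcp-server version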

Roo (VS Code)

Add to your Roo MCP settings:

File: .roo/mcp.json

{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": [
        "@dipseth/dataproc-mcp-server@latest"
      ],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      },
      "alwaysAllow": []
    }
  }
}

🎮 First Commands

Once connected, try these commands in your MCP client:

List Available Tools

What Dataproc tools are available?

Create a Simple Cluster

Create a small Dataproc cluster named "test-cluster" in my project

List Clusters

Show me all my Dataproc clusters

Submit a Spark Job

Submit a Spark job to process data from gs://my-bucket/data.csv

Cancel a Running Job

Cancel the job with ID "my-long-running-job-12345"

Monitor Job Status

Check the status of job "my-job-67890"

Try Semantic Search (if Qdrant enabled)

Show me clusters with machine learning packages installed
Find clusters using high-memory configurations

📋 Example Cluster Profile

Create a custom cluster profile in profiles/my-cluster.yaml:

my-project-dev-cluster:
  region: us-central1
  tags:
    - development
    - testing
  labels:
    environment: dev
    team: data-engineering
  cluster_config:
    master_config:
      num_instances: 1
      machine_type_uri: n1-standard-4
      disk_config:
        boot_disk_type: pd-standard
        boot_disk_size_gb: 100
    worker_config:
      num_instances: 2
      machine_type_uri: n1-standard-4
      disk_config:
        boot_disk_type: pd-standard
        boot_disk_size_gb: 100
      is_preemptible: true  # Cost savings for dev
    software_config:
      image_version: 2.1-debian11
      optional_components:
        - JUPYTER
      properties:
        dataproc:dataproc.allow.zero.workers: "true"
    lifecycle_config:
      idle_delete_ttl:
        seconds: 1800  # 30 minutes
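
Because the profile is plain YAML, any YAML parser can sanity-check the indentation before the server loads it, for example via the js-yaml CLI pulled through npx:

# Parse the profile and print it as JSON; a parse error points to a YAML problem
npx js-yaml profiles/my-cluster.yaml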

๐Ÿ” Verification

Test Your Setup

# Check if the server starts correctly
dataproc-mcp --test

# Verify authentication
dataproc-mcp --verify-auth

# List available profiles
dataproc-mcp --list-profiles

Health Check

# Run comprehensive health check
npm run pre-flight  # If installed from source

# Or basic connectivity test
curl -X POST http://localhost:3000/health  # If running as HTTP server

🚨 Troubleshooting

Common Issues

Authentication Errors

# Check your credentials
gcloud auth list
gcloud config list project

# Verify service account permissions
gcloud projects get-iam-policy YOUR_PROJECT_ID
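
If the policy output shows your service account lacks Dataproc access, granting the Dataproc Admin role mentioned under Getting Help usually resolves it (replace the placeholder service account email with your own):

# Grant the Dataproc Admin role to the service account the server runs as
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:your-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/dataproc.admin"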

Permission Errors

# Enable required APIs
gcloud services enable dataproc.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable storage.googleapis.com

Connection Issues

# Check network connectivity
ping google.com

# Verify firewall rules
gcloud compute firewall-rules list

Getting Help

  1. Check the logs: Look for error messages in the console output
  2. Verify configuration: Ensure all required fields are filled
  3. Test authentication: Use gcloud auth application-default print-access-token
  4. Check permissions: Verify your service account has Dataproc Admin role

📚 Next Steps

Learn More

Advanced Features

Community

🎉 You're Ready!

Your Dataproc MCP Server is now configured and ready to use. Start by creating your first cluster and exploring the available tools through your MCP client.

Happy data processing! 🚀


Need help? Check our testing guide or open an issue.