LLM Landing Zone: Enterprise-Grade GenAI Infrastructure Pattern

A comprehensive guide to building secure, scalable LLM Gateway infrastructure on GCP and AWS using Private Service Connect, VPC isolation, and Terraform-based IaC for enterprise GenAI deployments.

Gonnect Team · January 15, 2025 · 15 min read

Tags: GCP, AWS, Terraform, Private Service Connect, VPC, Kubernetes

Introduction

As enterprises race to adopt Large Language Models, the gap between proof-of-concept and production becomes glaringly apparent. While spinning up an LLM API call is trivial, deploying GenAI at enterprise scale demands rigorous infrastructure patterns—network isolation, zero-trust security, cross-project connectivity, and infrastructure-as-code governance.

The LLM Landing Zone pattern addresses this gap head-on, providing a battle-tested architecture for deploying LLM Gateways that meet enterprise security, compliance, and scalability requirements.

This article presents a deep-dive into the architecture, covering both GCP and AWS implementations, with production-ready Terraform configurations.

The Problem: GenAI in the Enterprise

Enterprise GenAI deployments face challenges that don't exist in sandbox environments:

Network Security

  • LLM endpoints must not be exposed to public internet
  • Traffic must flow through private connectivity
  • Cross-project/account communication needs secure bridges

Multi-Tenancy

  • Multiple consumer applications need isolated access
  • Quota management and rate limiting per tenant
  • Audit trails for compliance

Governance

  • Infrastructure changes must be versioned and auditable
  • Least-privilege IAM across project boundaries
  • Encryption at rest and in transit

Operational Excellence

  • Health checks and auto-recovery
  • Observability and tracing
  • Blue-green deployment capabilities

Architecture Overview

The LLM Landing Zone implements a hub-and-spoke model with three core components:

┌──────────────────────────────────────────────────────────┐
│                    LLM LANDING ZONE                      │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐       │
│  │  BASE HUB  │   │  PRODUCER  │   │  CONSUMER  │       │
│  │            │   │  PROJECT   │   │  PROJECT   │       │
│  │ • TF State │   │            │   │            │       │
│  │ • Modules  │   │ • LLM GW   │   │ • Gen App  │       │
│  │ • Policies │   │ • ILB      │   │ • PSC EP   │       │
│  │            │   │ • PSC Att  │   │            │       │
│  └────────────┘   └────────────┘   └────────────┘       │
│                          │                │              │
│                          └────────────────┘              │
│                    Private Service Connect               │
│                                                          │
└──────────────────────────────────────────────────────────┘

Base Landing Zone (Hub)

The hub project serves as the control plane:

  • Terraform State Management: GCS bucket with versioning for state files
  • Modular Infrastructure Code: Reusable modules for network, IAM, PSC
  • Policy Definitions: Organization-wide security policies

Producer Project (LLM Gateway)

Hosts the LLM Gateway service with:

  • Custom VPC: Isolated network (10.0.0.0/24) for LLM workloads
  • PSC Subnet: Dedicated subnet (10.0.1.0/24) for Private Service Connect
  • Internal Load Balancer: All-ports enabled for flexible routing
  • PSC Service Attachment: Proxy Protocol enabled for client identification

Consumer Project (Gen Applications)

Consumes the LLM Gateway via:

  • PSC Endpoint: Private connectivity to producer services
  • No Public IPs: All traffic flows through private channels
  • IAP Access: Secure administrative access via Identity-Aware Proxy

GCP Implementation Deep Dive

Network Architecture

# VPC with custom subnets
resource "google_compute_network" "llm_network" {
  name                    = "llm-network"
  auto_create_subnetworks = false
  project                 = var.producer_project_id
}

# Main subnet for LLM Gateway
resource "google_compute_subnetwork" "main" {
  name          = "llm-main-subnet"
  ip_cidr_range = "10.0.0.0/24"
  network       = google_compute_network.llm_network.id
  region        = var.region

  private_ip_google_access = true
}

# PSC subnet for service attachment
resource "google_compute_subnetwork" "psc" {
  name          = "llm-psc-subnet"
  ip_cidr_range = "10.0.1.0/24"
  network       = google_compute_network.llm_network.id
  region        = var.region
  purpose       = "PRIVATE_SERVICE_CONNECT"
}
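Before applying, it is worth sanity-checking that the workload range and the PSC NAT range are disjoint. A quick illustrative check with Python's ipaddress module (not part of the Terraform itself):

```python
import ipaddress

# CIDR ranges from the Terraform above
main_subnet = ipaddress.ip_network("10.0.0.0/24")  # llm-main-subnet
psc_subnet = ipaddress.ip_network("10.0.1.0/24")   # llm-psc-subnet

# The PSC NAT subnet must be disjoint from the workload subnet
print(main_subnet.overlaps(psc_subnet))  # → False

# Total addresses per /24 (GCP reserves four of these in each subnet)
print(main_subnet.num_addresses)  # → 256
```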

Firewall Rules

Security is enforced through network tags and granular firewall rules:

# Internal traffic within VPC
resource "google_compute_firewall" "internal" {
  name    = "allow-internal"
  network = google_compute_network.llm_network.name

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "icmp"
  }

  source_tags = ["llm-internal"]
  target_tags = ["llm-internal"]
}

# Health checks from GCP load balancers
resource "google_compute_firewall" "health_check" {
  name    = "allow-health-check"
  network = google_compute_network.llm_network.name

  allow {
    protocol = "tcp"
    ports    = ["80", "443", "8080"]
  }

  source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]
  target_tags   = ["llm-gateway"]
}

# IAP for secure SSH access
resource "google_compute_firewall" "iap" {
  name    = "allow-iap"
  network = google_compute_network.llm_network.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["35.235.240.0/20"]  # IAP IP range
  target_tags   = ["allow-iap"]
}
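The source ranges above are Google-published: 130.211.0.0/22 and 35.191.0.0/16 for load-balancer health checks, and 35.235.240.0/20 for IAP TCP forwarding. An illustrative sketch of how a given client address falls inside or outside those ranges:

```python
import ipaddress

# Google-published source ranges used in the firewall rules above
HEALTH_CHECK_RANGES = ["130.211.0.0/22", "35.191.0.0/16"]
IAP_RANGE = "35.235.240.0/20"

def allowed_by(ranges: list[str], ip: str) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)

# A probe from Google's health-check fleet is allowed…
print(allowed_by(HEALTH_CHECK_RANGES, "35.191.10.1"))  # → True
# …while an arbitrary internet client is not
print(allowed_by([IAP_RANGE], "203.0.113.7"))          # → False
```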

Private Service Connect

PSC enables secure cross-project connectivity:

# Internal Load Balancer
resource "google_compute_forwarding_rule" "llm_ilb" {
  name                  = "llm-gateway-ilb"
  region                = var.region
  load_balancing_scheme = "INTERNAL"
  backend_service       = google_compute_region_backend_service.llm.id
  all_ports             = true
  network               = google_compute_network.llm_network.id
  subnetwork            = google_compute_subnetwork.main.id
}

# PSC Service Attachment
resource "google_compute_service_attachment" "llm_psc" {
  name                  = "llm-gateway-psc"
  region                = var.region
  connection_preference = "ACCEPT_MANUAL"  # required when using consumer_accept_lists

  nat_subnets           = [google_compute_subnetwork.psc.id]
  target_service        = google_compute_forwarding_rule.llm_ilb.id

  enable_proxy_protocol = true

  consumer_accept_lists {
    project_id_or_num = var.consumer_project_id
    connection_limit  = 10
  }
}
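Because enable_proxy_protocol is set, the PSC data plane prepends a proxy protocol header to every TCP connection before the application bytes arrive, and the gateway must consume it before parsing HTTP. A minimal illustrative parser for the version 2 fixed header, restricted to TCP over IPv4 (GCP additionally appends a TLV carrying the PSC connection ID, which this sketch ignores):

```python
import socket
import struct

PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"

def parse_proxy_v2(buf: bytes):
    """Parse a PROXY protocol v2 header (TCP over IPv4) and return
    ((src_ip, src_port), payload_offset)."""
    if buf[:12] != PP2_SIGNATURE:
        raise ValueError("not a PROXY protocol v2 header")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", buf[12:16])
    if ver_cmd != 0x21:    # version 2, PROXY command
        raise ValueError("unsupported version/command")
    if fam_proto != 0x11:  # AF_INET / STREAM, i.e. TCP over IPv4
        raise ValueError("only TCP4 handled in this sketch")
    src_ip, dst_ip, src_port, dst_port = struct.unpack("!4s4sHH", buf[16:28])
    return (socket.inet_ntoa(src_ip), src_port), 16 + length

# Build a synthetic header the way the data plane would
addr_block = (socket.inet_aton("10.1.0.5") + socket.inet_aton("10.0.1.10")
              + struct.pack("!HH", 43210, 443))
header = PP2_SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(addr_block)) + addr_block

print(parse_proxy_v2(header + b"GET /health")[0])  # → ('10.1.0.5', 43210)
```

The client IP recovered here is what makes per-consumer audit trails and rate limiting possible behind a passthrough load balancer.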

IAM Configuration

Least-privilege access across project boundaries:

# OS Login for VM access
resource "google_project_iam_member" "os_login" {
  project = var.producer_project_id
  role    = "roles/compute.osLogin"
  member  = "user:${var.admin_email}"
}

# IAP tunnel access
resource "google_project_iam_member" "iap_tunnel" {
  project = var.producer_project_id
  role    = "roles/iap.tunnelResourceAccessor"
  member  = "user:${var.admin_email}"
}

# Service account for LLM Gateway
resource "google_service_account" "llm_gateway" {
  account_id   = "llm-gateway-sa"
  display_name = "LLM Gateway Service Account"
  project      = var.producer_project_id
}

# Vertex AI access for the gateway
resource "google_project_iam_member" "vertex_ai" {
  project = var.producer_project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.llm_gateway.email}"
}

AWS Implementation

The same pattern translates to AWS using PrivateLink:

┌──────────────────────────────────────────────────────────┐
│                  AWS LLM LANDING ZONE                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐       │
│  │ MANAGEMENT │   │  PRODUCER  │   │  CONSUMER  │       │
│  │  ACCOUNT   │   │  ACCOUNT   │   │  ACCOUNT   │       │
│  │            │   │            │   │            │       │
│  │ • S3 State │   │ • LLM GW   │   │ • Gen App  │       │
│  │ • Modules  │   │ • NLB      │   │ • VPC EP   │       │
│  │ • SCPs     │   │ • VPC EP   │   │            │       │
│  │            │   │            │   │            │       │
│  └────────────┘   └────────────┘   └────────────┘       │
│                          │                │              │
│                          └────────────────┘              │
│                      AWS PrivateLink                     │
│                                                          │
└──────────────────────────────────────────────────────────┘

VPC Configuration

# Producer VPC
resource "aws_vpc" "llm_producer" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "llm-producer-vpc"
  }
}

# Private subnets for LLM Gateway
resource "aws_subnet" "llm_private" {
  count             = 3
  vpc_id            = aws_vpc.llm_producer.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "llm-private-${count.index + 1}"
  }
}

# Network Load Balancer
resource "aws_lb" "llm_nlb" {
  name               = "llm-gateway-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = aws_subnet.llm_private[*].id

  enable_cross_zone_load_balancing = true
}

# VPC Endpoint Service
resource "aws_vpc_endpoint_service" "llm" {
  acceptance_required        = true
  network_load_balancer_arns = [aws_lb.llm_nlb.arn]

  allowed_principals = [
    "arn:aws:iam::${var.consumer_account_id}:root"
  ]

  tags = {
    Name = "llm-gateway-endpoint-service"
  }
}

# Consumer VPC Endpoint
resource "aws_vpc_endpoint" "llm_consumer" {
  provider            = aws.consumer
  vpc_id              = aws_vpc.consumer.id
  service_name        = aws_vpc_endpoint_service.llm.service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.consumer_private[*].id
  security_group_ids  = [aws_security_group.llm_endpoint.id]

  private_dns_enabled = true
}
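The count-based cidr_block interpolation above carves one /24 per availability zone out of the /16 VPC range. The same arithmetic, shown with Python's ipaddress module for clarity (illustrative only):

```python
import ipaddress

# Mirror of: cidr_block = "10.0.${count.index + 1}.0/24" over the 10.0.0.0/16 VPC
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))

for index in range(3):
    print(subnets[index + 1])
# → 10.0.1.0/24
# → 10.0.2.0/24
# → 10.0.3.0/24
```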

Security Groups

# LLM Gateway Security Group
resource "aws_security_group" "llm_gateway" {
  name        = "llm-gateway-sg"
  description = "Security group for LLM Gateway"
  vpc_id      = aws_vpc.llm_producer.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]  # Internal only
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# VPC Endpoint Security Group
resource "aws_security_group" "llm_endpoint" {
  name        = "llm-endpoint-sg"
  description = "Security group for LLM VPC Endpoint"
  vpc_id      = aws_vpc.consumer.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.consumer.cidr_block]
  }
}

IAM Roles

# LLM Gateway Role
resource "aws_iam_role" "llm_gateway" {
  name = "llm-gateway-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

# Bedrock Access Policy
resource "aws_iam_role_policy" "bedrock_access" {
  name = "bedrock-access"
  role = aws_iam_role.llm_gateway.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ]
      Resource = "*"
    }]
  })
}

LLM Gateway Service

The gateway service provides a unified interface to multiple LLM providers:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import vertexai
from vertexai.generative_models import GenerativeModel

app = FastAPI(title="LLM Gateway")

# Initialize the Vertex AI SDK once at import time, not on every request
vertexai.init(project="llm-gw-project", location="us-central1")

class CompletionRequest(BaseModel):
    prompt: str
    model: str = "gemini-pro"
    max_tokens: int = 1024
    temperature: float = 0.7

class CompletionResponse(BaseModel):
    text: str
    model: str
    usage: dict

@app.post("/v1/completions", response_model=CompletionResponse)
async def create_completion(request: CompletionRequest):
    """Unified completion endpoint supporting multiple models."""

    model = GenerativeModel(request.model)

    try:
        response = model.generate_content(
            request.prompt,
            generation_config={
                "max_output_tokens": request.max_tokens,
                "temperature": request.temperature
            }
        )
    except Exception as exc:
        raise HTTPException(status_code=502, detail=f"Upstream model error: {exc}") from exc

    return CompletionResponse(
        text=response.text,
        model=request.model,
        usage={
            "prompt_tokens": response.usage_metadata.prompt_token_count,
            "completion_tokens": response.usage_metadata.candidates_token_count
        }
    )

@app.get("/health")
async def health_check():
    return {"status": "healthy"}
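A consumer application in the spoke project reaches this service through its PSC endpoint address rather than any public hostname. A sketch of the client side, assuming a hypothetical endpoint IP of 10.1.0.10 reserved in the consumer VPC:

```python
import json
import urllib.request

# Hypothetical PSC endpoint address for the gateway (assumption: the
# consumer project reserved 10.1.0.10 for its PSC forwarding rule)
GATEWAY_URL = "http://10.1.0.10:8080/v1/completions"

def build_request(prompt: str, model: str = "gemini-pro") -> urllib.request.Request:
    """Assemble the JSON request that the gateway's /v1/completions expects."""
    body = json.dumps({
        "prompt": prompt,
        "model": model,
        "max_tokens": 1024,
        "temperature": 0.7,
    }).encode()
    return urllib.request.Request(
        GATEWAY_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("Summarise our Q3 incident report.")
# response = urllib.request.urlopen(req)  # executes over the PSC endpoint
print(json.loads(req.data)["model"])  # → gemini-pro
```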

Deployment Structure

llm-landing/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── network/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── psc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── iam/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── compute/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── backend.tf
└── README.md

Key Benefits

Security

  • Zero Public IPs: All connectivity through private channels
  • Network Isolation: Dedicated VPCs with controlled ingress/egress
  • IAP/SSM Access: Secure administrative access without bastions

Scalability

  • Multi-Region: Deploy gateways in multiple regions for latency
  • Auto-Scaling: Horizontal scaling based on request volume
  • Load Balancing: Distribute traffic across healthy instances

Governance

  • IaC: All infrastructure defined in Terraform
  • Version Control: Changes tracked and auditable
  • Policy as Code: Organization policies enforced automatically

Operational Excellence

  • Health Checks: Automatic detection of unhealthy instances
  • Observability: Integrated logging and monitoring
  • Blue-Green Deployments: Zero-downtime updates

Conclusion

The LLM Landing Zone pattern provides a production-ready foundation for enterprise GenAI deployments. By leveraging Private Service Connect (GCP) or PrivateLink (AWS), organizations can securely expose LLM capabilities to internal applications without compromising on security or compliance.

The modular Terraform structure enables teams to deploy consistently across environments while maintaining the flexibility to customize for specific requirements.

For production implementations, consider additional enhancements:

  • Rate limiting and quota management
  • Request/response logging for compliance
  • Model versioning and A/B testing
  • Cost attribution per consumer
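
Of these, rate limiting is usually the first one needed. A minimal in-process token-bucket sketch, with per-tenant keying and limits chosen purely for illustration (a production deployment would typically back this with a shared store such as Redis):

```python
import time

class TokenBucket:
    """Minimal token bucket: refills `rate` tokens/sec up to a burst of `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per consumer project, e.g. 5 requests/sec with a burst of 10
buckets = {"consumer-project-a": TokenBucket(rate=5, capacity=10)}

bucket = buckets["consumer-project-a"]
print(all(bucket.allow() for _ in range(10)))  # → True (burst absorbed)
print(bucket.allow())                          # → False (bucket drained)
```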

The complete implementation is available on GitHub: gonnect-uk/llm-landing