LLM Landing Zone: Enterprise-Grade GenAI Infrastructure Pattern
A comprehensive guide to building secure, scalable LLM Gateway infrastructure on GCP and AWS using Private Service Connect, VPC isolation, and Terraform-based IaC for enterprise GenAI deployments.
Introduction
As enterprises race to adopt Large Language Models, the gap between proof-of-concept and production becomes glaringly apparent. While calling an LLM API is trivial, deploying GenAI at enterprise scale demands rigorous infrastructure patterns—network isolation, zero-trust security, cross-project connectivity, and infrastructure-as-code governance.
The LLM Landing Zone pattern addresses this gap head-on, providing a battle-tested architecture for deploying LLM Gateways that meet enterprise security, compliance, and scalability requirements.
This article presents a deep-dive into the architecture, covering both GCP and AWS implementations, with production-ready Terraform configurations.
The Problem: GenAI in the Enterprise
Enterprise GenAI deployments face challenges that don't exist in sandbox environments:
Network Security
- LLM endpoints must not be exposed to public internet
- Traffic must flow through private connectivity
- Cross-project/account communication needs secure bridges
Multi-Tenancy
- Multiple consumer applications need isolated access
- Quota management and rate limiting per tenant
- Audit trails for compliance
Governance
- Infrastructure changes must be versioned and auditable
- Least-privilege IAM across project boundaries
- Encryption at rest and in transit
Operational Excellence
- Health checks and auto-recovery
- Observability and tracing
- Blue-green deployment capabilities
Architecture Overview
The LLM Landing Zone implements a hub-and-spoke model with three core components:
┌──────────────────────────────────────────────────────────┐
│ LLM LANDING ZONE │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ BASE HUB │ │ PRODUCER │ │ CONSUMER │ │
│ │ │ │ PROJECT │ │ PROJECT │ │
│ │ • TF State │ │ │ │ │ │
│ │ • Modules │ │ • LLM GW │ │ • Gen App │ │
│ │ • Policies │ │ • ILB │ │ • PSC EP │ │
│ │ │ │ • PSC Att │ │ │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │ │ │
│ └────────────────┘ │
│ Private Service Connect │
│ │
└──────────────────────────────────────────────────────────┘
Base Landing Zone (Hub)
The hub project serves as the control plane:
- Terraform State Management: GCS bucket with versioning for state files
- Modular Infrastructure Code: Reusable modules for network, IAM, PSC
- Policy Definitions: Organization-wide security policies
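The state backend can be sketched as follows. This is illustrative: the bucket name and prefix are assumptions, not values from the reference implementation.

```hcl
# Hub project: versioned GCS bucket for Terraform state (names illustrative)
resource "google_storage_bucket" "tf_state" {
  name                        = "llm-landing-tf-state"
  location                    = var.region
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }
}

# backend.tf in each environment points at the hub bucket
terraform {
  backend "gcs" {
    bucket = "llm-landing-tf-state"
    prefix = "env/dev"
  }
}
```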
Producer Project (LLM Gateway)
Hosts the LLM Gateway service with:
- Custom VPC: Isolated network (10.0.0.0/24) for LLM workloads
- PSC Subnet: Dedicated subnet (10.0.1.0/24) for Private Service Connect
- Internal Load Balancer: All-ports enabled for flexible routing
- PSC Service Attachment: Proxy Protocol enabled for client identification
Consumer Project (Gen Applications)
Consumes the LLM Gateway via:
- PSC Endpoint: Private connectivity to producer services
- No Public IPs: All traffic flows through private channels
- IAP Access: Secure administrative access via Identity-Aware Proxy
GCP Implementation Deep Dive
Network Architecture
# VPC with custom subnets
resource "google_compute_network" "llm_network" {
  name                    = "llm-network"
  auto_create_subnetworks = false
  project                 = var.producer_project_id
}

# Main subnet for LLM Gateway
resource "google_compute_subnetwork" "main" {
  name                     = "llm-main-subnet"
  ip_cidr_range            = "10.0.0.0/24"
  network                  = google_compute_network.llm_network.id
  region                   = var.region
  private_ip_google_access = true
}

# PSC subnet for service attachment
resource "google_compute_subnetwork" "psc" {
  name          = "llm-psc-subnet"
  ip_cidr_range = "10.0.1.0/24"
  network       = google_compute_network.llm_network.id
  region        = var.region
  purpose       = "PRIVATE_SERVICE_CONNECT"
}
Firewall Rules
Security is enforced through network tags and granular firewall rules:
# Internal traffic within VPC
resource "google_compute_firewall" "internal" {
  name    = "allow-internal"
  network = google_compute_network.llm_network.name

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }

  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }

  allow {
    protocol = "icmp"
  }

  source_tags = ["llm-internal"]
  target_tags = ["llm-internal"]
}

# Health checks from GCP load balancers
resource "google_compute_firewall" "health_check" {
  name    = "allow-health-check"
  network = google_compute_network.llm_network.name

  allow {
    protocol = "tcp"
    ports    = ["80", "443", "8080"]
  }

  source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]
  target_tags   = ["llm-gateway"]
}

# IAP for secure SSH access
resource "google_compute_firewall" "iap" {
  name    = "allow-iap"
  network = google_compute_network.llm_network.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["35.235.240.0/20"] # IAP IP range
  target_tags   = ["allow-iap"]
}
Private Service Connect
PSC enables secure cross-project connectivity:
# Internal Load Balancer
resource "google_compute_forwarding_rule" "llm_ilb" {
  name                  = "llm-gateway-ilb"
  region                = var.region
  load_balancing_scheme = "INTERNAL"
  backend_service       = google_compute_region_backend_service.llm.id
  all_ports             = true
  network               = google_compute_network.llm_network.id
  subnetwork            = google_compute_subnetwork.main.id
}

# PSC Service Attachment
resource "google_compute_service_attachment" "llm_psc" {
  name   = "llm-gateway-psc"
  region = var.region

  # consumer_accept_lists only takes effect with ACCEPT_MANUAL;
  # ACCEPT_AUTOMATIC would admit any consumer regardless of the list
  connection_preference = "ACCEPT_MANUAL"
  nat_subnets           = [google_compute_subnetwork.psc.id]
  target_service        = google_compute_forwarding_rule.llm_ilb.id
  enable_proxy_protocol = true

  consumer_accept_lists {
    project_id_or_num = var.consumer_project_id
    connection_limit  = 10
  }
}
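On the consumer side, the matching PSC endpoint is a reserved internal address plus a forwarding rule that targets the producer's service attachment. A minimal sketch, assuming the consumer project already has its own network and subnet (the `consumer` network/subnet references are hypothetical names):

```hcl
# Consumer project: reserve an internal IP for the PSC endpoint
resource "google_compute_address" "llm_psc_ip" {
  project      = var.consumer_project_id
  name         = "llm-gateway-psc-ip"
  region       = var.region
  subnetwork   = google_compute_subnetwork.consumer.id
  address_type = "INTERNAL"
}

# PSC endpoint: a forwarding rule targeting the producer's service attachment
resource "google_compute_forwarding_rule" "llm_psc_endpoint" {
  project               = var.consumer_project_id
  name                  = "llm-gateway-endpoint"
  region                = var.region
  network               = google_compute_network.consumer.id
  ip_address            = google_compute_address.llm_psc_ip.id
  target                = google_compute_service_attachment.llm_psc.id
  load_balancing_scheme = "" # must be empty for PSC consumer endpoints
}
```

Consumer applications then reach the gateway at this internal IP; no route to the public internet is involved.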
IAM Configuration
Least-privilege access across project boundaries:
# OS Login for VM access
resource "google_project_iam_member" "os_login" {
  project = var.producer_project_id
  role    = "roles/compute.osLogin"
  member  = "user:${var.admin_email}"
}

# IAP tunnel access
resource "google_project_iam_member" "iap_tunnel" {
  project = var.producer_project_id
  role    = "roles/iap.tunnelResourceAccessor"
  member  = "user:${var.admin_email}"
}

# Service account for LLM Gateway
resource "google_service_account" "llm_gateway" {
  account_id   = "llm-gateway-sa"
  display_name = "LLM Gateway Service Account"
  project      = var.producer_project_id
}

# Vertex AI access for the gateway
resource "google_project_iam_member" "vertex_ai" {
  project = var.producer_project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.llm_gateway.email}"
}
AWS Implementation
The same pattern translates to AWS using PrivateLink:
┌──────────────────────────────────────────────────────────┐
│ AWS LLM LANDING ZONE │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ MANAGEMENT │ │ PRODUCER │ │ CONSUMER │ │
│ │ ACCOUNT │ │ ACCOUNT │ │ ACCOUNT │ │
│ │ │ │ │ │ │ │
│ │ • S3 State │ │ • LLM GW │ │ • Gen App │ │
│ │ • Modules │ │ • NLB │ │ • VPC EP │ │
│ │ • SCPs │ │ • VPC EP │ │ │ │
│ │ │ │ │ │ │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │ │ │
│ └────────────────┘ │
│ AWS PrivateLink │
│ │
└──────────────────────────────────────────────────────────┘
VPC Configuration
# Producer VPC
resource "aws_vpc" "llm_producer" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "llm-producer-vpc"
  }
}

# Availability zones referenced by the subnets below
data "aws_availability_zones" "available" {
  state = "available"
}

# Private subnets for LLM Gateway
resource "aws_subnet" "llm_private" {
  count             = 3
  vpc_id            = aws_vpc.llm_producer.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "llm-private-${count.index + 1}"
  }
}
PrivateLink Service
# Network Load Balancer
resource "aws_lb" "llm_nlb" {
  name                             = "llm-gateway-nlb"
  internal                         = true
  load_balancer_type               = "network"
  subnets                          = aws_subnet.llm_private[*].id
  enable_cross_zone_load_balancing = true
}

# VPC Endpoint Service
resource "aws_vpc_endpoint_service" "llm" {
  acceptance_required        = true
  network_load_balancer_arns = [aws_lb.llm_nlb.arn]

  allowed_principals = [
    "arn:aws:iam::${var.consumer_account_id}:root"
  ]

  tags = {
    Name = "llm-gateway-endpoint-service"
  }
}

# Consumer VPC Endpoint
resource "aws_vpc_endpoint" "llm_consumer" {
  provider           = aws.consumer
  vpc_id             = aws_vpc.consumer.id
  service_name       = aws_vpc_endpoint_service.llm.service_name
  vpc_endpoint_type  = "Interface"
  subnet_ids         = aws_subnet.consumer_private[*].id
  security_group_ids = [aws_security_group.llm_endpoint.id]

  # Requires a verified private_dns_name on the endpoint service;
  # keep false unless one is configured there
  private_dns_enabled = false
}
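Because `acceptance_required = true`, each consumer connection sits in a pending state until the producer approves it. When both sides are managed in the same Terraform configuration, acceptance can be automated with a connection accepter (a sketch, reusing the resources defined above):

```hcl
# Producer side: accept the consumer's pending endpoint connection
resource "aws_vpc_endpoint_connection_accepter" "llm_consumer" {
  vpc_endpoint_service_id = aws_vpc_endpoint_service.llm.id
  vpc_endpoint_id         = aws_vpc_endpoint.llm_consumer.id
}
```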
Security Groups
# LLM Gateway Security Group
resource "aws_security_group" "llm_gateway" {
  name        = "llm-gateway-sg"
  description = "Security group for LLM Gateway"
  vpc_id      = aws_vpc.llm_producer.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"] # Internal only
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# VPC Endpoint Security Group
resource "aws_security_group" "llm_endpoint" {
  name        = "llm-endpoint-sg"
  description = "Security group for LLM VPC Endpoint"
  vpc_id      = aws_vpc.consumer.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.consumer.cidr_block]
  }
}
IAM Roles
# LLM Gateway Role
resource "aws_iam_role" "llm_gateway" {
  name = "llm-gateway-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

# Bedrock Access Policy
resource "aws_iam_role_policy" "bedrock_access" {
  name = "bedrock-access"
  role = aws_iam_role.llm_gateway.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ]
      # Scope to specific model ARNs in production rather than "*"
      Resource = "*"
    }]
  })
}
LLM Gateway Service
The gateway service provides a unified interface to multiple LLM providers:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import vertexai
from vertexai.generative_models import GenerativeModel

app = FastAPI(title="LLM Gateway")

# Initialise the Vertex AI SDK once at startup, not per request
vertexai.init(project="llm-gw-project", location="us-central1")


class CompletionRequest(BaseModel):
    prompt: str
    model: str = "gemini-pro"
    max_tokens: int = 1024
    temperature: float = 0.7


class CompletionResponse(BaseModel):
    text: str
    model: str
    usage: dict


@app.post("/v1/completions", response_model=CompletionResponse)
async def create_completion(request: CompletionRequest):
    """Unified completion endpoint supporting multiple models."""
    try:
        model = GenerativeModel(request.model)
        response = model.generate_content(
            request.prompt,
            generation_config={
                "max_output_tokens": request.max_tokens,
                "temperature": request.temperature,
            },
        )
    except Exception as exc:
        # Surface provider failures as a gateway error
        raise HTTPException(status_code=502, detail=str(exc))
    return CompletionResponse(
        text=response.text,
        model=request.model,
        usage={
            "prompt_tokens": response.usage_metadata.prompt_token_count,
            "completion_tokens": response.usage_metadata.candidates_token_count,
        },
    )


@app.get("/health")
async def health_check():
    return {"status": "healthy"}
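From the consumer side, calling the gateway is a plain HTTP request to the PSC endpoint's internal IP. A minimal sketch of a client helper; the endpoint address and helper name are illustrative, not part of the reference implementation:

```python
import json

def build_completion_request(prompt: str, model: str = "gemini-pro",
                             max_tokens: int = 1024,
                             temperature: float = 0.7) -> dict:
    """Build the JSON body expected by the gateway's /v1/completions endpoint."""
    return {
        "prompt": prompt,
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

# The consumer app targets the PSC endpoint's internal IP (address illustrative);
# traffic never touches a public hostname.
GATEWAY_URL = "http://10.1.0.5:8080/v1/completions"

payload = build_completion_request("Summarise the incident report")
print(json.dumps(payload, indent=2))
# A real call would then be: requests.post(GATEWAY_URL, json=payload, timeout=30)
```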
Deployment Structure
llm-landing/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ ├── variables.tf
│ └── terraform.tfvars
├── modules/
│ ├── network/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── psc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── iam/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── compute/
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── backend.tf
└── README.md
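An environment root then composes the modules. The module input and output names below are assumptions about the module interfaces, shown only to illustrate the wiring:

```hcl
# environments/dev/main.tf — composing the modules (interface names illustrative)
module "network" {
  source     = "../../modules/network"
  project_id = var.producer_project_id
  region     = var.region
}

module "psc" {
  source              = "../../modules/psc"
  region              = var.region
  network_id          = module.network.network_id
  psc_subnet_id       = module.network.psc_subnet_id
  consumer_project_id = var.consumer_project_id
}
```

Each environment supplies its own `terraform.tfvars`, so dev and prod differ only in variable values, not in module code.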
Key Benefits
Security
- Zero Public IPs: All connectivity through private channels
- Network Isolation: Dedicated VPCs with controlled ingress/egress
- IAP/SSM Access: Secure administrative access without bastions
Scalability
- Multi-Region: Deploy gateways in multiple regions for lower latency
- Auto-Scaling: Horizontal scaling based on request volume
- Load Balancing: Distribute traffic across healthy instances
Governance
- IaC: All infrastructure defined in Terraform
- Version Control: Changes tracked and auditable
- Policy as Code: Organization policies enforced automatically
Operational Excellence
- Health Checks: Automatic detection of unhealthy instances
- Observability: Integrated logging and monitoring
- Blue-Green Deployments: Zero-downtime updates
Conclusion
The LLM Landing Zone pattern provides a production-ready foundation for enterprise GenAI deployments. By leveraging Private Service Connect (GCP) or PrivateLink (AWS), organizations can securely expose LLM capabilities to internal applications without compromising on security or compliance.
The modular Terraform structure enables teams to deploy consistently across environments while maintaining the flexibility to customize for specific requirements.
For production implementations, consider additional enhancements:
- Rate limiting and quota management
- Request/response logging for compliance
- Model versioning and A/B testing
- Cost attribution per consumer
The complete implementation is available on GitHub: gonnect-uk/llm-landing