The Problem: Why ML Feature Management is Hard
Machine learning teams face a coordination nightmare. As models proliferate across an organization, the artifacts that feed them grow exponentially. The challenges compound:
- Feature Duplication — Different teams create the same features independently, leading to inconsistent definitions and wasted compute
- Version Chaos — Which version of feature X was used to train model Y? Nobody knows, and reproducing results becomes impossible
- Lineage Blindness — When a data source changes, which models are affected? Teams discover issues only when production breaks
- Hyperparameter Amnesia — That high-performing model from three months ago? The exact configuration is lost in someone's notebook
- Platform Lock-in — Each ML platform (SageMaker, Kubeflow, Vertex AI) has its own way of tracking artifacts, fragmenting institutional knowledge
The root cause is simple: ML artifacts exist in a web of relationships, but most systems treat them as isolated files. You need a system that understands connections, not just storage.
The Solution: A Social Network for ML Artifacts
Chakravyuh approaches ML engineering from a different angle. Rather than storing feature values or model binaries, it tracks the relationships between them. Think of it as LinkedIn for your ML artifacts—a platform where features, datasets, models, and hyperparameters maintain their professional network.
- Features: definitions, versions, and compositions via group/set theory
- Datasets: pointers, versions, and transformation lineage
- Models: training records, discovery, ranking, and registration
- Hyperparameters: configuration tracking across training runs
- Execution Runs: complete provenance of what ran when
What Chakravyuh Is NOT
This distinction matters. Chakravyuh does not replace your existing infrastructure:
- Not a feature value store — Use Redis, Feast, or your platform's native solution for that
- Not a dataset warehouse — S3, GCS, or your data lake handles actual storage
- Not a hyperparameter tuning framework — Optuna, Ray Tune, or SageMaker Hyperparameter Tuning does the optimization
- Not a model registry — MLflow, SageMaker Model Registry, or Vertex AI stores the actual model binaries
Instead, it sits above all these systems, maintaining the metadata graph that connects everything together. Platform-agnostic by design.
Why Graph? The Natural Shape of ML Lineage
Consider a typical ML lineage question: "Which production models will be affected if we change the customer_lifetime_value feature?" In a traditional relational database, answering this requires multiple joins across features, datasets, training runs, and deployed models, and the queries grow more complex with every additional hop of relationship depth.
In a graph database like Neo4j, this becomes a simple traversal:
// Find all models affected by a feature change
MATCH (f:Feature {name: 'customer_lifetime_value'})
      -[:USED_IN]->(d:Dataset)
      <-[:TRAINED_ON]-(m:Model)
      -[:DEPLOYED_TO]->(env:Environment {type: 'production'})
RETURN m.name, m.version, env.name
Feature Composition with Set Theory
Features rarely exist in isolation. A "premium_customer" feature might combine purchase_frequency, average_order_value, and customer_tenure. Chakravyuh models these compositions using group and set theory:
// Define a composite feature group
MATCH (f1:Feature {name: 'purchase_frequency'})
MATCH (f2:Feature {name: 'average_order_value'})
MATCH (f3:Feature {name: 'customer_tenure'})
CREATE (fg:FeatureGroup {name: 'premium_customer_signals'})
CREATE (fg)-[:CONTAINS]->(f1)
CREATE (fg)-[:CONTAINS]->(f2)
CREATE (fg)-[:CONTAINS]->(f3)
CREATE (fg)-[:DERIVED_BY]->(transform:Transformation {
logic: 'weighted_combination',
weights: [0.3, 0.5, 0.2]
})
When any constituent feature changes, the graph immediately reveals which composite features and downstream models need attention.
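To make the composition concrete, here is a minimal, framework-free sketch of the weighted_combination logic attached to the Transformation node above. The method name and normalization are assumptions for illustration; Chakravyuh stores only the metadata, not this computation.

```java
// Illustrative sketch of the 'weighted_combination' Transformation logic;
// the feature order and weights [0.3, 0.5, 0.2] mirror the Cypher example.
import java.util.List;

public class WeightedCombination {
    // Combine constituent feature values using the weights stored on
    // the Transformation node. Values are assumed already normalized.
    static double combine(List<Double> values, List<Double> weights) {
        if (values.size() != weights.size()) {
            throw new IllegalArgumentException("values and weights must align");
        }
        double score = 0.0;
        for (int i = 0; i < values.size(); i++) {
            score += values.get(i) * weights.get(i);
        }
        return score;
    }

    public static void main(String[] args) {
        // purchase_frequency, average_order_value, customer_tenure
        double score = combine(List.of(0.8, 0.6, 0.9), List.of(0.3, 0.5, 0.2));
        System.out.println("premium_customer score = " + score);
    }
}
```

Note that the weights array implicitly depends on feature order, which is one reason the graph keeps the CONTAINS relationships explicit.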
Domain Model: The Core Entities
The domain model captures the essential relationships in ML engineering workflows. Each entity type has specific attributes and relationships:
| Entity | Key Attributes | Primary Relationships |
|---|---|---|
| Feature | name, version, dataType, description, owner | BELONGS_TO FeatureGroup, USED_IN Dataset |
| FeatureGroup | name, composition_logic, created_at | CONTAINS Features, DERIVED_BY Transformation |
| Dataset | name, version, location_uri, schema | USES Features, PRODUCED_BY Pipeline |
| Model | name, version, algorithm, metrics | TRAINED_ON Dataset, CONFIGURED_WITH Hyperparameters |
| Hyperparameters | config_map, search_space, tuning_method | APPLIED_TO TrainingRun, OPTIMIZED_BY Experiment |
| TrainingRun | run_id, start_time, duration, status | PRODUCED Model, USED Dataset, APPLIED Hyperparameters |
// Spring Data Neo4j entity mapping a Feature node and its relationships
@Node
public class Feature {
    @Id @GeneratedValue
    private Long id;
    private String name;
    private String version;
    private String dataType;
    private String description;
    private LocalDateTime createdAt;
    @Relationship(type = "BELONGS_TO", direction = OUTGOING)
    private Set<FeatureGroup> groups;
    @Relationship(type = "USED_IN", direction = OUTGOING)
    private Set<Dataset> datasets;
    @Relationship(type = "PREVIOUS_VERSION", direction = OUTGOING)
    private Feature previousVersion;
}
API Architecture: Three Pillars of ML Engineering
Chakravyuh organizes its RESTful APIs into three functional categories, each addressing a distinct phase of the ML lifecycle:
ML Engineering APIs
Feature CRUD, dataset registration, version management, lineage queries. The daily operations of building ML systems.
Collaboration APIs
Model discovery, feature search, team ownership, access control. Enabling cross-team reuse and knowledge sharing.
Deployment APIs
Model registration, serving configuration, OAS 3.0 spec generation. Bridging training and production.
Feature Management Endpoints
// Register a new feature
POST /api/v1/features
{
"name": "customer_churn_score",
"dataType": "FLOAT",
"description": "Probability of customer churning in next 30 days",
"sourceUri": "s3://features/churn/v1",
"owner": "risk-team"
}
// Get feature lineage
GET /api/v1/features/{featureId}/lineage?depth=3
// Find features by pattern
GET /api/v1/features/search?query=customer*&owner=risk-team
// Create new version
POST /api/v1/features/{featureId}/versions
{
"changes": "Added recency weighting",
"sourceUri": "s3://features/churn/v2"
}
Model Discovery and Ranking
// Search models by capability
GET /api/v1/models/discover?task=classification&domain=fraud
// Response includes ranking by metrics
{
"models": [
{
"name": "fraud_detector_v3",
"metrics": {"auc": 0.94, "precision": 0.87},
"rank": 1,
"features_used": ["transaction_velocity", "merchant_risk_score"],
"last_trained": "2024-01-15T10:30:00Z"
}
]
}
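The ranking behind the discover endpoint can be sketched as a straightforward sort over candidate models by the requested metric. This is an assumed implementation for illustration, not the service's actual code:

```java
// Hypothetical sketch of ranking discovered models by a metric such as "auc".
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ModelRanking {
    static class Model {
        final String name;
        final Map<String, Double> metrics;

        Model(String name, Map<String, Double> metrics) {
            this.name = name;
            this.metrics = metrics;
        }
    }

    // Rank by the requested metric, highest first; models missing
    // the metric sort last.
    static List<String> rankBy(List<Model> models, String metric) {
        List<Model> sorted = new ArrayList<>(models);
        sorted.sort(Comparator.comparingDouble(
                (Model m) -> m.metrics.getOrDefault(metric, Double.NEGATIVE_INFINITY))
                .reversed());
        List<String> names = new ArrayList<>();
        for (Model m : sorted) {
            names.add(m.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Model> candidates = List.of(
                new Model("fraud_detector_v3", Map.of("auc", 0.94, "precision", 0.87)),
                new Model("fraud_detector_v2", Map.of("auc", 0.91)));
        System.out.println(rankBy(candidates, "auc"));
    }
}
```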
Automatic OAS 3.0 Generation
When a model is ready for serving, Chakravyuh can generate an OpenAPI 3.0 specification based on its input features and output schema:
// Generate OAS 3.0 spec for model serving
POST /api/v1/models/{modelId}/generate-oas
// Returns complete OpenAPI specification
{
"openapi": "3.0.0",
"info": {
"title": "Fraud Detector API",
"version": "3.0.0"
},
"paths": {
"/predict": {
"post": {
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/FraudPredictionRequest"
}
}
}
}
}
}
}
}
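One plausible way to assemble such a spec is to derive the request schema directly from the model's registered input features. The sketch below is an assumption about how that generation could work; the property types and helper names are illustrative, not Chakravyuh's actual generator:

```java
// Hedged sketch: build a minimal OAS 3.0 document from a model's input features.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OasSketch {
    static Map<String, Object> buildSpec(String title, String version, List<String> features) {
        // One numeric property per input feature (assumed; real types would
        // come from each Feature node's dataType attribute)
        Map<String, Object> properties = new LinkedHashMap<>();
        for (String f : features) {
            properties.put(f, Map.of("type", "number"));
        }
        Map<String, Object> spec = new LinkedHashMap<>();
        spec.put("openapi", "3.0.0");
        spec.put("info", Map.of("title", title, "version", version));
        spec.put("paths", Map.of("/predict", Map.of("post", Map.of(
                "requestBody", Map.of("content", Map.of("application/json",
                        Map.of("schema", Map.of("type", "object",
                                "properties", properties))))))));
        return spec;
    }

    public static void main(String[] args) {
        Map<String, Object> spec = buildSpec("Fraud Detector API", "3.0.0",
                List.of("transaction_velocity", "merchant_risk_score"));
        System.out.println(spec.get("openapi")); // prints 3.0.0
    }
}
```

Because the graph already links the model to its features, the generator never needs a separate schema definition to stay in sync with training inputs.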
Hyperparameter Tracking: Beyond Configuration Files
Every ML practitioner has lost track of the exact hyperparameters that produced a winning model. Chakravyuh treats hyperparameter configurations as first-class citizens in the graph:
// Record hyperparameters for a training run
POST /api/v1/training-runs
{
"modelId": "fraud_detector",
"datasetId": "transactions_202401",
"hyperparameters": {
"learning_rate": 0.001,
"batch_size": 256,
"epochs": 50,
"dropout": 0.3,
"optimizer": "adam",
"early_stopping_patience": 5
},
"search_space": {
"learning_rate": {"type": "log_uniform", "min": 0.0001, "max": 0.1},
"dropout": {"type": "uniform", "min": 0.1, "max": 0.5}
},
"tuning_method": "bayesian_optimization"
}
The graph structure enables powerful queries:
// Find best hyperparameters for a model across all runs
GET /api/v1/models/{modelId}/best-hyperparameters?metric=auc
// Compare hyperparameters between model versions
GET /api/v1/models/{modelId}/versions/diff?v1=2&v2=3
// Find runs with similar configurations
GET /api/v1/training-runs/similar?config={"learning_rate": 0.001}
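The versions/diff endpoint boils down to a key-wise comparison of two configuration maps. A minimal sketch of that diff, with the output format assumed for illustration:

```java
// Hypothetical sketch of diffing hyperparameter configs between two versions.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class HyperparamDiff {
    // Report only the keys whose values differ, as "old -> new" strings.
    static Map<String, String> diff(Map<String, Object> a, Map<String, Object> b) {
        Map<String, String> changes = new HashMap<>();
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        for (String k : keys) {
            if (!Objects.equals(a.get(k), b.get(k))) {
                changes.put(k, a.get(k) + " -> " + b.get(k));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Object> v2 = Map.of("learning_rate", 0.001, "batch_size", 256);
        Map<String, Object> v3 = Map.of("learning_rate", 0.0005, "batch_size", 256);
        System.out.println(diff(v2, v3)); // only learning_rate changed
    }
}
```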
System Architecture
| Layer | Technologies | Purpose |
|---|---|---|
| Runtime | Java 14 | Type safety, performance, enterprise compatibility |
| Framework | Spring Boot 2.3.3 | Dependency injection, configuration, REST support |
| Database | Neo4j | Native graph storage, Cypher queries, ACID compliance |
| Data Access | Spring Data Neo4j | Repository pattern, object-graph mapping |
| API Layer | Spring Web, OpenAPI | RESTful endpoints, documentation generation |
Integration Patterns
Chakravyuh integrates with existing ML infrastructure through lightweight adapters:
// SageMaker integration - sync training job metadata
@Service
public class SageMakerAdapter {
    @Autowired
    private TrainingRunRepository trainingRunRepository;
    @Autowired
    private AmazonSageMaker sageMaker;

    public void syncTrainingJob(String sageMakerJobArn) {
        // DescribeTrainingJob keys on the job name, which is the
        // last segment of the training job ARN
        String jobName = sageMakerJobArn.substring(sageMakerJobArn.lastIndexOf('/') + 1);
        DescribeTrainingJobResult job = sageMaker.describeTrainingJob(
            new DescribeTrainingJobRequest().withTrainingJobName(jobName)
        );
        // Create graph node with relationships
        TrainingRun run = TrainingRun.builder()
            .externalId(sageMakerJobArn)
            .platform("SAGEMAKER")
            .hyperparameters(job.getHyperParameters())
            .metrics(job.getFinalMetricDataList())
            .build();
        trainingRunRepository.save(run);
    }
}
Real-World Impact
Organizations using graph-based ML metadata management report significant improvements in reproducibility, impact analysis, and cross-team feature reuse.
Where This Matters Most
Financial Services
Model governance, regulatory compliance, audit trails for credit decisions and fraud detection systems.
E-commerce
Recommendation engine features, A/B test tracking, personalization model lineage across customer touchpoints.
Healthcare
Clinical ML models require rigorous provenance tracking for FDA compliance and patient safety.
Multi-Cloud Enterprises
Unified metadata layer when ML workloads span AWS, GCP, and Azure with different native tooling.
Getting Started
Chakravyuh runs as a standalone service alongside your existing ML infrastructure:
# Clone the repository
git clone https://github.com/mgorav/Chakravyuh.git
cd Chakravyuh
# Start Neo4j (Docker)
docker run -d --name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
neo4j:latest
# Configure and run
./mvnw spring-boot:run -Dspring-boot.run.profiles=dev
# API available at http://localhost:8080/api/v1
Explore the Code
The complete implementation is available on GitHub with documentation, domain models, and API specifications.