k0rdent AI Docs

API Overview

Complete OpenAPI documentation for k0rdent AI Atlas, Arc, and shared services APIs

Overview

Draft: This documentation is currently a work in progress and subject to change.

This document provides an overview of the k0rdent API shape, covering Atlas (Provider Console), Arc (Customer AI Console), and shared services like Auth.


API Docs Variants

We're experimenting with different ways to present the API. Choose the format that works best for you.


Welcome to the k0rdent AI documentation

k0rdent AI is a Kubernetes-native platform for managing AI infrastructure at scale. Built for Neoclouds and enterprises, it provides a composable, template-driven approach to deploying and operating Kubernetes clusters and AI services across distributed environments.

What k0rdent AI solves

Modern AI infrastructure faces three core challenges:

Tech stack complexity: Divergent tooling, APIs, and security controls across providers and regions make it hard to standardize operations as hardware and platforms proliferate.

Operational efficiency: Without centralized visibility, the health, state, cost, and risk of clusters drift across environments, creating instability and compliance gaps.

Customer experience: Slow time-to-value when onboarding new GPU capacity and services, difficulty maximizing utilization while meeting strict tenant requirements, and hard multi-tenancy requirements all raise the bar for reliability and isolation.

k0rdent AI addresses these challenges through Kubernetes-native composability and multi-cluster automation. Infrastructure and services are defined as templates and reconciled continuously into running environments, enabling repeatable rollout across many clusters and sites with controlled drift and upgrades.

Key capabilities

Composable infrastructure: Define infrastructure and services as templates, and reconcile them across clusters and sites.

GPU efficiency: GPU partitioning, topology-aware scheduling, and allocation strategies to improve utilization.

Hardware-enforced multi-tenancy: Tenant isolation across GPU allocation, virtualization, and network boundaries.

Integrated observability and FinOps: Centralized monitoring and cost reporting for operational control and tenant-level consumption reporting.

Workload convergence: Support for VM and container workloads under one operational model.

Platform components

k0rdent AI consists of three main components:

k0rdent AI Cluster Manager (KCM): Deployment and lifecycle management of Kubernetes clusters, including configuration, updates, and other CRUD operations.

k0rdent AI State Manager (KSM): Installation and lifecycle management of deployed services. KSM leverages Project Sveltos for an increasing amount of functionality.

k0rdent AI Observability and FinOps (KOF): Monitoring of clusters and beach-head services, event and log management, and integrated cost reporting with tenant-level consumption tracking.

API Endpoint Reference

All endpoints are served from https://api.k0rdent.ai. Visibility is controlled per-endpoint via the x-visibility extension: internal (provider/Atlas operations only) or public (customer/Arc-facing).
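As an illustration of how the x-visibility extension could be consumed, the following minimal sketch filters the operations of an OpenAPI document by visibility. It assumes the extension is set on each operation, with a path-level value as a fallback; where the real specification places it is not stated here. The sample spec fragment and its visibility assignments are hypothetical.

```python
from typing import Any

# HTTP methods that OpenAPI allows as keys of a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations_by_visibility(spec: dict[str, Any], visibility: str) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs whose x-visibility equals `visibility`."""
    matches = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Skip non-operation keys such as "parameters" or extensions.
            if method not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            # Assumption: operation-level value wins, path-level value is the fallback.
            value = operation.get("x-visibility", item.get("x-visibility"))
            if value == visibility:
                matches.append((method.upper(), path))
    return sorted(matches)


# Hypothetical spec fragment; the visibility values are invented for illustration.
SAMPLE_SPEC = {
    "paths": {
        "/v1/regions/global/organizations": {
            "get": {"x-visibility": "public"},
            "post": {"x-visibility": "internal"},
        },
        "/v1/regions/{region}/projects/{project}/infrastructure/servers": {
            "x-visibility": "internal",
            "get": {},
        },
    }
}

public_ops = operations_by_visibility(SAMPLE_SPEC, "public")
internal_ops = operations_by_visibility(SAMPLE_SPEC, "internal")
```

A filter like this could be used to produce a customer-facing (Arc) view of the spec that omits provider-only (Atlas) operations.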

Compute

Region-scoped compute resources under /v1/regions/{region}/projects/{project}/compute/.
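The path template above can be filled in with a small helper. This is a sketch only: the function name, the region and project values, and the `instances` resource segment used in the usage lines are illustrative assumptions, not confirmed parts of the API.

```python
from urllib.parse import quote

BASE_URL = "https://api.k0rdent.ai"


def compute_url(region: str, project: str, resource: str = "") -> str:
    """Build a URL under /v1/regions/{region}/projects/{project}/compute/."""
    # Percent-encode each path parameter so unusual names cannot break the path.
    prefix = (
        f"/v1/regions/{quote(region, safe='')}"
        f"/projects/{quote(project, safe='')}/compute/"
    )
    # `resource` may contain slashes for nested resources, so keep them intact.
    return BASE_URL + prefix + quote(resource, safe="/")


# Hypothetical region and project identifiers.
collection_url = compute_url("eu-west-1", "demo")
instances_url = compute_url("eu-west-1", "my project", "instances")
```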

Infrastructure

Provider-only (internal visibility) infrastructure resources.

| Endpoint | Purpose |
| --- | --- |
| /v1/regions/{region}/projects/{project}/infrastructure/servers | Bare metal server lifecycle: enrollment, provisioning, state transitions |

Organizations & Projects

Global resources for tenant and project management.

| Endpoint | Purpose |
| --- | --- |
| /v1/regions/global/organizations | Organization (tenant) management |
| /v1/regions/global/organizations/{id}/invitations | Organization invitation management |
| /v1/regions/global/projects | Project resource grouping |

Authentication

Token lifecycle and permission evaluation under /v1/regions/global/auth/.

| Endpoint | Purpose |
| --- | --- |
| /v1/regions/global/auth/token | Mint access tokens (authorization_code, api_key, client_credentials) |
| /v1/regions/global/auth/introspect | Token introspection (RFC 7662): validate and decode token claims |
| /v1/regions/global/auth/check | Evaluate permissions for a principal against actions and resources |
| /v1/regions/global/auth/revoke | Token revocation (RFC 7009) |
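Because the introspection and revocation endpoints reference RFC 7662 and RFC 7009, requests to them can be sketched in the form those RFCs define: a form-encoded POST carrying a `token` parameter, plus an optional `token_type_hint`. The sketch below only constructs the requests and does not send them. It assumes these endpoints accept the standard RFC request bodies, and it omits the caller authentication that the RFCs expect, since the scheme used here is not documented on this page. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

AUTH_BASE = "https://api.k0rdent.ai/v1/regions/global/auth"


def form_request(endpoint: str, fields: dict[str, str]) -> Request:
    """Build (but do not send) a form-encoded POST to an auth endpoint."""
    body = urlencode(fields).encode("ascii")
    return Request(
        f"{AUTH_BASE}/{endpoint}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def introspect_request(token: str) -> Request:
    # RFC 7662: the token to inspect travels in the `token` form field.
    return form_request("introspect", {"token": token})


def revoke_request(token: str, hint: str = "access_token") -> Request:
    # RFC 7009: `token_type_hint` is optional and only helps the server's lookup.
    return form_request("revoke", {"token": token, "token_type_hint": hint})


introspect = introspect_request("abc")
revoke = revoke_request("abc")
```

Sending either request would require adding whatever credentials the service expects and passing the object to `urllib.request.urlopen`.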

IAM

Identity and access management resources under /v1/regions/global/iam/.

| Endpoint | Purpose |
| --- | --- |
| /v1/regions/global/iam/users | User management and profile |
| /v1/regions/global/iam/groups | User groups for team-based access control |
| /v1/regions/global/iam/roles | RBAC role definitions |
| /v1/regions/global/iam/policies | IAM policy bindings |
| /v1/regions/global/iam/apikeys | API keys for programmatic access |
| /v1/regions/global/iam/service-accounts | Machine-to-machine service accounts and credentials |
| /v1/regions/global/iam/providers | Identity provider (OAuth/OIDC) configuration |

Future / TBD

The following endpoints are under discussion or in progress and are not yet fully implemented.

| Endpoint | Purpose |
| --- | --- |
| /v1/regions/global/audit | Audit logs |
| /v1/regions/global/billing | Billing and consumption |
| /v1/regions/global/analytics | Analytics |
| /v1/regions/global/webhooks | Webhook subscriptions |
| /v1/regions/{region}/projects/{project}/compute/instances (VMs) | Virtual machine lifecycle (in progress) |
| /v1/regions/{region}/inference | Inference endpoint lifecycle |
| /v1/regions/{region}/training | Training job lifecycle |

API Changelog

For a full history of API endpoint changes and iterations, see the Changelog.

