k0rdent AI Docs

Standards

API design conventions, naming conventions, and response envelope standards

This document defines the foundational standards for all k0rdent APIs: request/response structure, naming conventions, field design, and operational patterns across Atlas, Arc, and shared services.

Draft: This documentation is currently a work in progress and subject to change.

Quick Navigation:
  • For complete endpoint implementations, see API Specifications
  • For code examples and patterns, see Data Ownership
  • For authentication and security, see Auth


API Response Contract

All API responses use a consistent discriminated union envelope structure with a success boolean discriminator.

Type Definition
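A minimal TypeScript sketch of this envelope follows. Only `success` and the `meta` fields `requestId` and `timestamp` come from this document. The `data`/`error` field names and the helpers `ok`, `fail`, and `summarize` are illustrative assumptions.

```typescript
// Request-tracking metadata included in every response.
interface Meta {
  requestId: string;
  timestamp: string; // ISO 8601
}

interface ApiError {
  code: string;
  message: string;
}

// `success` is the discriminator: checking it narrows the rest of the body.
type ApiResponse<T> =
  | { success: true; data: T; meta: Meta }
  | { success: false; error: ApiError; meta: Meta };

// Hypothetical helpers for building either branch.
function ok<T>(data: T, meta: Meta): ApiResponse<T> {
  return { success: true, data, meta };
}

function fail(error: ApiError, meta: Meta): ApiResponse<never> {
  return { success: false, error, meta };
}

// Client-side narrowing: no casts needed.
function summarize(res: ApiResponse<{ name: string }>): string {
  return res.success ? `ok: ${res.data.name}` : `error: ${res.error.code}`;
}
```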

Base Meta Object

The meta object is always present and contains request tracking information:

Design Decision: requestId and timestamp are always included because they provide negligible overhead while being critical for distributed debugging and log correlation across services.

Extension Meta Types

Success responses can extend meta with additional context:

Error Response Structure
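The error structure is not spelled out in this section, so the following is a sketch under assumptions: the `code`/`message`/`details` fields, the `ErrorCode` values, and the `errorResponse` helper are all hypothetical.

```typescript
// Hypothetical set of machine-readable error codes.
type ErrorCode =
  | "validation_error"
  | "unauthorized"
  | "forbidden"
  | "not_found"
  | "conflict"
  | "internal_error";

interface ErrorDetail {
  field?: string; // e.g. "nodeCount"
  issue: string;
}

interface ErrorResponse {
  success: false;
  error: { code: ErrorCode; message: string; details?: ErrorDetail[] };
  meta: { requestId: string; timestamp: string };
}

// Builds an error envelope; `details` is omitted entirely when empty.
function errorResponse(
  code: ErrorCode,
  message: string,
  requestId: string,
  details: ErrorDetail[] = [],
): ErrorResponse {
  return {
    success: false,
    error: details.length > 0 ? { code, message, details } : { code, message },
    meta: { requestId, timestamp: new Date().toISOString() },
  };
}
```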

Response Examples

Meta Object Design Decisions

| Field | Decision | Rationale |
|---|---|---|
| requestId | Always include | Essential for log correlation across services |
| timestamp | Always include | Negligible overhead; critical for distributed debugging |
| apiVersion | Omit | URL path (/v1/) is the version; redundant in response |
| rateLimit | Add later | When rate limiting is implemented |

Implementation Reference

Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 16+ (App Router) | Server components, API routes |
| API Layer | Hono | Lightweight, type-safe API routes |
| Validation | Zod | Runtime validation, schema definitions |
| Database | Drizzle ORM | Type-safe queries, migrations |
| Auth | Custom auth layer | Session management, OAuth |
| Workflows | Workflow orchestration | Durable task execution |

Zod Schema Definitions

Define all request/response schemas using Zod for validation and type inference:
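A sketch of what such schemas might look like, assuming the `zod` package. `CreateClusterRequest`, `Cluster`, the 63-character name limit, and the error fields are hypothetical. The envelope mirrors the `success` discriminator described above, and `z.infer` derives the static types from the same definitions.

```typescript
import { z } from "zod";

// Meta object present on every response.
const Meta = z.object({
  requestId: z.string(),
  timestamp: z.string(),
});

const ErrorBody = z.object({
  code: z.string(),
  message: z.string(),
});

// Hypothetical request schema: validates the body of a create-cluster call.
const CreateClusterRequest = z.object({
  name: z.string().min(1).max(63),
  nodeCount: z.number().int().positive(),
});
type CreateClusterRequest = z.infer<typeof CreateClusterRequest>;

const Cluster = z.object({
  id: z.string(),
  name: z.string(),
  nodeCount: z.number().int(),
});

// Response envelope: `success` is the discriminator, matching the TypeScript union.
const ClusterResponse = z.discriminatedUnion("success", [
  z.object({ success: z.literal(true), data: Cluster, meta: Meta }),
  z.object({ success: z.literal(false), error: ErrorBody, meta: Meta }),
]);
type ClusterResponse = z.infer<typeof ClusterResponse>;
```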

Pattern: Workflows by Default. All operations that interact with infrastructure (BMC, K8s) are executed as durable workflows and return immediately with a workflow run ID. Clients poll workflow status via the Workflows API or receive webhooks.


Field Design Rules

Design fields for extension from day one. The cost of refactoring primitive fields into objects later is high and often requires breaking changes.

Principle: Objects Over Primitives. Always wrap values that might grow into structured objects. It's better to have nested objects early than to break APIs later when you need to add context.

Use Objects Over Primitives

Wrap values that might grow into objects immediately.
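For example, take a hypothetical memory field: as a bare number it has no room for a unit, while as an object it can gain fields later without breaking clients. The field names and units here are illustrative only.

```typescript
// Primitive: no room for a unit, and adding one later is a breaking change.
interface ServerSpecV0 {
  memory: number; // 512 what? The primitive cannot say.
}

// Object: new keys can be added later without breaking existing readers.
interface Quantity {
  value: number;
  unit: "GiB" | "TiB";
}

interface ServerSpec {
  memory: Quantity;
}

// Consumers read through the object, so additional fields never break them.
function memoryGiB(spec: ServerSpec): number {
  return spec.memory.unit === "TiB" ? spec.memory.value * 1024 : spec.memory.value;
}
```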

Never Use Booleans for State

States often grow beyond two values.
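A sketch using a hypothetical server status. Only `available` and `provisioning` appear elsewhere in this document; the other states are placeholders. A string union plus an exhaustive `switch` lets the compiler flag code paths that miss a newly added state.

```typescript
// Before: two booleans allow contradictory combinations
// (isProvisioned && isFailed) and cannot grow a third state.
interface ServerFlags {
  isProvisioned: boolean;
  isFailed: boolean;
}

// After: one string union; adding a state is additive.
type ServerStatus = "available" | "provisioning" | "active" | "deprovisioning" | "error";

function canProvision(status: ServerStatus): boolean {
  switch (status) {
    case "available":
      return true;
    case "provisioning":
    case "active":
    case "deprovisioning":
    case "error":
      return false;
    default: {
      // Compile error here if a new status is added but not handled above.
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```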

Use IDs with Optional Expansion

Don't embed full objects. Use IDs and let clients request expansion.
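One possible shape, assuming an `expand` query parameter (its name is not specified in this document), as in `GET /v1/servers/srv_3KpQm9WnXccFjH2Ls8DkT6VzRqYU?expand=cluster`. The ID is always returned; the embedded object is added only on request. All names below are hypothetical.

```typescript
interface Cluster {
  id: string;
  name: string;
}

interface ServerRecord {
  id: string;
  clusterId: string;
}

// The expanded view keeps clusterId and adds the embedded object only when asked.
type ServerView = ServerRecord & { cluster?: Cluster };

// Parses "?expand=cluster,organization" into a set of relation names.
function parseExpand(raw: string | null): Set<string> {
  return new Set(
    (raw ?? "").split(",").map((s) => s.trim()).filter((s) => s.length > 0),
  );
}

function toServerView(
  server: ServerRecord,
  expand: ReadonlySet<string>,
  clustersById: ReadonlyMap<string, Cluster>,
): ServerView {
  if (!expand.has("cluster")) return { ...server };
  const cluster = clustersById.get(server.clusterId);
  return cluster ? { ...server, cluster } : { ...server };
}
```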


API Versioning

All API endpoints include version in the URL path: /v1/...

Domain-based routing separates Atlas and Arc APIs:

Version Format

Version Policy

  • Major versions (v1, v2) for breaking changes
  • No minor versions in URL: use feature flags and deprecation warnings instead
  • Deprecation timeline: 6 months notice before removing deprecated endpoints
  • Version in URL, not response: The apiVersion field is omitted from responses because the URL path is the source of truth

When deprecating fields or endpoints, include warnings in the meta.warnings array with sunset dates and migration guides.

Deprecation Example
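A sketch of a warning entry for `meta.warnings`, applying the six-month notice period from the policy above. The field names (`code`, `sunsetAt`, `migrationGuide`) and the helper functions are hypothetical.

```typescript
interface DeprecationWarning {
  code: "deprecated";
  message: string;
  sunsetAt: string; // YYYY-MM-DD after which removal may happen
  migrationGuide: string; // link to migration docs
}

// Six months of notice, per the version policy. Day-of-month overflow
// rolls forward (e.g. Aug 31 + 6 months lands in early March).
function sunsetDate(deprecatedOn: Date): Date {
  const d = new Date(deprecatedOn.getTime());
  d.setUTCMonth(d.getUTCMonth() + 6);
  return d;
}

function deprecationWarning(
  subject: string,
  deprecatedOn: Date,
  migrationGuide: string,
): DeprecationWarning {
  const sunsetAt = sunsetDate(deprecatedOn).toISOString().slice(0, 10);
  return {
    code: "deprecated",
    message: `${subject} is deprecated and will be removed after ${sunsetAt}.`,
    sunsetAt,
    migrationGuide,
  };
}
```

The resulting object would be appended to `meta.warnings` on every response that touches the deprecated field or endpoint.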


Naming Conventions

Consistent naming across URLs, fields, and resources improves developer experience and reduces confusion.

URL Paths

Hono uses colon-prefixed route parameters.

| Rule | Example |
|---|---|
| Route params with colon | /v1/clusters/:clusterId |
| Lowercase, hyphenated | /v1/ai-services |
| Plural nouns for collections | /servers, /clusters |
| Singular for singletons | /me, /health |

NOTE: The lowercase, hyphenated rule applies only to URL paths. It does not apply to database columns, request and response bodies, or other contexts; JSON fields, for example, use camelCase (see Field Names).

Resource IDs

Resources (clusters, servers, organizations, etc.) get globally unique, opaque IDs that do NOT contain region information. This decouples resource identity from physical location.

Format: {prefix}_{base62}

| Resource | Prefix | Example |
|---|---|---|
| Organization | org_ | org_8TcVx2WkZddNmK3Pt9JwX7BzWrLM |
| Server | srv_ | srv_3KpQm9WnXccFjH2Ls8DkT6VzRqYU |
| Cluster | cls_ | cls_6NZtkvWLBbbmHfPi7L6oz7KZpqET |
| Stack | stk_ | stk_5MfRp4WjYbbHmG8Nt2LvS9CxPqZK |
| Workflow Run | run_ | run_7NhTq6WlAbbKmF5Rt3MxU8DzSqWJ |
| Pool | pool_ | pool_2LgPn8WmXccGjE7Mt4KwV9BySrTL |
| Allocation | alloc_ | alloc_9QjSr3WnZddMmH6Pt5LxW2CzUrYK |
| API Key | key_ | key_4KfQm7WkYccJmG3Nt8MvX9BzSqWL |
| Event | evt_ | evt_6MgRp2WlXbbKmF9Rt5NxU3DzTqZJ |

Special ID: org_system is reserved for platform-level admin operations. Whether it is still needed is TBD; it was originally introduced for a different purpose.
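A sketch of generating and parsing IDs in this format with Node's `crypto.randomInt`, which draws uniformly and so avoids modulo bias. The 28-character body length matches the examples above but is an assumption, as are the function names. The prefix identifies the resource type; nothing in the ID encodes a region.

```typescript
import { randomInt } from "node:crypto";

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

const PREFIXES = {
  organization: "org",
  server: "srv",
  cluster: "cls",
  stack: "stk",
  workflowRun: "run",
  pool: "pool",
  allocation: "alloc",
  apiKey: "key",
  event: "evt",
} as const;

type ResourceType = keyof typeof PREFIXES;

// Assumed length, matching the examples in the table.
const ID_LENGTH = 28;

function newId(type: ResourceType): string {
  let body = "";
  for (let i = 0; i < ID_LENGTH; i++) {
    body += BASE62[randomInt(62)]; // uniform in [0, 62)
  }
  return `${PREFIXES[type]}_${body}`;
}

// Splits an ID into prefix and body; returns null for malformed input.
function parseId(id: string): { prefix: string; body: string } | null {
  const match = /^([a-z]+)_([0-9A-Za-z]+)$/.exec(id);
  return match ? { prefix: match[1], body: match[2] } : null;
}
```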

Field Names

| Rule | Example |
|---|---|
| camelCase | createdAt, nodeCount |
| Suffix IDs with Id | clusterId, organizationId |
| Use past tense for timestamps | createdAt, updatedAt, deletedAt |

Pagination

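Use cursor-based pagination for real-time (frequently changing) data, where inserts would otherwise shift offset-based pages and cause skipped or duplicated items. Use offset/limit for stable datasets.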

Query Parameters
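A sketch of parsing pagination parameters. The names `limit`, `cursor`, and `offset`, the default of 50, and the cap of 100 are assumptions. This version clamps out-of-range values; a stricter API could reject them with a validation error instead.

```typescript
type PageQuery =
  | { mode: "cursor"; limit: number; cursor?: string }
  | { mode: "offset"; limit: number; offset: number };

const DEFAULT_LIMIT = 50; // hypothetical
const MAX_LIMIT = 100; // hypothetical

function parseLimit(raw: string | null): number {
  const n = raw === null ? DEFAULT_LIMIT : Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}

// Cursor mode for real-time collections, offset mode for stable ones.
function parsePageQuery(params: URLSearchParams, realtime: boolean): PageQuery {
  const limit = parseLimit(params.get("limit"));
  if (realtime) {
    const cursor = params.get("cursor");
    return cursor ? { mode: "cursor", limit, cursor } : { mode: "cursor", limit };
  }
  // NaN (non-numeric input) falls back to 0; negatives are clamped to 0.
  const offset = Math.max(0, Number.parseInt(params.get("offset") ?? "0", 10) || 0);
  return { mode: "offset", limit, offset };
}
```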

Response
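One way to build a cursor page, assuming rows are sorted by `(createdAt, id)` descending and the query fetches `limit + 1` rows to detect whether another page exists. The cursor is a base64url-encoded sort key that clients treat as opaque. The `pagination` field names are hypothetical, and whether they live in `meta` or next to `data` is not specified here.

```typescript
interface Item {
  id: string;
  createdAt: string;
}

interface CursorPage<T> {
  data: T[];
  pagination: { nextCursor: string | null; hasMore: boolean };
}

// The cursor encodes the sort key of the last item returned.
function encodeCursor(item: Item): string {
  return Buffer.from(JSON.stringify([item.createdAt, item.id])).toString("base64url");
}

function decodeCursor(cursor: string): [string, string] {
  return JSON.parse(Buffer.from(cursor, "base64url").toString("utf8")) as [string, string];
}

// `rows` holds up to limit + 1 items; the extra one only signals hasMore.
function toCursorPage(rows: Item[], limit: number): CursorPage<Item> {
  const hasMore = rows.length > limit;
  const data = hasMore ? rows.slice(0, limit) : rows;
  const last = data[data.length - 1];
  return {
    data,
    pagination: { nextCursor: hasMore && last ? encodeCursor(last) : null, hasMore },
  };
}
```

The next query would fetch rows whose `(createdAt, id)` is less than the decoded pair, so new inserts never shift page boundaries.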


Filtering and Sorting

Query String Format

Use consistent query parameter patterns for filtering and sorting:

| Parameter | Format | Example |
|---|---|---|
| Filter | field=value | status=available |
| Multiple values | field=val1,val2 | status=available,provisioning |
| Sort ascending | sort=field | sort=name |
| Sort descending | sort=-field | sort=-createdAt |
| Multiple sorts | sort=field1,-field2 | sort=status,-createdAt |

Implementation with Zod
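A sketch of the table above as a Zod query schema, assuming the `zod` package. Status values beyond `available` and `provisioning`, and the sortable-field allowlist, are hypothetical. Raw query strings can be passed in via `Object.fromEntries(url.searchParams)`.

```typescript
import { z } from "zod";

// Hypothetical allowlist; restricting sort fields keeps queries on indexed columns.
const SORTABLE = ["name", "status", "createdAt"] as const;
type SortField = (typeof SORTABLE)[number];
const isSortField = (f: string): f is SortField =>
  (SORTABLE as readonly string[]).includes(f);

// "a,b" -> ["a", "b"]; empty segments are dropped.
const csv = z.string().transform((s) =>
  s.split(",").map((v) => v.trim()).filter((v) => v.length > 0),
);

const ServerListQuery = z.object({
  // Filter: status=available,provisioning
  status: csv.pipe(z.array(z.enum(["available", "provisioning", "error"]))).optional(),
  // Sort: sort=status,-createdAt ("-" prefix means descending)
  sort: csv
    .refine((items) => items.every((i) => isSortField(i.replace(/^-/, ""))), "unknown sort field")
    .transform((items) =>
      items.map((i) => ({
        field: i.replace(/^-/, "") as SortField,
        direction: i.startsWith("-") ? ("desc" as const) : ("asc" as const),
      })),
    )
    .optional(),
});
type ServerListQuery = z.infer<typeof ServerListQuery>;
```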


Action Endpoints

For operations beyond CRUD, use a unified action endpoint with the POST method. Actions represent commands that change resource state asynchronously.

Design Principles

  1. Unified Endpoint: Single /actions endpoint handles all action types (power, provision, deprovision, inspect, maintenance)
  2. Type-Safe Parameters: Each action type has its own request schema with action-specific options
  3. Async by Default: Actions return 202 Accepted with workflow/operation IDs for tracking
  4. Audit Trail: Logs show "POST /actions with type=power action=off" for clear tracking
  5. Granular Permissions: Easy to scope permissions like servers:lifecycle vs servers:update

Endpoint Pattern

Action Request Schema
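A sketch of the per-type schemas as a Zod discriminated union on `type`. The action types and the `force` and `imageUrl` parameters come from this document; `maintenance.mode`, `reason`, the `servers:read` scope, and `requiredScope` are hypothetical. Maintenance uses a string mode rather than a boolean, following the field design rules.

```typescript
import { z } from "zod";

const ActionRequest = z.discriminatedUnion("type", [
  z.object({
    type: z.literal("power"),
    action: z.enum(["on", "off", "reboot"]),
    force: z.boolean().default(false),
  }),
  z.object({ type: z.literal("provision"), imageUrl: z.string().url() }),
  z.object({ type: z.literal("deprovision"), force: z.boolean().default(false) }),
  z.object({ type: z.literal("inspect") }),
  z.object({
    type: z.literal("maintenance"),
    mode: z.enum(["enter", "exit"]),
    reason: z.string().max(500).optional(),
  }),
]);
type ActionRequest = z.infer<typeof ActionRequest>;

// Permission check before dispatch (servers:lifecycle is from the design principles).
function requiredScope(action: ActionRequest): string {
  return action.type === "inspect" ? "servers:read" : "servers:lifecycle";
}
```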

Implementation Example
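A framework-agnostic sketch of a handler that starts a workflow and returns 202 Accepted immediately. `WorkflowStarter`, the `server.<type>` workflow naming, and the response field names are assumptions; the status URL follows the `/v1/workflows/runs/:id` endpoint described under Workflow Operations.

```typescript
type Action =
  | { type: "power"; action: "on" | "off" | "reboot" }
  | { type: "inspect" };

interface WorkflowStarter {
  // Hypothetical: enqueues a durable workflow and resolves with its run ID without waiting.
  start(name: string, input: unknown): Promise<{ runId: string }>;
}

interface HttpResult {
  status: number;
  body: unknown;
}

async function handleServerAction(
  serverId: string,
  action: Action,
  workflows: WorkflowStarter,
  requestId: string,
): Promise<HttpResult> {
  const { runId } = await workflows.start(`server.${action.type}`, { serverId, action });
  return {
    status: 202, // accepted: work continues in the workflow
    body: {
      success: true,
      data: {
        workflowRunId: runId,
        serverId,
        type: action.type,
        statusUrl: `/v1/workflows/runs/${runId}`,
      },
      meta: { requestId, timestamp: new Date().toISOString() },
    },
  };
}
```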


Bulk Operations

Bulk operations allow applying actions to multiple resources simultaneously. All bulk actions use partial success semantics - individual resource failures do not fail the entire bulk operation.

Design Principles

  1. Partial Success: Individual failures don't abort the entire bulk operation
  2. Explicit IDs: Use explicit ID lists for predictability and safety
  3. Per-Resource Results: Response includes success/failure status for each resource
  4. 207 Multi-Status: Always return 207, because results may be mixed
  5. Dedicated Endpoints: Use the /bulk pattern for consistency

Endpoint Pattern

The action type is specified in the request body, making the API flexible and maintainable.

Request Schema

Response Structure

Implementation Example
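A sketch of partial-success execution using `Promise.allSettled`, so one rejected item never aborts the others. The request and response field names, the `startOne` callback, and the 100-ID cap are hypothetical. Request-level problems (an empty or oversized ID list) are rejected outright rather than reported per item.

```typescript
interface BulkRequest {
  ids: string[];
  action: { type: string; [key: string]: unknown };
}

type BulkItemResult =
  | { id: string; success: true; workflowRunId: string }
  | { id: string; success: false; error: { code: string; message: string } };

interface BulkResponse {
  status: 207;
  results: BulkItemResult[];
  summary: { total: number; succeeded: number; failed: number };
}

const MAX_BULK_IDS = 100; // hypothetical cap

async function runBulk(
  req: BulkRequest,
  startOne: (id: string) => Promise<string>, // starts one workflow, resolves to its run ID
): Promise<BulkResponse> {
  const ids = [...new Set(req.ids)]; // de-duplicate so no resource is acted on twice
  if (ids.length === 0 || ids.length > MAX_BULK_IDS) {
    throw new RangeError(`ids must contain 1-${MAX_BULK_IDS} entries`);
  }
  // allSettled waits for every item; rejections become per-item failures.
  const settled = await Promise.allSettled(ids.map((id) => startOne(id)));
  const results: BulkItemResult[] = settled.map((s, i) =>
    s.status === "fulfilled"
      ? { id: ids[i], success: true, workflowRunId: s.value }
      : { id: ids[i], success: false, error: { code: "action_failed", message: String(s.reason) } },
  );
  const succeeded = results.filter((r) => r.success).length;
  return {
    status: 207,
    results,
    summary: { total: results.length, succeeded, failed: results.length - succeeded },
  };
}
```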

Safety Features

Dry-Run Mode

Preview which resources would be affected without executing:

Response:
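A sketch of how a dry-run response could be computed without side effects. The `dryRun`/`wouldAffect`/`wouldSkip` names and the rule that provisioning requires `available` status are assumptions for illustration.

```typescript
type ServerStatus = "available" | "provisioning" | "active" | "error";

interface ServerSummary {
  id: string;
  status: ServerStatus;
}

interface DryRunResult {
  dryRun: true;
  wouldAffect: string[];
  wouldSkip: { id: string; reason: string }[];
}

// Read-only preview: classifies each ID without starting any workflow.
function dryRunProvision(
  ids: string[],
  lookup: ReadonlyMap<string, ServerSummary>,
): DryRunResult {
  const result: DryRunResult = { dryRun: true, wouldAffect: [], wouldSkip: [] };
  for (const id of ids) {
    const server = lookup.get(id);
    if (!server) result.wouldSkip.push({ id, reason: "not_found" });
    else if (server.status !== "available") result.wouldSkip.push({ id, reason: `status is ${server.status}` });
    else result.wouldAffect.push(id);
  }
  return result;
}
```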

Rate Limiting

Bulk operations are throttled to prevent resource overload. Default: 10 requests/min.
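A sketch of a sliding-window limiter enforcing the default of 10 requests per minute. What the limit is keyed on (organization, API key, or user) is not specified, so `key` is left generic. This in-memory version only limits a single instance; a horizontally scaled deployment would need a shared store. Rejected requests would typically receive 429 Too Many Requests with a `Retry-After` header.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Timestamps for `key` still inside the window, oldest first.
  private recent(key: string, now: number): number[] {
    return (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
  }

  // Records and allows the request if under the limit; false means respond 429.
  tryAcquire(key: string, now: number = Date.now()): boolean {
    const recent = this.recent(key, now);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent); // also prunes expired timestamps
    return allowed;
  }

  // Seconds until the oldest hit leaves the window (value for Retry-After).
  retryAfterSeconds(key: string, now: number = Date.now()): number {
    const recent = this.recent(key, now);
    if (recent.length < this.limit) return 0;
    return Math.ceil((recent[0] + this.windowMs - now) / 1000);
  }
}
```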


Workflow Operations

Long-running operations that interact with infrastructure (BMC, Kubernetes) return 202 Accepted immediately with a workflow ID for tracking. All infrastructure mutations flow through durable workflows.

Design Principles

  1. Immediate Response: Return 202 in under 1 second; don't wait for completion
  2. Workflow ID: Provide workflow run ID for polling or webhook correlation
  3. Estimated Duration: Give clients a hint for progress UI
  4. Status Endpoint: Query workflow status via /v1/workflows/runs/:id
  5. Webhook Integration: Support webhooks for completion notifications

Workflow Orchestration

Use a durable workflow engine for retryable task execution:

Pattern: Compensating Actions. Use an onFailure handler to clean up partial state: release allocated resources, set the status to error, and notify via webhook.
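The workflow engine is not named here, so the following is an engine-agnostic sketch of the compensation pattern: each completed step can register an undo, and on failure the undos run in reverse order before the `onFailure` callback. `Step` and `runWithCompensation` are hypothetical names; retries are assumed to be the engine's job.

```typescript
interface Step<C> {
  name: string;
  run(ctx: C): Promise<void>;
  compensate?(ctx: C): Promise<void>; // undo this step's side effects
}

// Runs steps in order; on failure, compensates completed steps in reverse, then calls onFailure.
async function runWithCompensation<C>(
  steps: Step<C>[],
  ctx: C,
  onFailure: (ctx: C, error: unknown) => Promise<void>,
): Promise<"completed" | "failed"> {
  const done: Step<C>[] = [];
  for (const step of steps) {
    try {
      await step.run(ctx);
      done.push(step);
    } catch (error) {
      // Reverse order: release what was acquired last first.
      for (const completed of [...done].reverse()) {
        await completed.compensate?.(ctx);
      }
      await onFailure(ctx, error); // e.g. set status to error, send webhook
      return "failed";
    }
  }
  return "completed";
}
```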

Workflow Status Endpoint

Client Integration Pattern
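A sketch of client-side polling against the status endpoint, with exponential backoff capped at 30 seconds and a 30-minute timeout (the upper end of the durations cited in the decision log). The status values and the injected `getRun` function are assumptions; clients that receive webhooks can skip polling.

```typescript
type RunStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

interface WorkflowRun {
  id: string;
  status: RunStatus;
  error?: { code: string; message: string };
}

const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(["completed", "failed", "cancelled"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until a terminal status, doubling the delay up to maxDelayMs; gives up after timeoutMs.
async function waitForRun(
  getRun: (id: string) => Promise<WorkflowRun>, // e.g. wraps GET /v1/workflows/runs/:id
  runId: string,
  { initialDelayMs = 1_000, maxDelayMs = 30_000, timeoutMs = 30 * 60_000 } = {},
): Promise<WorkflowRun> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  for (;;) {
    const run = await getRun(runId);
    if (TERMINAL.has(run.status)) return run;
    if (Date.now() + delay > deadline) throw new Error(`timed out waiting for ${runId}`);
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```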


Implementation Reference


Error Handling

Typed Error Classes

Define semantic error types for consistent error responses:
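A sketch of a typed error hierarchy. The class names, codes, and status mapping are assumptions, not a confirmed list. Routes would then throw, for example, `new NotFoundError("Server", serverId)` and let the global handler build the response.

```typescript
// Base class carrying the HTTP status and a machine-readable code.
class ApiError extends Error {
  readonly status: number;
  readonly code: string;
  readonly details?: unknown;

  constructor(status: number, code: string, message: string, details?: unknown) {
    super(message);
    this.name = new.target.name; // subclass name, e.g. "NotFoundError"
    this.status = status;
    this.code = code;
    this.details = details;
  }
}

class ValidationError extends ApiError {
  constructor(message: string, details?: unknown) {
    super(400, "validation_error", message, details);
  }
}

class UnauthorizedError extends ApiError {
  constructor(message = "Authentication required") {
    super(401, "unauthorized", message);
  }
}

class ForbiddenError extends ApiError {
  constructor(message = "Insufficient permissions") {
    super(403, "forbidden", message);
  }
}

class NotFoundError extends ApiError {
  constructor(resource: string, id: string) {
    super(404, "not_found", `${resource} ${id} not found`);
  }
}

class ConflictError extends ApiError {
  constructor(message: string) {
    super(409, "conflict", message);
  }
}
```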

Global Error Handler
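A framework-agnostic sketch of the global handler: known `ApiError` instances map to their own status and code, and anything else becomes a generic 500 so internal details never reach the client. In Hono, a function like this could be called from `app.onError`. The names and the `internal_error` code are assumptions.

```typescript
class ApiError extends Error {
  status: number;
  code: string;
  constructor(status: number, code: string, message: string) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

interface ErrorResult {
  status: number;
  body: {
    success: false;
    error: { code: string; message: string };
    meta: { requestId: string; timestamp: string };
  };
}

// Maps any thrown value to the standard error envelope.
function toErrorResponse(err: unknown, requestId: string): ErrorResult {
  const meta = { requestId, timestamp: new Date().toISOString() };
  if (err instanceof ApiError) {
    return { status: err.status, body: { success: false, error: { code: err.code, message: err.message }, meta } };
  }
  // Unknown error: log full details server-side, return only a generic message.
  console.error(`[${requestId}]`, err);
  return {
    status: 500,
    body: { success: false, error: { code: "internal_error", message: "An unexpected error occurred" }, meta },
  };
}
```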

Usage in Routes


Audit Logging

Audit logging to support SOC 2 compliance. Every significant action must be traceable to a user and a timestamp; the table below defines which requests are logged.

What to Log

| Event Type | Log? | Rationale |
|---|---|---|
| All mutations (POST/PUT/PATCH/DELETE) | ✅ Always | Core audit trail |
| Failed authentication (401) | ✅ Always | Security monitoring |
| Failed authorization (403) | ✅ Always | Access control audit |
| Server errors (5xx) | ✅ Always | Incident response |
| Reads on sensitive resources | ✅ Always | Compliance (see below) |
| General reads (GET) | ⚠️ Optional | High volume; enable for debugging |
| Health/metrics endpoints | ❌ Never | Noise |
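The table above could be encoded as a small decision function. The `/health` and `/metrics` path patterns, the request shape, and the choice to let the noise rule take precedence are assumptions.

```typescript
type Decision = "always" | "optional" | "never";

interface RequestInfo {
  method: string;
  path: string;
  status: number;
  sensitive: boolean; // true for the sensitive entities listed below
}

const MUTATIONS = new Set(["POST", "PUT", "PATCH", "DELETE"]);
const NOISE_PATH = /^(\/v\d+)?\/(health|metrics)$/; // hypothetical paths

function auditDecision(req: RequestInfo): Decision {
  if (NOISE_PATH.test(req.path)) return "never"; // checked first by assumption
  if (MUTATIONS.has(req.method.toUpperCase())) return "always";
  if (req.status === 401 || req.status === 403 || req.status >= 500) return "always";
  if (req.sensitive) return "always";
  return "optional"; // general reads
}

// "optional" entries are logged only when debug read-logging is enabled.
function shouldAudit(req: RequestInfo, logReads: boolean): boolean {
  const d = auditDecision(req);
  return d === "always" || (d === "optional" && logReads);
}
```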

For multi-tenant security architecture and authorization patterns, see Auth Architecture.

Sensitive Entities Requiring Audit Logs

These entities require audit logging on all operations, including reads:

| Entity | Why Sensitive | Example Events |
|---|---|---|
| API Keys | Credential access | api_key.created, api_key.viewed, api_key.revoked |
| BMC Credentials | Infrastructure access | bmc_credential.created, bmc_credential.accessed |
| Cluster Credentials | Kubeconfig access | cluster_credential.downloaded |
| SSH Keys | Server access | ssh_key.created, ssh_key.deleted |
| Secrets | User-managed secrets | secret.created, secret.accessed, secret.deleted |
| Organization Members | Access control | member.invited, member.role_changed, member.removed |
| Billing/Payment | Financial data | payment_method.added, invoice.viewed |

Audit Event Schema
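A sketch of an audit event record. Every field name here is an assumption; `requestId` would tie the event to the API response's `meta.requestId`, and the regex enforces the dot-namespaced, snake_case format used in the sensitive-entities table (past tense is not machine-checked).

```typescript
// "<entity>.<past_tense_verb>", snake_case segments, e.g. member.role_changed.
const EVENT_NAME = /^[a-z]+(?:_[a-z]+)*\.[a-z]+(?:_[a-z]+)*$/;

interface AuditEvent {
  id: string; // evt_ prefixed ID
  action: string; // e.g. "api_key.revoked"
  actor: { type: "user" | "api_key" | "system"; id: string };
  organizationId: string;
  resource: { type: string; id: string };
  outcome: "success" | "failure";
  requestId: string;
  ip?: string;
  createdAt: string; // ISO 8601
}

function buildAuditEvent(input: Omit<AuditEvent, "createdAt">, now: Date = new Date()): AuditEvent {
  if (!EVENT_NAME.test(input.action)) {
    throw new Error(`invalid audit action name: ${input.action}`);
  }
  return { ...input, createdAt: now.toISOString() };
}
```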

Audit Event Naming Convention

Use past-tense, dot-namespaced actions:


Multi-Tenancy Patterns

Row-Level Security (RLS)

Use PostgreSQL RLS for defense-in-depth isolation:

Setting Context Per Request

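Critical: RLS context is set per transaction. With connection pooling, consecutive requests can reuse the same connection, so always set the context at the start of each request. Running each request's queries inside Drizzle's transaction() keeps them on one connection, and a transaction-local setting (set_config(..., true)) is discarded at commit or rollback, so it cannot leak into the next request.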
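A sketch of the per-request pattern, using a minimal hypothetical `Db`/`Tx` interface in place of a real Drizzle transaction so the idea stays driver-agnostic. The setting name `app.current_org_id` and the policy in the comment are assumptions. `set_config(name, value, true)` is PostgreSQL's function form of `SET LOCAL`, and unlike `SET` it accepts a bound parameter.

```typescript
// Hypothetical minimal interfaces standing in for a driver/ORM transaction.
interface Tx {
  query(text: string, params: unknown[]): Promise<unknown>;
}

interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

// Pairs with a policy such as (hypothetical table/column names):
//   CREATE POLICY org_isolation ON clusters
//     USING (organization_id = current_setting('app.current_org_id'));
async function withOrgContext<T>(
  db: Db,
  organizationId: string,
  work: (tx: Tx) => Promise<T>,
): Promise<T> {
  return db.transaction(async (tx) => {
    // is_local = true: the setting resets at COMMIT/ROLLBACK.
    await tx.query("select set_config('app.current_org_id', $1, true)", [organizationId]);
    return work(tx);
  });
}
```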


Decision Log

Response Envelope Pattern

Decision: Discriminated union with success: boolean over separate success/error types
Rationale:
  • TypeScript discriminated unions provide excellent type narrowing
  • Client code: if (response.success) narrows to the correct type
  • Consistent structure across all endpoints
  • Easier to generate TypeScript clients
Trade-off: Slightly more verbose than HTTP-only error signaling. Type safety is worth it.

Resource ID Format

Decision: Prefixed nanoid (srv_abc123, cls_xyz789) over UUIDs or numeric IDs
Rationale:
  • Human-readable in logs
  • Immediately identify resource type
  • URL-safe
  • Short enough for display
  • Low collision probability
Trade-off: Slightly longer than pure nanoid. Worth it for debugging and log correlation.

Action Endpoints

Decision: POST action endpoints (a unified /actions endpoint with type=power, type=provision, and so on) over overloading PATCH
Rationale:
  • Semantic clarity: POST /actions with type=power action=reboot is clearer than PATCH { online: true }
  • Action-specific parameters (e.g., force, imageUrl)
  • Better audit trail: "POST /actions with type=power action=off" vs "PATCH with field changes"
  • Granular permissions: servers:lifecycle vs servers:update
Trade-off: Slightly more endpoints. Worth it for clarity and permissions.

Bulk Operation Responses

Decision: Always 207 Multi-Status with per-resource results (not 200 OK with mixed results or fail-entire-operation)
Rationale:
  • Partial success is common in bulk operations
  • Client needs to know which specific resources succeeded/failed
  • Failing entire operation for one resource is poor UX
  • 207 status code semantically correct for mixed outcomes
Trade-off: None significant. Standard practice for bulk operations.

Async Operation Default

Decision: Return 202 immediately (not synchronous with long timeouts)
Rationale:
  • Infrastructure operations take 30s to 30min
  • Prevents HTTP timeouts and connection issues
  • Allows UI to show progress
  • Supports horizontal scaling (request and execution on different instances)
  • Better observability via workflow tracking
Trade-off: Requires more client code. Mitigated by SDKs and clear polling patterns.

Testing

Test Structure

All API endpoints should have integration tests covering:

  1. Happy path: Successful requests with expected responses
  2. Validation: Invalid inputs return appropriate errors
  3. Authorization: Users without permission receive 403 (unauthenticated requests receive 401)
  4. State transitions: Invalid state transitions are rejected
  5. Edge cases: Empty lists, missing resources, etc.

Test Helpers

Example Test
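A sketch using Node's built-in `node:test` runner and `node:assert/strict`. The `getServer` handler and its data are hypothetical stand-ins; the helpers `expectSuccess` and `expectError` show how envelope assertions can also narrow the response type.

```typescript
import { test } from "node:test";
import assert from "node:assert/strict";

type Res<T> =
  | { success: true; data: T; meta: { requestId: string } }
  | { success: false; error: { code: string }; meta: { requestId: string } };

interface StoredServer {
  id: string;
  organizationId: string;
}

// Hypothetical handler under test: returns [httpStatus, body].
function getServer(servers: Map<string, StoredServer>, organizationId: string, id: string): [number, Res<{ id: string }>] {
  const meta = { requestId: "req_test" };
  const server = servers.get(id);
  if (!server) return [404, { success: false, error: { code: "not_found" }, meta }];
  if (server.organizationId !== organizationId) return [403, { success: false, error: { code: "forbidden" }, meta }];
  return [200, { success: true, data: { id: server.id }, meta }];
}

// Helpers: assert the envelope branch and narrow the type.
function expectSuccess<T>(res: Res<T>): T {
  if (!res.success) throw new Error(`expected success, got ${res.error.code}`);
  return res.data;
}

function expectError<T>(res: Res<T>, code: string): void {
  if (res.success) throw new Error("expected an error envelope");
  assert.equal(res.error.code, code);
}

const servers = new Map<string, StoredServer>([["srv_1", { id: "srv_1", organizationId: "org_a" }]]);

test("happy path returns the server", () => {
  const [status, body] = getServer(servers, "org_a", "srv_1");
  assert.equal(status, 200);
  assert.equal(expectSuccess(body).id, "srv_1");
});

test("missing resource returns 404 not_found", () => {
  const [status, body] = getServer(servers, "org_a", "srv_missing");
  assert.equal(status, 404);
  expectError(body, "not_found");
});

test("another organization's server returns 403", () => {
  const [status, body] = getServer(servers, "org_b", "srv_1");
  assert.equal(status, 403);
  expectError(body, "forbidden");
});
```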


Related Documentation

  • Specification - Complete API specification with detailed endpoint examples
  • Auth Architecture - Authentication, authorization, and multi-tenant security patterns
  • Data Ownership - Implementation patterns, workflow orchestration, and development guidance
  • Data Model - Database schema and relationships
