routing layer for AI pipelines

Every prompt pays
a routing tax.

Most AI products spend tokens deciding what each message needs — before
the first token of actual work. ConductorOS eliminates that cost entirely.

workflow.yaml
workflow:
  name: route-on-complexity
  routing:
    type: declarative    route:
      # Zero token overhead — this is metadata
      simple:   SLM   # < 128 tokens, factual
      reasoning: LLM  # logic, code, multi-step
      creative:  LLM  # writing, ideation
0 token cost for routing
1ms route decision latency
99% cost reduction vs frontier-only
The Problem

You’re paying twice
for every AI call.

Token waste on routing

Dynamic routers run a classifier on every prompt. That classifier itself calls a model — costing tokens before the first real answer is generated.

Opaque workflows

You can’t see where a message goes until after it goes there. Debugging means watching logs and guessing, not reading a workflow definition.

No version control

When routing logic lives in code, it changes silently between deploys. Rollbacks mean git archaeology and fragile migration scripts.

Scaling into debt

More AI usage means more routing decisions. At scale, the overhead becomes the dominant cost line — invisible, compounding, never audited.

How It Works

Route once. Define forever.

ConductorOS uses declarative routing — the workflow topology is fixed at definition time, not discovered at runtime. No LLM in the orchestration loop. No tokens spent deciding what runs next.

01

Declare your routing map

Write a YAML file that defines your agents, their models, inputs, outputs, and routing logic. It’s a config file, not code. Diff it in a pull request. Version it with your codebase.

agents:
  - id: triage
    model: arcee-blitz   # fast, cheap
    prompt: Classify complexity
  - id: deep-think
    model: virtuoso-large
    prompt: Reason through this

routing:
  rules:
    - condition: "input.tokens < 64"
      target: triage
    - condition: "input.type == 'reasoning'"
      target: deep-think
02

Inspect before you run

See the full routing graph before anything executes. Every decision path is visible in advance — which model handles which input, how conditions branch, where human gates sit.

input
triage
arcee-blitz
output
<64 tokens
else
deep-think
virtuoso-large
03

Execute with zero routing overhead

The orchestration layer consumes zero tokens. Your prompts go directly to the right model, every time. Scale to billions of executions without watching your routing bill grow faster than your inference bill.

87%
of queries route to SLMs, not LLMs
75%
lower cost vs single-model setup
Architecture
API
Your App
CLI
SDK
ConductorOS
Routing Engine
zero token overhead
Models
SLM
fast · cheap
LLM
capable · precise
Reasoner
chain-of-thought

Built for production.
Visible before deployment.

ConductorOS separates orchestration from execution. The routing graph is a JSON/YAML file — deterministic, auditable, source-controlled. Your workers are plain code in any language. No SDK ritual, no determinism constraints.

  • Source-controlled workflows you can diff and revert
  • Human oversight as a first-class workflow step
  • Retry logic, timeouts, compensating tasks
  • OpenAI-compatible API — drop in anywhere
Features

Everything you need.
Nothing you don’t.

Declarative Routing

YAML-defined workflow topology. Fixed at definition, not discovered at runtime. Route decisions cost zero tokens.

Sub-second Latency

Routing decisions resolve in under 1ms. No classifier model to invoke. Your latency budget stays intact.

Multi-Provider Support

OpenAI-compatible endpoint. Swap providers without rewriting workflows. Per-agent model overrides.

Human-in-the-Loop

Approval gates and review checkpoints built into the workflow graph. Not bolted on later — part of the declaration.

Durable Execution

Every task executes to completion, or every failure is explicitly handled. State persists. Restarts pick up where you left off.

Any Language Workers

Workers in Python, Go, JavaScript, Java, C#, Ruby, or Rust. No framework rules. Plain code, any library.

Routing is infrastructure, not intelligence. The moment your orchestration layer starts making decisions for you — instead of executing decisions you’ve already made — you’ve built a second AI system inside your AI system. You pay for it twice.

ConductorOS is the operating system for AI cost optimization. Define your routing once. Run it forever. Pay only for the work that matters.