How LAP Compresses API Specs 10×

Yesterday we talked about the problem: API specs are bloated, and agents pay the price. Today, the solution.

LAP compresses API specs by an average of 10× across our registry of 1,500+ APIs. Not by lossy summarization or AI-generated shortcuts — by systematic removal of everything an agent doesn’t need.

Here’s how.

Tier 1: Field Pruning

The easiest wins come from removing fields that serve human readers, not machine consumers.

A typical OpenAPI endpoint looks like this:

/users/{id}:
  get:
    summary: "Get a user by ID"
    description: "Retrieves a single user object by their unique identifier.
      The response includes all public profile fields. For private fields,
      use the /users/{id}/private endpoint with appropriate scopes.
      Rate limited to 100 requests per minute."
    operationId: getUserById
    tags: ["Users", "Core"]
    externalDocs:
      url: "https://docs.example.com/users"
    x-custom-field: "internal-tracking"

After LAP processing:

GET /users/{id} → Get a user by ID

What got removed:

  • description — redundant alongside the summary, which it typically restates with extra prose
  • tags — organizational metadata for doc generators
  • externalDocs — agents can’t browse documentation URLs
  • operationId — internal identifier, not needed for calling the API
  • x-* extensions — vendor-specific metadata

This alone typically cuts 30-40% of token count.
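This pass can be sketched in a few lines of Python. The field list below mirrors the bullets above; LAP's actual implementation surely handles more cases, so treat this as an illustration, not the real code:

```python
# Tier 1 sketch: strip human-oriented fields from a parsed OpenAPI operation.
HUMAN_ONLY = {"description", "tags", "externalDocs", "operationId"}

def prune_operation(op: dict) -> dict:
    """Return a copy of an operation dict with human-only fields removed."""
    return {
        k: v
        for k, v in op.items()
        if k not in HUMAN_ONLY and not k.startswith("x-")  # x-* = vendor extensions
    }

op = {
    "summary": "Get a user by ID",
    "description": "Retrieves a single user object by their unique identifier...",
    "operationId": "getUserById",
    "tags": ["Users", "Core"],
    "externalDocs": {"url": "https://docs.example.com/users"},
    "x-custom-field": "internal-tracking",
    "parameters": [{"name": "id", "in": "path", "required": True}],
}
print(prune_operation(op))
# only "summary" and "parameters" survive
```

Note that `parameters` passes through untouched: pruning is a blocklist over known human-only keys, so anything the agent needs to make a call is preserved by default.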

Tier 2: Schema Deduplication

OpenAPI specs are full of repeated schema definitions. A User object might appear in GET /users, POST /users, PATCH /users, and GET /users/{id} — four copies of the same structure.

LAP resolves all $ref chains and deduplicates at the structural level. If two schemas are identical or near-identical (differing only in optionality), they’re collapsed into a single definition referenced by name.

The impact is format-dependent:

  • OpenAPI specs with deep $ref nesting: 40-60% schema reduction
  • Flat specs with inline schemas: 10-20% reduction
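The core of deduplication is comparing schemas by structure rather than by name. A minimal sketch, assuming schemas are parsed dicts and handling only the exact-duplicate case (LAP's near-identical matching on optionality is not shown):

```python
import json

def canonical(schema: dict) -> str:
    """Canonical form of a schema: stable key order, doc-only fields ignored."""
    cleaned = {k: v for k, v in schema.items() if k not in ("description", "example")}
    return json.dumps(cleaned, sort_keys=True)

def dedupe(schemas: dict[str, dict]) -> dict[str, str]:
    """Map each schema name to the first name with an identical structure."""
    seen: dict[str, str] = {}   # canonical form -> first name encountered
    alias: dict[str, str] = {}  # schema name -> name it collapses into
    for name, schema in schemas.items():
        alias[name] = seen.setdefault(canonical(schema), name)
    return alias

user = {"type": "object", "properties": {"id": {"type": "string"}}}
schemas = {
    "User": user,
    "UserResponse": {**user, "description": "Returned by GET /users/{id}"},
    "Order": {"type": "object", "properties": {"total": {"type": "number"}}},
}
print(dedupe(schemas))
# {'User': 'User', 'UserResponse': 'User', 'Order': 'Order'}
```

`UserResponse` collapses into `User` because the two differ only in a doc-only field; `Order` keeps its own definition.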

Tier 3: Enum Sampling

Some APIs define enums with dozens or hundreds of values. Currency codes (180 values), country codes (249), time zones (500+). An agent doesn’t need all of them to understand the parameter type.

LAP samples enums down to a representative subset — typically 3-5 values — with a count indicator:

Before: currency: enum [USD, EUR, GBP, JPY, AUD, CAD, CHF, CNY, ... 180 values]
After:  currency: enum [USD, EUR, GBP, ...+177]

The agent knows it’s a currency code, knows the format, and can infer valid values. The other 177 entries added nothing.
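The sampling step itself is trivial; the value is in the count marker, which tells the agent the list is truncated rather than exhaustive. A sketch (the keep-count and marker format are assumptions):

```python
def sample_enum(values: list[str], keep: int = 3) -> list[str]:
    """Keep the first few enum values; replace the rest with a count marker."""
    if len(values) <= keep:
        return values  # small enums pass through unchanged
    return values[:keep] + [f"...+{len(values) - keep}"]

# 180 currency codes, as in the example above (tail values are placeholders)
currencies = ["USD", "EUR", "GBP", "JPY", "AUD"] + [f"XX{i}" for i in range(175)]
print(sample_enum(currencies))
# ['USD', 'EUR', 'GBP', '...+177']
```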

Tier 4: Description Truncation

When descriptions survive pruning (because no summary exists), they get truncated to the first meaningful sentence. API descriptions tend to follow a pattern: one sentence of what it does, three paragraphs of edge cases and caveats.

Before: "Creates a new payment intent. Payment intents guide you through
  the process of collecting a payment from your customer. They track
  the lifecycle of a customer checkout flow and trigger additional
  authentication steps when required by regulatory mandates, custom
  Radar rules, or redirect-based payment methods. For a list of
  supported payment methods..."

After: "Creates a new payment intent."

Agents that need the edge cases will discover them through API responses, not spec descriptions.
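A naive version of the truncation splits on sentence-ending punctuation after collapsing the wrapped-line whitespace that YAML descriptions accumulate. This sketch would misfire on abbreviations like "e.g." (which is presumably why "first meaningful sentence" takes real work), but it shows the shape of the pass:

```python
import re

def first_sentence(text: str) -> str:
    """Truncate a description to its first sentence (naive punctuation split)."""
    text = " ".join(text.split())  # collapse line-wrapped whitespace
    match = re.match(r"(.+?[.!?])(\s|$)", text)
    return match.group(1) if match else text

desc = (
    "Creates a new payment intent. Payment intents guide you through "
    "the process of collecting a payment from your customer. They track "
    "the lifecycle of a customer checkout flow..."
)
print(first_sentence(desc))
# Creates a new payment intent.
```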

Tier 5: Response Simplification

Full response schemas are the biggest token sinks. A GET /orders response might include nested objects for line items, shipping, billing, tax breakdowns, refunds, and metadata — hundreds of schema lines for a single endpoint.

LAP reduces response schemas to their top-level structure. The agent knows what fields come back without drowning in nested definitions it doesn’t need upfront.
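One way to picture this reduction: keep each top-level field's name and scalar type, and collapse nested objects and arrays to a type placeholder. A sketch under those assumptions (the placeholder notation is illustrative, not LAP's actual output format):

```python
def top_level_shape(schema: dict) -> dict:
    """Reduce a response schema to its top-level fields, collapsing
    nested objects and arrays to bare type placeholders."""
    shape = {}
    for name, sub in schema.get("properties", {}).items():
        t = sub.get("type", "object")
        shape[name] = t if t not in ("object", "array") else f"<{t}>"
    return shape

order_response = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {"type": "array", "items": {"type": "object"}},
        "shipping": {"type": "object", "properties": {"address": {"type": "string"}}},
    },
}
print(top_level_shape(order_response))
# {'id': 'string', 'total': 'number', 'line_items': '<array>', 'shipping': '<object>'}
```

The hundreds of schema lines under `line_items` and `shipping` vanish; the agent still sees that both fields exist and roughly what they are.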

Format-Specific Handling

LAP isn’t just an OpenAPI tool. It handles five input formats, each with its own compression strategies:

Format        Avg. Input   Avg. Output   Compression
OpenAPI 3.x   89K tokens   12K tokens    7.4×
Swagger 2.0   65K tokens   9K tokens     7.2×
GraphQL       34K tokens   8K tokens     4.3×
AsyncAPI      28K tokens   7K tokens     4.0×
Protobuf      22K tokens   6K tokens     3.7×

OpenAPI benefits most because it’s the most verbose format. GraphQL is already relatively compact — its schema language was designed for machines from the start.

What Stays

Compression is only useful if agents can still do their job. LAP preserves:

  • Every endpoint — no routes are removed
  • Parameter names, types, and constraints — required/optional, min/max, patterns
  • Auth requirements — security schemes and where credentials go
  • Request body structure — what the API expects
  • Top-level response shape — what comes back
  • One-line descriptions — enough to choose the right endpoint

The Result

Across 500 benchmark runs with Claude Sonnet, agents using LAP-compressed specs achieved a 0.851 success rate compared to 0.824 on full specs. Better performance with half the tokens.

The compression isn’t magic. It’s the recognition that 90% of an API spec is written for a human audience that’s increasingly not the one reading it.


See it yourself:

pip install lapsh
lapsh compile petstore.yaml -t lean

Then diff the output against the original. You’ll see exactly what was removed — and realize none of it mattered for the agent.

Star the repo on GitHub if this approach resonates.