fix: discovered A2A agents report Available=False (ExecutionEngine 'a2a' not found) #2532

@Nab-0

Problem

Agents created by A2AServer discovery are fully queryable but permanently report Available=False in kubectl get agents and as "Unavailable" in the dashboard, with the condition message:

ExecutionEngine 'a2a' not found in namespace '<ns>'

a2a is a built-in pseudo-engine, not a real ExecutionEngine custom resource, so the availability check can never pass: the agent stays unavailable for its entire lifetime even though it works.

Reproduced on a live cluster with samples/a2a/simple-agent:

$ kubectl get a2aserver simple-agent
NAME           READY   DISCOVERING   ADDRESS                                            AGE
simple-agent   True    False         http://simple-agent.default.svc.cluster.local:80  ...

$ kubectl get agents
NAME           MODEL   AVAILABLE   AGE
simple-agent           False       ...          # condition: ExecutionEngine 'a2a' not found

$ ark agent query simple-agent "who are you?"
# ...responds correctly. Queries work despite Available=False.

Context

Two controllers interpret spec.executionEngine.name: a2a differently. The query controller (dispatch) treats it as built-in; the agent controller (availability) looks for an ExecutionEngine CR with that name.

a2a is a code constant, not a CRD:

// ark/internal/a2a/a2a_types.go:9
const ExecutionEngineA2A = "a2a"

Query dispatch correctly special-cases it before any CR lookup:

// ark/internal/controller/query_controller.go (~324-338)
if agentCRD.Spec.ExecutionEngine == nil { ... }                 // default executor
if agentCRD.Spec.ExecutionEngine.Name == arka2a.ExecutionEngineA2A {
    // route straight to sendQueryA2A(), no ExecutionEngine CR lookup
}
// only here does it Get() a real ExecutionEngine CRD

Agent availability does not special-case it. checkDependencies runs a generic check for any agent with an executionEngine set:

// ark/internal/controller/agent_controller.go (~112-116)
if agent.Spec.ExecutionEngine != nil {
    if available, reason, msg := r.checkExecutionEngineDependency(ctx, agent); !available {
        return false, reason, msg
    }
}

and checkExecutionEngineDependency unconditionally fetches an ExecutionEngine CR by name, which 404s for a2a:

// ark/internal/controller/agent_controller.go (~195-201)
var engine arkv1prealpha1.ExecutionEngine
if err := r.Get(ctx, types.NamespacedName{Name: engineName, ...}, &engine); err != nil {
    if errors.IsNotFound(err) {
        return false, "ExecutionEngineNotFound",
            fmt.Sprintf("ExecutionEngine '%s' not found in namespace '%s'", engineName, engineNamespace)
    }
}

Note that checkDependencies already calls checkA2AServerDependency first (agent_controller.go ~100), which validates the owning A2AServer is Ready=True — the correct availability signal for A2A agents. That check passes; the agent then redundantly fails the ExecutionEngine-CR check that should not apply to the built-in engine.
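The disagreement between the two code paths can be modelled in isolation. The sketch below is not the controller code: engineRef, engineCRs, dispatchTarget and availableToday are hypothetical stand-ins that only mirror the order of checks quoted above, with a map in place of the Kubernetes client.

```go
package main

import "fmt"

// ExecutionEngineA2A mirrors the constant quoted from a2a_types.go.
const ExecutionEngineA2A = "a2a"

// engineRef is a hypothetical stand-in for spec.executionEngine.
type engineRef struct{ Name string }

// engineCRs is a hypothetical stand-in for the ExecutionEngine CRs
// that exist in the agent's namespace, keyed by name.
type engineCRs map[string]bool

// dispatchTarget models the query controller: nil engine first, then the
// built-in a2a name, and only then a CR lookup.
func dispatchTarget(ref *engineRef, crs engineCRs) string {
	switch {
	case ref == nil:
		return "default-executor"
	case ref.Name == ExecutionEngineA2A:
		return "a2a" // built-in, no CR lookup
	case crs[ref.Name]:
		return "engine:" + ref.Name
	default:
		return "error: engine not found"
	}
}

// availableToday models the agent controller's current behaviour: every
// non-nil engine reference is looked up as a CR, including "a2a".
func availableToday(ref *engineRef, crs engineCRs, ns string) (bool, string) {
	if ref == nil {
		return true, ""
	}
	if !crs[ref.Name] {
		return false, fmt.Sprintf("ExecutionEngine '%s' not found in namespace '%s'", ref.Name, ns)
	}
	return true, ""
}

func main() {
	ref := &engineRef{Name: ExecutionEngineA2A}
	crs := engineCRs{} // no ExecutionEngine CR named "a2a" exists

	fmt.Println("dispatch:", dispatchTarget(ref, crs))
	ok, msg := availableToday(ref, crs, "default")
	fmt.Println("available:", ok, msg)
	// dispatch: a2a
	// available: false ExecutionEngine 'a2a' not found in namespace 'default'
}
```

With the same input, the dispatch model routes the query while the availability model reports the not-found message from the condition.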

Impact

  • Misleading status for every discovered A2A agent: kubectl get agents shows Available=False and the dashboard shows "Unavailable", contradicting the fact that the agent serves queries.
  • Erodes trust in the availability condition and complicates anything that gates on it (tooling, health checks, team membership readiness).
  • Queries are unaffected today because dispatch ignores the condition, but the reported status is wrong, which makes this a correctness bug in the status surface.

Proposed Fix

Make availability agree with dispatch: treat the built-in a2a engine as resolved instead of looking for a nonexistent CR. In checkExecutionEngineDependency (or before calling it in checkDependencies):

if agent.Spec.ExecutionEngine.Name == arka2a.ExecutionEngineA2A {
    return true, "", ""   // built-in A2A engine; A2AServer readiness is the real dependency
}

A2A agents then derive availability from checkA2AServerDependency (already in place), mirroring how the nil-engine default-executor case is handled. No new CRD or resource is introduced.
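As a rough illustration of where the guard sits relative to the CR lookup, here is a minimal standalone sketch. It assumes a simplified signature: checkEngine and lookupFunc are hypothetical names, lookupFunc replaces the controller's r.Get call, and "my-engine" is an invented engine name.

```go
package main

import "fmt"

// ExecutionEngineA2A mirrors the constant quoted from a2a_types.go.
const ExecutionEngineA2A = "a2a"

// lookupFunc is a hypothetical replacement for the client Get call:
// it reports whether an ExecutionEngine CR with the given name exists.
type lookupFunc func(name string) bool

// checkEngine sketches the engine dependency check with the proposed guard.
// It returns (available, reason, message), like the excerpt above.
func checkEngine(engineName, namespace string, exists lookupFunc) (bool, string, string) {
	// Proposed guard: the built-in engine has no CR, so skip the lookup.
	if engineName == ExecutionEngineA2A {
		return true, "", ""
	}
	// Unchanged path for engines that are backed by a real CR.
	if !exists(engineName) {
		return false, "ExecutionEngineNotFound",
			fmt.Sprintf("ExecutionEngine '%s' not found in namespace '%s'", engineName, namespace)
	}
	return true, "", ""
}

func main() {
	lookups := 0
	noCRs := func(string) bool { lookups++; return false } // empty namespace

	ok, reason, _ := checkEngine(ExecutionEngineA2A, "default", noCRs)
	fmt.Println("a2a:", ok, reason, "lookups so far:", lookups)

	ok, reason, msg := checkEngine("my-engine", "default", noCRs)
	fmt.Println("my-engine:", ok, reason, msg, "lookups so far:", lookups)
}
```

Placing the guard before the lookup keeps the not-found path untouched for every other engine name, which is the no-regression property the task breakdown asks for.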

Task Breakdown

  • Stop failing agent availability for the built-in a2a execution engine; let A2AServer readiness drive availability for discovered agents.
  • Keep the real-ExecutionEngine-CR existence/readiness check for agents that reference an actual execution engine.
  • Confirm a discovered A2A agent reaches Available=True while its A2AServer is Ready, and flips to unavailable when the server is not ready.

Testing Approach

  • Unit test for the agent controller: an agent with executionEngine.name: a2a owned by a Ready A2AServer resolves to Available=True without any ExecutionEngine CR present.
  • Unit test: an agent referencing a non-built-in execution engine still reports unavailable when that ExecutionEngine CR is missing or not ready (no regression).
  • Integration/e2e (extend the existing tests/a2a-* suites): deploy samples/a2a/simple-agent, confirm the discovered agent becomes Available=True and is queryable.
  • Edge case: A2AServer transitions to not-ready → owned agent flips to Available=False with an A2AServer-based reason (not an ExecutionEngine reason).
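The unit-test cases above could be laid out as a table. The following is only a sketch against a simplified model, not the real controller or its test harness: the agent struct, checkDependencies and the "A2AServerNotReady" reason string are assumptions made for illustration, and the model covers a missing ExecutionEngine CR but not a present-but-not-ready one.

```go
package main

import "fmt"

// ExecutionEngineA2A mirrors the constant quoted from a2a_types.go.
const ExecutionEngineA2A = "a2a"

// agent is a hypothetical, flattened stand-in for the Agent resource.
type agent struct {
	engine         string // "" models a nil spec.executionEngine
	ownedByA2A     bool   // agent was created by A2AServer discovery
	a2aServerReady bool   // owning A2AServer reports Ready=True
}

// checkDependencies models the intended order: A2AServer readiness first,
// then the engine check with the built-in a2a name short-circuited.
func checkDependencies(a agent, engineCRs map[string]bool) (bool, string) {
	if a.ownedByA2A && !a.a2aServerReady {
		return false, "A2AServerNotReady" // assumed reason string
	}
	if a.engine == "" || a.engine == ExecutionEngineA2A {
		return true, ""
	}
	if !engineCRs[a.engine] {
		return false, "ExecutionEngineNotFound"
	}
	return true, ""
}

type testCase struct {
	name       string
	agent      agent
	crs        map[string]bool
	wantOK     bool
	wantReason string
}

// cases lists one row per scenario from the testing approach.
func cases() []testCase {
	return []testCase{
		{"a2a agent, server ready, no engine CR",
			agent{ExecutionEngineA2A, true, true}, nil, true, ""},
		{"real engine, CR missing",
			agent{"my-engine", false, false}, nil, false, "ExecutionEngineNotFound"},
		{"real engine, CR present",
			agent{"my-engine", false, false}, map[string]bool{"my-engine": true}, true, ""},
		{"a2a agent, server not ready",
			agent{ExecutionEngineA2A, true, false}, nil, false, "A2AServerNotReady"},
	}
}

func main() {
	for _, c := range cases() {
		ok, reason := checkDependencies(c.agent, c.crs)
		status := "PASS"
		if ok != c.wantOK || reason != c.wantReason {
			status = "FAIL"
		}
		fmt.Printf("%s  %s\n", status, c.name)
	}
}
```

The last row encodes the edge case: when the server is not ready, the reason comes from the A2AServer check and never from the ExecutionEngine lookup.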
