Copilot & the agentic layer — technical plan (Issue #81)
Status: plan / not yet implemented. This is the technical/architecture plan for the copilot module and the core agentic primitives it sits on. It is the implementation contract for the design approved in the design phase. Companion: the ADRs listed in §13.
Scope cut (owner): v1 = thin vertical slice + mandatory PHI redaction + OpenAI provider only. The
Providerprotocol stays vendor-agnostic so Anthropic plugs in later with zero changes belowapp/core/llm/. Deferred: Anthropic provider, RAG/pgvector, proactive/scheduled agents, self-hosted/Ollama provider, admin dashboards. Rationale in the design phase; not repeated here.
1. Layering recap & what each layer changes
| Layer | Where | New / changed |
|---|---|---|
| A — core engine (reusable by any agent surface) | app/core/agents/, app/core/llm/ | Provider protocol + OpenAIProvider + provider factory (Anthropic later); orchestrator.py (provider-agnostic tool loop); redaction.py (PHI boundary); one optional field on Tool. |
| B — tool contract (owned by each domain module) | app/modules/<m>/tools.py | get_tools() wraps that module's own services. v1 backfill: patients, agenda. Elevated to a documented module obligation. |
| C — copilot surface (thin consumer) | app/modules/copilot/ + Nuxt layer | Conversation persistence, SSE chat, per-clinic settings/budget, drawer + /copilot page. depends: []. |
Refinements to the approved design doc (decided during the architecture audit)
- Inline confirmation is a turn-level pause, not the core
AgentApprovalQueue. Acceptance criteria require the actor to confirm their own write mid-turn. That is modelled as conversation state (an assistanttool_usewith notool_resultyet), resolved by a confirm/reject call. The asyncAgentApprovalQueuestays reserved for the later service/supervisor mode. - Frontend streaming uses
fetch()+ReadableStream.getReader(), notEventSource/useEventSource—EventSourceis GET-only and cannot send theAuthorization: Bearerheader this app uses (useApi.ts:29-52). - Per-clinic config + budget live in a dedicated
copilot_settingstable (atomic monthly counters), notClinic.settingsJSONB. - The global drawer mounts via a new
app.overlayshost slot — the generic mount point for agent surfaces (reused by future voice #64). One small host edit; copilot stays otherwise host-decoupled and removable. AgentContext.permissions = get_role_permissions(ctx.role)(the wildcard grant list). The chokepoint'spermission_matches(required, granted)already resolves wildcard grants, so this is identical to what routers enforce — no expansion needed for the backend path.
2. Layer A — core engine
2.1 LLM provider (app/core/llm/)
The whole point of this sub-package is that the orchestrator never knows which vendor it talks to. It works on neutral types; each provider adapts to/from its wire format.
base.py— neutral message + event types and the protocol:ProviderMessage— vendor-agnostic turn:role (system|user|assistant|tool)+ a list of content blocks (text,tool_use{id,name,input},tool_result{tool_call_id,content}). Each provider serializes these to its own shape (Anthropic content blocks vs OpenAItool_calls/role:"tool"messages).class Provider(Protocol):async def complete(self, *, system: str, messages: list[ProviderMessage], tools: list[dict], model: str, max_tokens: int) -> AsyncIterator[ProviderEvent].ProviderEventunion (vendor-agnostic):TextDelta,ToolUse(id, name, input),Usage(input_tokens, output_tokens),Done(stop_reason).
openai_provider.py— v1: the only live provider.OpenAIProviderover theopenaiSDK (client.chat.completions.create(..., stream=True)or the Responses API). Tool schemas viatool_to_openai_schema()(app/core/agents/tools/schema.py:28). Maps OpenAI streaming deltas →ProviderEvent: assembles fragmentedtool_callsdeltas (id/name/arguments arrive in pieces) into a singleToolUse; reads usage from the final chunk (stream_options={"include_usage": True}).anthropic_provider.py— deferred. When added:AnthropicProviderover theanthropicSDK (client.messages.stream(...)), tool schemas viatool_to_anthropic_schema()(app/core/agents/tools/schema.py:19). The serializer already exists, so this is a self-contained add — no orchestrator/redaction/budget changes.factory.py—get_provider(name: str) -> Providerresolving"openai"in v1 (raise a clear "unsupported provider" error for anything else;"anthropic"/"ollama"slot in later). The orchestrator/bridge passesdialect = "openai"toregistry.schemas_for(...)and the chosenmodelstring.- Resolution order: per-clinic
copilot_settings.provider(defaults to"openai") +.modeloverride the global defaults inapp/config.py. The provider readsOPENAI_API_KEY; a provider with no key configured is rejected at settings-save time so a clinic can't select a provider the deployment can't serve. - Deps: add
openai>=1.40tobackend/pyproject.toml.
Redaction, budget, audit, and the inline-confirm pause are all upstream of the provider and therefore vendor-agnostic — adding Anthropic later changes nothing below app/core/llm/.
2.2 Orchestrator (app/core/agents/orchestrator.py)
The reusable tool-use loop. No HTTP / SSE / copilot knowledge. Signature sketch:
async def run_turn(
*, ctx: AgentContext, provider: Provider, system: str,
history: list[ProviderMessage], tool_names: list[str],
redactor: Redactor, budget: BudgetGuard,
) -> AsyncIterator[TurnEvent]Loop per turn:
budget.check()→ if over limit, yieldBudgetExceededand stop.redactor.redact_outgoing(history)→ tokenized messages + tool schemas (ctx.tools.schemas_for(tool_names, "anthropic"), already permission-filtered in §4).provider.complete(...); streamTextDelta→ yieldToken(rehydrated for display). AccumulateToolUse+Usage.- On
ToolUse:- READ tool → execute immediately via
ctx.tools.call(...)(the chokepoint), redact the result, appendtool_result, continue the loop. - WRITE/DESTRUCTIVE tool → do not execute. Yield
ConfirmationRequired(call_id, tool, args)and return (turn suspended). The unansweredtool_useis the persisted pending state.
- READ tool → execute immediately via
- On
Donewith no tool calls → yieldFinal, tallyUsageintobudget.
The orchestrator calls ctx.tools.call() for execution, so guardrails + RBAC + Pydantic + audit are inherited unchanged. Copilot uses a GuardrailConfig with auto_require_approval_for_destructive=False and require_approval_for=[] (inline confirm replaces the queue) but keeps max_actions_per_minute/session and blocked_tools (the hard denylist).
2.3 Redaction (app/core/agents/redaction.py)
The PHI boundary. Per-session SymbolTable mapping real value ↔ stable opaque token (PATIENT_7a3f, PHONE_22b1, EMAIL_…, APPT_…). API: redact_outgoing(payload) -> payload', rehydrate(text) -> text', resolve_tool_args(args) -> args'.
v1 coverage (mandatory, honest about its edges):
- Structured tool inputs/results — key-based redaction over the JSON: a PII key denylist (
first_name,last_name,full_name,phone,mobile,email,dni,nif) plus UUID-valued*_idfields known to reference patients/appointments → tokenized; the same value always maps to the same token within a session (so the model can reason about "the same patient"). - Seeded context entities — the names/ids in
context_jsonbare pre-loaded into the symbol table at session start. - User free text — substring replacement of entities already in the symbol table. Known gap: a name typed by the user for an entity not yet loaded cannot be caught without NER. Documented; NER is a later milestone.
- Free-text-returning tools (e.g. a future "summarize history") carry
Tool.exposes_free_text=True(new optional field, defaultFalse) and are excluded from the cloud path in v1 — the registry filters them out oftool_nameswhen redaction is on. None of the v1 tools (§3) return free prose, so v1 ships clean. - Rehydration runs on every
Tokenbefore it reaches the client and on tool-call args beforectx.tools.call().
Tool gains exactly one optional field: exposes_free_text: bool = False. No other change to the frozen dataclass.
2.4 Config (app/config.py)
Add OPENAI_API_KEY: str = "", COPILOT_PROVIDER_DEFAULT: str = "openai", COPILOT_MODEL_CHAT_OPENAI: str = "gpt-4.1", COPILOT_MAX_TOKENS: int = 4096, COPILOT_REDACTION_DEFAULT: bool = True. Per-clinic copilot_settings overrides provider and model. (ANTHROPIC_API_KEY + Anthropic model default are added when that provider lands.)
3. Layer B — the tool contract & v1 backfill
3.1 The tools.py pattern (the maintainability guarantee)
Each module gets a tools.py; __init__.py.get_tools() returns tools.get_tools(). Handlers call the module's own existing service with ctx.clinic_id — zero logic duplication.
# backend/app/modules/patients/tools.py
class SearchPatientsArgs(BaseModel):
query: str
limit: int = 20
async def _search(ctx: AgentContext, p: SearchPatientsArgs):
items, _ = await PatientService.list_patients(ctx.db, ctx.clinic_id, search=p.query, page=1, page_size=p.limit)
return [{"id": str(i.id), "full_name": f"{i.first_name} {i.last_name}", "phone": i.phone} for i in items]
def get_tools() -> list[Tool]:
return [Tool(name="search_patients", description="Buscar pacientes por nombre/teléfono/email.",
parameters=SearchPatientsArgs, handler=_search,
permissions=["patients.read"], category=ToolCategory.READ)]3.2 v1 tool set (only what the slice needs)
| Module | Tool | Category | Wraps | Permission |
|---|---|---|---|---|
patients | search_patients | READ | PatientService.list_patients | patients.read |
patients | get_patient | READ | PatientService.get_patient | patients.read |
patients | create_patient | WRITE | PatientService.create_patient | patients.write |
agenda | get_day_overview | READ | AppointmentService.list_appointments (day filter) | agenda.appointments.read |
agenda | book_appointment | WRITE | AppointmentService.create_appointment | agenda.appointments.write |
agenda | cancel_appointment | DESTRUCTIVE | AppointmentService.cancel_appointment | agenda.appointments.write |
Deferred: find_free_slots (free-slot computation lives in schedules; it registers its own tool when its slice lands — no cross-module import from agenda). All other modules backfill incrementally under the same contract.
Post-v1 backfill
The live inventory is each module's CLAUDE.md "Tools exposed" table. Additions since v1, newest last:
| Module | Tool | Category | Permission |
|---|---|---|---|
agenda | get_appointment, list_cabinets, list_professionals | READ | agenda.appointments.read / agenda.cabinets.read |
schedules | get_availability, find_free_slots | READ | schedules.availability.read (+agenda.appointments.read) |
payments | payments_summary, collections_by_method | READ | payments.reports.read |
reports | billing_report, top_clients_by_billing, scheduling_report | READ | reports.billing.read / reports.scheduling.read |
patient_timeline | get_patient_timeline | READ | patient_timeline.read |
catalog | list_catalog_items, get_catalog_item | READ | catalog.read |
agenda | reschedule_appointment, update_appointment_status | WRITE | agenda.appointments.write |
patients | update_patient (contact data only) | WRITE | patients.write |
recalls | list_due_recalls, get_recall (exposes_free_text) | READ | recalls.read |
recalls | create_recall, log_contact_attempt, snooze_recall, complete_recall | WRITE | recalls.write |
budget | list_budgets, get_budget | READ | budget.read |
budget | send_budget (emails the patient) | DESTRUCTIVE | budget.write |
billing | list_invoices, get_invoice (invoice axis only, include_payments=False) | READ | billing.read |
payments | record_payment | WRITE | payments.record.write |
payments | patient_payment_history (collection axis only) | READ | payments.record.read |
Playbooks (composite capabilities — decision record)
Multi-step workflows (daily briefing, prepare-visit, fill-gap-from- recalls) are prompt-level playbooks, not composite tools: a _PLAYBOOKS block appended to the static system prompt in copilot/bridge.py plus permission-gated suggestion chips in CopilotSuggestions.vue. Rationale: the model already chains registered tools; every step still passes the registry chokepoint (RBAC, guardrails, audit, redaction). A composite tool inside copilot would need a synthetic context, its own partial-failure handling and a second orchestration path to maintain. Revisit only if observed chains prove unreliable in practice (repeated retry loops); record evidence before changing the approach.
3.3 Contract elevation (so it never drifts)
- Root
CLAUDE.md"When adding X, do Y" — new row: New agent-exposed capability → declare aToolin<module>/tools.py, wrap the existing service (no logic dup), setpermissionsto the gating RBAC string andcategoryconservatively (DESTRUCTIVE for side-effects / deletes), markexposes_free_text=Trueif it returns prose, document under "Tools exposed" in the moduleCLAUDE.md. docs/checklists/new-module.md— add a "Agent tools" section.docs/checklists/module-claude-template.md— add a "Tools exposed" table.- Scaffold —
backend/scripts/scaffold_module_docs.pyalready stubs docs; add atools.pystub emitter (or document the copy-paste pattern) so every new module is born agent-addressable.
4. Permission parity (the hard rule, mechanically)
- The copilot endpoint resolves
ClinicContextviaDepends(get_clinic_context)and buildsAgentContext.permissions = get_role_permissions(ctx.role)(app/core/auth/permissions.py:65) — the exact grant listrequire_permissionchecks against.clinic_id = ctx.clinic_id. - Tool schemas offered to the model are pre-filtered:
tool_names = [n for n in registry.list() if caller_can_use(n)], wherecaller_can_useANDs each tool's declaredpermissionsagainstctx.permissionsusingpermission_matches. The model never sees tools the user couldn't call. - Execution re-checks at the chokepoint (
ToolRegistry.call) regardless — defence in depth. Every handler filters byctx.clinic_id. - Net: the agent can neither see nor do anything the calling user couldn't through the UI. Proven by the RBAC + isolation tests in §11.
5. Layer C — the copilot module
5.1 Files & wiring
backend/app/modules/copilot/
__init__.py # CopilotModule(BaseModule)
models.py # CopilotConversation, CopilotMessage, CopilotSettings
schemas.py # Pydantic request/response
service.py # ConversationService, CopilotSettingsService, BudgetGuard
orchestrator_bridge.py # builds AgentContext + Provider + Redactor, drives core orchestrator, frames SSE
router.py # /api/v1/copilot endpoints
redaction_policy.py # PII key denylist for this deployment (or import core defaults)
migrations/versions/cop_0001_initial.py
frontend/ # Nuxt layer (§7)
CLAUDE.md CHANGELOG.md- Entry point in
backend/pyproject.toml:copilot = "app.modules.copilot:CopilotModule". alembic.iniversion_locations+=app/modules/copilot/migrations/versions.- Manifest:
depends: [],installable: True,auto_install: False,removable: True,category: "official",role_permissionsper §5.4,frontend.layer_path: "frontend",frontend.navigation: [{label:"copilot.nav", to:"/copilot", icon:"i-lucide-sparkles", permission:"copilot.chat", order:90}].
5.2 Data model (own Alembic branch ("copilot",), down_revision="0001")
copilot_conversations—id, clinic_id (FK, idx), user_id (FK), session_id (FK agent_sessions.id), title, context_jsonb, model, status, total_input_tokens, total_output_tokens, cost_cents, created_at, updated_at.copilot_messages—id, conversation_id (FK, idx), clinic_id (idx), role (system|user|assistant|tool), content_jsonb (text + tool_use/tool_result blocks), tool_call_id, tool_name, input_tokens, output_tokens, created_at. Source of truth for resuming a suspended turn (an assistanttool_usewith no matchingtool_result= awaiting confirmation).copilot_settings—clinic_id (PK), provider, model, redaction_enabled (default true), monthly_token_limit, monthly_cost_limit_cents, period_start, period_input_tokens, period_output_tokens, period_cost_cents, updated_at. Lazyget_or_create(RecallSettings pattern,recalls/service.py:599).
Every copilot_conversation opens a core Agent (type "copilot", lazy per-clinic get_or_create) + AgentSession (AgentService.start_session) so all tool calls land in agent_audit_logs automatically.
5.3 Endpoints (/api/v1/copilot, all ApiResponse-wrapped except the SSE stream)
| Method | Path | Permission | Notes |
|---|---|---|---|
| POST | /sessions | copilot.chat | create conversation; seed context_jsonb from body; create Agent+AgentSession |
| POST | /sessions/{id}/messages | copilot.chat | append user msg; SSE stream of token / tool_call / tool_result / confirmation_required / done / budget_exceeded |
| POST | /sessions/{id}/confirmations/{call_id} | copilot.chat | `{decision: confirm |
| GET | /sessions / /sessions/{id}/messages | copilot.history.read (read_all to see others') | replay |
| POST | /sessions/{id}/end | copilot.chat | close |
| GET/PATCH | /settings | copilot.configure (admin) | model + budget + redaction toggle |
5.4 Permissions & events
get_permissions()→["chat", "history.read", "history.read_all", "supervise", "configure"](no prefix; registry →copilot.*).role_permissions: admin["*"]; dentist/hygienist/assistant/receptionist["chat", "history.read"].- New
EventTypeconstants (app/core/events/types.py):COPILOT_SESSION_STARTED,COPILOT_SESSION_ENDED,COPILOT_TOOL_INVOKED,COPILOT_BUDGET_THRESHOLD_REACHED. Published viaawait event_bus.publish(...)withclinic_idstringified. Re-rungenerate_catalogs.py.
6. The two core flows (sequence)
Ask (read-only)
POST /sessions/{id}/messages → persist user msg → SSE open → orchestrator: budget ok → redact → provider streams tokens (rehydrated → token events) → model calls search_patients (READ) → registry.call (RBAC+clinic+audit) → redact result → loop → provider final answer → done (usage tallied). One round trip, no pause.
Act (write, inline confirm)
… provider calls book_appointment (WRITE) → orchestrator yields confirmation_required{call_id, tool, args} and suspends (assistant tool_use persisted, no tool_result). SSE closes. Drawer renders an approval card with the (rehydrated) args. User taps Confirmar → POST /sessions/{id}/confirmations/{call_id}{decision:confirm} → registry.call(book_appointment) executes (audit row written) → persist tool_result → resume: reload history, call provider again → streams the confirmation sentence → done. Reject → persist a tool_result of "usuario canceló" → resume → model acknowledges. Booking happens only after confirm — satisfies the acceptance criterion.
7. Frontend (Nuxt layer)
- Mount: add a
<ModuleSlot name="app.overlays" :ctx="{}" />tofrontend/app/layouts/default.vue(one host edit, the agent-surface mount point). Copilot'splugins/overlays.client.tsregistersCopilotMount.vueinto it: renders nothing inline, teleports a FAB +<USlideover>drawer tobody, and binds the globalCmd/Ctrl+Klistener (model:components/settings/SettingsSearch.vue:41-70). Removing the module removes the slot entry → zero residue. - Streaming:
composables/useCopilotStream.ts—fetch(url, {method:'POST', headers:{Authorization:Bearer …}, body}), readres.body.getReader(), parsedata: …\n\nframes into typed events. (NotuseEventSource; see §1.2.) Token storage/headers peruseApi.ts/useAuth.ts. - Components:
CopilotDrawer,CopilotMessageList,CopilotComposer,CopilotToolCallCard(collapsed args/result),CopilotConfirmationCard(Confirmar/Cancelar → confirmations endpoint),CopilotCitation. Auto-imported ({path:'./components', pathPrefix:false}). - Page:
frontend/pages/copilot/index.vue→/copilot(history + long sessions). - Context capture:
CopilotMountreadsuseRoute()+useClinic()to buildcontext_jsonb({patient_id?, appointment_id?, screen}) on session create — so "agéndale revisión" needs no restated patient. - i18n:
i18n/locales/{en,es}.json, namespacedcopilot.*; ES default, EN parity. - Permissions: add
copilot: {chat, historyRead, configure}tofrontend/app/config/permissions.ts; gate launcher/page withcan(PERMISSIONS.copilot.chat). - Mobile-first for drawer (full-height sheet on small viewports) and
/copilotpage (project rule). - Settings page via
plugins/settings.client.ts→registerSettingsPage({category:'workspace', permission:'copilot.configure', …})for model + budget + redaction toggle.
8. SSE & DB-session handling (the footgun)
The streaming endpoint resolves auth/context with the request-scoped Depends(get_clinic_context) before streaming. Inside the SSE generator it opens its own async with async_session_maker() as db: for the orchestrator/persistence work (commit per tool call + per persisted message), instead of relying on Depends(get_db) whose lifetime around a streaming body is fragile. AgentContext.db = that generator-scoped session. media_type="text/event-stream", frames f"data: {json.dumps(evt)}\n\n".
9. Budget enforcement
BudgetGuard reads copilot_settings; before each provider call checks period_* vs limits. ≥80% → publish COPILOT_BUDGET_THRESHOLD_REACHED once per period (banner via app.banners). ≥100% → orchestrator yields budget_exceeded, no provider call. After each turn, Usage increments period_* (atomic UPDATE … SET period_input_tokens = period_input_tokens + :n). Period rolls when period_start month changes (lazy reset on read).
10. Build order (v1)
- Core engine —
app/core/llm/(neutral types + protocol,OpenAIProvider, factory;openaidep, config),orchestrator.py,redaction.py,Tool.exposes_free_text. Unit-tested with a fake provider; one live smoke test against OpenAI. - Tool backfill —
patients/tools.py,agenda/tools.py+get_tools()wiring. Registry tests. - copilot backend — models + migration branch, settings/budget, conversation service, orchestrator bridge, SSE endpoint, confirm endpoint, events, permissions, entry point, alembic.ini.
- copilot frontend —
app.overlaysslot, mount/drawer/composer/cards,useCopilotStream, page, i18n, permissions.ts, settings page. - Docs + catalogs + ADRs — §12, §13; run
generate_catalogs.py+scaffold_module_docs.py --modules copilot.
11. Test plan (templates from the audit)
Backend (pytest, fixtures client/auth_headers/db_session/test_clinic from conftest.py:99-201):
- Tool unit — each v1 tool: success path + clinic-scope (mirror
test_agents_registry.py:116). - RBAC parity — receptionist
AgentContext(perms fromget_role_permissions("receptionist")) → apayments-gated tool returnspermission denied(mirrortest_agents_registry.py:170); same user'ssearch_patientsworks. - Multi-tenancy isolation — a tool invoked with clinic A's
ctxcannot read clinic B's row (mirrortest_media.pycross-clinic 404 → here assert empty/denied). - Inline confirm — drive the orchestrator with a fake provider that emits a
book_appointmenttool_use: assert the turn suspends withconfirmation_requiredand no appointment row exists; POST confirm → row created + audit row; POST reject → no row, audit reflects skip. - Redaction — assert no raw
full_name/phone/email/patient-UUID appears in the payload handed to the fake provider; assert displayed tokens are rehydrated; assert a tool flaggedexposes_free_textis excluded fromtool_nameswhen redaction is on. - Budget — over-limit
copilot_settings→ orchestrator yieldsbudget_exceeded, provider never called. - Provider abstraction — run the orchestrator turn against a fake provider emitting the neutral
ProviderEventsequence; assert redaction/confirm/audit behave purely on neutral types (proves Anthropic will drop in later untouched). Assertfactory.get_providerraises on unsupported names andsettingsPATCH rejects a provider whose API key is unset. - Uninstall round-trip — install → converse → uninstall leaves core agent tables + other modules intact (
test_uninstall_roundtrip.pypattern).
Manual: docker-compose up, login admin@demo.clinic, Cmd+K, run an Ask ("pacientes que se llamen María") and an Act ("agenda revisión a María mañana 10:00" → confirm card → confirm → booked).
12. Docs obligations (CLAUDE.md "When adding X")
New module ⇒ app/modules/copilot/{CLAUDE.md, CHANGELOG.md}; docs/technical/copilot/{overview,events,permissions}.md (scaffold); user-manual EN+ES per screen + screenshots in docs/screenshots/copilot/; new EventTypes + docs/technical/copilot/events.md; generate_catalogs.py; contract-elevation edits (root CLAUDE.md row + checklist + template + scaffold stub, §3.3); frontend/app/config/permissions.ts.
13. ADRs (docs/adr/, next = 0014)
- 0014 — Agent permission parity & the tool chokepoint (why
ctx.permissions = get_role_permissions(role); tools never widen access; inline-confirm vsAgentApprovalQueue). - 0015 — Agentic layer as a core platform primitive (orchestrator/provider/redaction in
app/core;tools.pycontract;app.overlayssurface slot; PHI redaction mandatory).
14. Open risks for implementation
- Free-text redaction gap (§2.3) — v1 excludes free-text tools from cloud; communicate the limitation in UI copy. NER is a later milestone.
- Turn resume correctness — resuming from
copilot_messagesmust reconstruct the exacttool_use/tool_resultordering the provider expects; cover with the inline-confirm test. - SSE session lifetime (§8) — verify the generator-scoped session under load; no reliance on
Depends(get_db)during streaming. agenda.book_appointmentvalidation — the tool must surface the service's conflict/validation errors as a structuredtool_resultso the model can explain failures rather than the stream 500-ing.