Designing the trust layer

Nava's AI handles the analytical work: parsing invoices, comparing census data, and surfacing discrepancies. My job was designing the experience that sits between what the system produces and the humans who need to act on it.

SERVICES

User research

AI-enabled UX

Systems thinking

Prototyping

outcome

Audit turnaround reduced from hours to under 3 minutes. Shipped an AI-first planning and results experience. Established a review-and-decision model that kept scope in check and set the product up to learn.

context

An AI-first product

Nava Benefits is an AI-enabled healthcare benefits platform that helps HR administrators manage and audit employee benefits programs. The audit tool is AI-first: the system parses carrier invoices and employee census data, generates a structured audit plan, runs the comparison, and surfaces discrepancies for human review.

I was the primary designer for the audit experience, owning both the planning flow and the results workspace end to end.

THE PROBLEM

Automation creates a new design problem

The product could automate a process that previously took hours or days. But because the AI was doing the analytical work, users needed to understand how the audit was configured and why specific discrepancies were surfaced.

Without that visibility, the output feels arbitrary. No HR administrator is going to act on findings they don't understand.

AI wasn't just a feature here. It was the system doing the work. My job was designing the layer between that system and the humans who need to trust it.

Three tensions shaped every design decision:

Tension 01

Automation vs. transparency

Tension 02

Speed vs. control

Tension 03

Summary vs. validation

The constraint

A platform shift mid-project

Early in the project, I explored a dedicated planning interface. Then the strategy shifted.

The product team prioritized an assistant-first interaction model, and the decision was made to deliver planning within the existing chat interface rather than introducing a new standalone surface. The dedicated planning UI wasn't going to ship.

The underlying design thinking didn't get thrown out. It got translated.

The need for a clear, scannable summary before execution, the separation of high-level understanding from detailed configuration, and the support for both quick execution and deeper review all carried over into the chat-based model. That constraint is what led to the Plan Summary Artifact.

Design — audit planning

Structure inside the assistant

Instead of exposing the AI's configuration directly, the assistant translates it into something readable and scannable.

When the audit plan is generated, it surfaces in chat as a structured summary artifact. Users who trust the system can execute immediately. Users who want more control can open the full plan in a right panel. Two surfaces, doing different jobs:

chat

System communication and editing

Generates the plan summary, communicates what the system is doing, and handles lightweight editing interactions.

right panel

Structured review and execution

The full plan lives here. A sticky footer keeps the primary execution action visible regardless of scroll depth.

Not every user needs to inspect the plan before running an audit. The summary card accommodates both quick execution and deeper review without making either feel like the wrong choice.

Audits take up to three minutes to run, long enough that users need to see progress while they wait. That processing window drove the design of the task tracker: it shows meaningful milestones without exposing unnecessary system detail or making the experience feel unpredictable.

Design — audit results

Review, not task management

A discrepancy is not a task. It says, "Something looks off. Review it and decide." That distinction shaped everything about the results workspace.

Not every discrepancy is an error. Some are explained by timing, carrier billing behavior, retroactive changes, or acceptable variance. The system surfaces them. The administrator applies judgment. A two-column structure reflects that:

Left column - triage and navigation

Each record shows the employee name, current status, number of flagged issues, and benefit type chips. Pure triage: who needs attention, how many issues, and what type of coverage is involved.
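As a rough illustration of the triage list, the sketch below rolls individual flagged issues up into one row per employee. It is not the shipped implementation: the `FlaggedIssue` and `TriageRow` shapes, the field names, and the "Needs review" / "Reviewed" status labels are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; not the product's real schema.
interface FlaggedIssue {
  employeeId: string;
  employeeName: string;
  benefitType: string; // e.g. "Medical", "Dental"
  reviewed: boolean;   // true once an administrator has made a decision
}

interface TriageRow {
  employeeName: string;
  status: "Needs review" | "Reviewed"; // assumed labels
  issueCount: number;
  benefitTypes: string[];              // rendered as chips in the list
}

// Group issues by employee, then summarize each group as one triage row.
function buildTriageRows(issues: FlaggedIssue[]): TriageRow[] {
  const byEmployee = new Map<string, FlaggedIssue[]>();
  for (const issue of issues) {
    const group = byEmployee.get(issue.employeeId) ?? [];
    group.push(issue);
    byEmployee.set(issue.employeeId, group);
  }
  return Array.from(byEmployee.values()).map((group): TriageRow => ({
    employeeName: group[0].employeeName,
    status: group.every((i) => i.reviewed) ? "Reviewed" : "Needs review",
    issueCount: group.length,
    // De-duplicate so each benefit type appears as a single chip.
    benefitTypes: Array.from(new Set(group.map((i) => i.benefitType))),
  }));
}
```

The row deliberately carries nothing beyond what triage needs: who, how many issues, and which coverage types.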

Right column - context and decision support

Employee context, a plain-language "What we found" summary, structured billing vs. enrollment comparisons, and a lightweight decision interface. Each discrepancy carries both a tag and a status: the tag describes what the system found, and the status records what the administrator decided to do about it.

For v1, administrators have two resolution actions:

Action 01

Mark resolved

The discrepancy was a real issue and has been addressed. This doesn't mean the system fixed it automatically. It means the administrator completed their review and took whatever action was needed.

Action 02

Dismiss

Reviewed and determined not to require action. Not every discrepancy is a true error. Timing, billing lag, and retroactive changes can all explain a flagged difference.
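The split between what the system found and what the administrator decided can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the product's actual schema: the type names, the example tag values, and the `applyAction` helper are all hypothetical.

```typescript
// Hypothetical names throughout; the tag values are invented examples.
type DiscrepancyTag = "premium_mismatch" | "missing_enrollment";
type ReviewStatus = "open" | "resolved" | "dismissed";
type ResolutionAction = "mark_resolved" | "dismiss";

interface Discrepancy {
  id: string;
  tag: DiscrepancyTag;  // what the system found; review never changes it
  status: ReviewStatus; // what the administrator decided
}

// Apply one of the two v1 resolution actions. Returns a new object so the
// system's finding (the tag) is carried over untouched.
function applyAction(d: Discrepancy, action: ResolutionAction): Discrepancy {
  const status: ReviewStatus =
    action === "mark_resolved" ? "resolved" : "dismissed";
  return { ...d, status };
}
```

Keeping the status to three values reflects the review-and-decision stance: there is no assignee, due date, or workflow state to manage.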

We intentionally kept v1 as a review-and-decision model rather than a task management system. Building something heavier would have meant making assumptions about user behavior we didn't yet have data to support.

outcome

A foundation and a product stance

What shipped was a foundation. What it established was a product stance.

Usability testing confirmed that earlier designs, which exposed too much system detail during planning and used more narrative descriptions in results, created friction the final model resolved. The summary-first planning approach and simplified triage list both emerged from consistent feedback that earlier iterations were harder to follow and slower to navigate.

There are no post-launch metrics. But the experience was designed to generate them, particularly around whether users engage with the full plan before executing or proceed directly from the summary.

The broader contribution was establishing that the right product stance for v1 was a review-and-decision model rather than a task management system. That call kept scope in check, avoided overbuilding on assumptions, and left room for the product to learn what a more mature system should actually look like.