AI · LLM integration · UX + frontend

Comparing LLM models and prompts in a single UI

Built an open-source workbench to compare OpenAI, Claude, and Google AI models in one UI, with diffs, costs, and latency readouts. End to end: UX, UI, and code.

Aidiff composer: compare models or prompts in one UI
  • ChatGPT, Claude, and Gemini in one local app.
  • Compare models and prompts side by side in one UI.
  • AI-generated analysis of how the answers differ.
  • React 18, Vite 5, Cursor, GitHub

Portfolio embed

UI sandbox with mock data: explore the layout and controls here without API keys. The full app, with live OpenAI, Claude, and Google calls, lives in the repository (add keys to .env, then run npm run dev).

Aidiff on GitHub

01. Problem

Comparing LLM models and prompts is tedious

Comparing models or small prompt tweaks is everyday work, but most people do it without APIs: they open ChatGPT, Claude, and Gemini in the browser and switch tabs by hand.

The manual loop: type the same question again, find the old thread, scroll two long answers, and judge differences from memory.

There is still no single screen with parallel replies and a structured diff unless you wire up the integrations yourself.

What gets in the way

  • Tab switching: context and wording drift between products

  • Manual repetition: one prompt, multiple paste cycles

  • Unstructured comparison: long prose compared by eye, not by a shared diff view

02. Solution

One screen. Several LLMs. No tab marathon.

Aidiff is a comparison workbench, not another chat app: one prompt, parallel columns, no copy-paste between ChatGPT, Claude, and Gemini.

Diff, keywords, and performance tabs sit on top of the raw answers so comparison needs less scrolling and less guesswork.

"Explain quantum entanglement in simple terms for a curious teenager."
GPT-4o · OpenAI
1.14s · $1.72/1K
Imagine two coins that are magically linked: whenever one lands heads, the other is always tails, even if they are far apart. Entanglement is a bit like that, but with tiny particles and rules from quantum physics.

When particles are entangled, measuring one instantly tells you something about the other. It does not mean you can send messages faster than light; nature just keeps their states tied together in a way classical objects cannot mimic.
Claude 3.5 Sonnet · Anthropic
1.48s · $3.13/1K
Quantum entanglement means two (or more) particles share a single quantum state. Measuring one particle constrains what you can observe in the other, no matter how far apart they are.

For a curious teenager: think of it as a perfectly coordinated pair of dice that only make sense when you look at them together. Scientists use entanglement in quantum computing and secure communication; it is real physics, not science fiction.

How the solution breaks down

Less friction

One composer, one send. Test several models in minutes, not across an afternoon of tabs.

Two modes

Compare models or prompt variants in the same app. Switch modes in one click: two use cases, one workbench.

Built-in analysis

Results, Differences, Performance. Structured readouts instead of eyeballing two long replies.
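The two modes differ only in what stays fixed across the columns: one prompt over several models, or one model over several prompt variants. A minimal sketch of how a compare run could be expanded into per-column jobs; the mode names "models" and "prompts", the `buildJobs` helper, and the job shape are hypothetical and not taken from the Aidiff source.

```javascript
// Hypothetical sketch: expand a compare run into one job per column.
// Mode names and the { column, model, prompt } job shape are assumptions.
function buildJobs(mode, { prompt, prompts = [], models = [], model }) {
  if (mode === "models") {
    // Model compare: the same prompt goes to every selected model.
    return models.map((m, i) => ({ column: i, model: m, prompt }));
  }
  if (mode === "prompts") {
    // Prompt compare: one model answers every prompt variant.
    return prompts.map((p, i) => ({ column: i, model, prompt: p }));
  }
  throw new Error(`Unknown compare mode: ${mode}`);
}
```

Because both modes produce the same job shape, the rest of the pipeline (sending, timing, diffing) does not need to know which mode is active.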

03. Technical setup

React + Vite with same-origin API proxies

Step 1: the composer calls OpenAI, Claude, and/or Google in parallel through the Vite proxy. Each column returns answer text plus token and latency metadata.

Step 2: when every active column succeeds, those answers are bundled into one more request through /api/google (Gemini Flash), and diffParsing.js turns Gemini's markdown reply into the Differences tab.

Results shows the raw columns; Performance combines the same metadata with MODEL_PRICING on the client, with no third API call.
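A minimal sketch of the two phases, assuming each provider call is wrapped in an async function that resolves to `{ text, usage }`. The `runColumn` and `runCompare` names, the way answers are bundled, and the `diffCaller` signature are hypothetical; the real app's diff prompt is not shown here.

```javascript
// Hypothetical sketch of the two network phases.
// callers: { [columnId]: async (prompt) => ({ text, usage }) }
async function runColumn(id, call, prompt) {
  const started = performance.now();
  try {
    const { text, usage } = await call(prompt);
    return { id, ok: true, text, usage, latencyMs: performance.now() - started };
  } catch (error) {
    // A failed column is recorded instead of rejecting the whole batch.
    return { id, ok: false, error: String(error), latencyMs: performance.now() - started };
  }
}

async function runCompare(callers, prompt, diffCaller) {
  // Phase 1: every active column runs in parallel.
  const columns = await Promise.all(
    Object.entries(callers).map(([id, call]) => runColumn(id, call, prompt))
  );

  // Phase 2: only if every column succeeded, send all answers once for the diff.
  let diffMarkdown = null;
  if (columns.length > 0 && columns.every((c) => c.ok)) {
    const bundle = columns.map((c) => `### ${c.id}\n${c.text}`).join("\n\n");
    diffMarkdown = await diffCaller(bundle);
  }
  return { columns, diffMarkdown };
}
```

Catching errors per column keeps the successful answers on screen even when one provider fails, while the diff step is simply skipped.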

  • Two network phases: parallel compare, then one Gemini diff over all replies.
  • diffParsing.js splits diff markdown into keywords, mini-matrix, and assessment UI.
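diffParsing.js itself is not reproduced here. As a rough sketch of the idea, assuming the diff model replies with `## Keywords`, `## Matrix`, and `## Assessment` sections (hypothetical headings, as are the function names), the markdown could be split into UI-ready pieces like this:

```javascript
// Hypothetical sketch, not the repository's diffParsing.js.
// Group lines under each "## Heading" (heading text lowercased as the key).
function splitSections(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1].toLowerCase();
      sections[current] = [];
    } else if (current) {
      sections[current].push(line);
    }
  }
  return sections;
}

// "- item" or "* item" lines become keyword strings.
function parseBullets(lines) {
  return lines
    .map((l) => l.match(/^\s*[-*]\s+(.*)$/))
    .filter(Boolean)
    .map((m) => m[1].trim());
}

// Pipe-table rows become arrays of cells; the |---|---| separator row is dropped.
function parseTable(lines) {
  const rows = lines
    .filter((l) => l.trim().startsWith("|"))
    .filter((l) => !/^\s*\|[\s:|-]+\|\s*$/.test(l))
    .map((l) => l.trim().replace(/^\||\|$/g, "").split("|").map((c) => c.trim()));
  const [header = [], ...body] = rows;
  return { header, rows: body };
}

function parseDiff(markdown) {
  const s = splitSections(markdown);
  return {
    keywords: parseBullets(s.keywords ?? []),
    matrix: parseTable(s.matrix ?? []),
    assessment: (s.assessment ?? []).join("\n").trim(),
  };
}
```

Missing sections fall back to empty values, so a partial model reply still renders instead of breaking the Differences tab.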

04. UI

UI building blocks of the workbench

Aidiff is designed and built in React components, not static mockups: glass controls, compare-mode switch, tab rail, composer actions, model slots, run cards, file chips, and settings rows. I chose a modern glass look with frosted layers, soft depth, and quiet borders so dense compare layouts stay readable without a heavy chrome shell. Below is an interactive component preview with demo data (no live API calls on this page).


  • CompareModeSwitch
  • TabBar
  • HeaderSettingsButton
  • Composer · Send
  • Composer · Attach & Meta pill
  • ComposerModelSlots
  • SettingsApiKeyInputRow (preview)
  • CollapsedRun (demo prompt: "Explain quantum entanglement in simple terms for a curious teenager.")
  • FileChip (demo file: context-notes.md, 4KB)
  • Dots (loading cue)
  • ThemeSchemeToggle

05. Takeaways

What I did and what I learned

Solo build in four days: a local LLM comparison workbench, then released as open source (MIT) for anyone who wants the same workflow.

  • 4 days: solo build
  • 2 modes: model & prompt compare
  • 3 APIs: OpenAI · Claude · Google

What I did

  • Zero to one: solo, end to end

    From first idea to a finished, runnable product in four days: research, UX, UI, frontend, and integration. Design and code in one loop: layout and behaviour shifted in the same sessions, not as a hand-off between Figma and implementation.

  • Provider wiring & API keys

    Hooked up OpenAI, Claude, and Google through Vite same-origin proxies, plus a keys panel that persists keys to the local .env file, so setup stays in the app and compare runs stay one click away.

  • Workbench UI that removes friction

    Built a shell that combines parallel columns, composer, slots, and Results / Differences / Performance: one send instead of tabs, copy-paste, and guessing from memory. The UI takes work off the user instead of adding another chat surface.
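A minimal sketch of same-origin proxies for Vite's dev server, which applies `server.proxy` when you run npm run dev. The `/api/google` prefix matches the route named in the technical setup; `/api/openai`, `/api/anthropic`, the `PROVIDERS` table, and the `stripPrefix` helper are assumptions, and API-key handling is deliberately left out. In a real vite.config.js the object would be exported with `export default defineConfig(config)`.

```javascript
// Hypothetical sketch of Vite dev-server proxies for the three providers.
// The browser calls /api/<provider>/..., Vite forwards it to the provider host.
const stripPrefix = (prefix) => (path) => path.replace(new RegExp(`^${prefix}`), "");

const PROVIDERS = {
  "/api/openai": "https://api.openai.com",
  "/api/anthropic": "https://api.anthropic.com",
  "/api/google": "https://generativelanguage.googleapis.com",
};

const config = {
  server: {
    proxy: Object.fromEntries(
      Object.entries(PROVIDERS).map(([prefix, target]) => [
        prefix,
        // changeOrigin rewrites the Host header to match the target.
        { target, changeOrigin: true, rewrite: stripPrefix(prefix) },
      ])
    ),
  },
};
```

Keeping every provider behind a same-origin path avoids browser CORS issues during local development, since the browser only ever talks to the Vite server.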

What I learned

  • Designing in code works

    If you know the patterns and know what you need, you can move from structure to shipped UI in React: no Figma pass required when you orchestrate layout, states, and flows in the codebase. Decisions land as components and behaviour, not specs waiting on implementation.

  • Side-by-side beats tab switching

    Answers next to each other, with latency and estimated cost per column on the same screen, cut cognitive load more than another chat feature. Comparison became scan work, not memory work across browser tabs.

  • Ship the diff v1, then iterate

    The diff layer can be expanded and tuned a lot; what mattered was shipping a working first version instead of holding the release until automated analysis felt finished. It is good enough to trust for daily comparisons, with room to push quality later.
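The per-column cost estimate is computed on the client from token metadata and MODEL_PRICING. A minimal sketch; the MODEL_PRICING shape, the model ids, and the per-million-token rates below are placeholders, not real provider prices or the repository's table, and `estimateCost` / `performanceRow` are hypothetical names.

```javascript
// Hypothetical pricing table: placeholder rates in USD per 1M tokens.
const MODEL_PRICING = {
  "model-a": { inputPer1M: 2.0, outputPer1M: 8.0 },
  "model-b": { inputPer1M: 3.0, outputPer1M: 15.0 },
};

function estimateCost(model, usage) {
  const price = MODEL_PRICING[model];
  if (!price) return null; // unknown model: show no estimate rather than a wrong one
  const input = (usage.inputTokens / 1e6) * price.inputPer1M;
  const output = (usage.outputTokens / 1e6) * price.outputPer1M;
  return input + output;
}

// One Performance-tab row per column: latency in seconds plus estimated cost.
function performanceRow(column) {
  return {
    id: column.id,
    latencySeconds: +(column.latencyMs / 1000).toFixed(2),
    costUsd: estimateCost(column.model, column.usage),
  };
}
```

For example, 1,000 input and 2,000 output tokens on "model-b" come to 0.001 × 3 + 0.002 × 15 = 0.033 USD under these placeholder rates.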

React · Vite · Cursor · OpenAI · Claude · Google

Clone the repo, add API keys to .env, and run npm run dev locally.
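The keys panel mentioned earlier persists keys to this same .env file. A minimal sketch of the text update such a save could perform, assuming plain KEY=value lines; `upsertEnv` and the variable names in the example are hypothetical, quoting and multi-line values are not handled, and the actual file write would have to run on the Node side, not in the browser.

```javascript
// Hypothetical sketch: update existing KEY=value lines in .env text, append new
// keys at the end, and leave comments and unrelated variables untouched.
function upsertEnv(envText, updates) {
  const pending = new Map(Object.entries(updates));
  const lines = envText.split(/\r?\n/).map((line) => {
    const match = line.match(/^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=/);
    if (match && pending.has(match[1])) {
      const key = match[1];
      const value = pending.get(key);
      pending.delete(key);
      return `${key}=${value}`;
    }
    return line;
  });
  // Trim trailing blank lines so appended keys sit directly after the last entry.
  while (lines.length && lines[lines.length - 1] === "") lines.pop();
  for (const [key, value] of pending) lines.push(`${key}=${value}`);
  return lines.join("\n") + "\n";
}
```

Usage: `upsertEnv("OPENAI_API_KEY=old\n", { OPENAI_API_KEY: "new", GOOGLE_API_KEY: "g" })` yields "OPENAI_API_KEY=new\nGOOGLE_API_KEY=g\n".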
