
Agentic vision-language repair guidance — an R&D capability build

Internal R&D agent — photo of a broken item in, diagnosed and visually-guided repair out. Vision-language reasoning + web retrieval + on-demand 3D part visualization.

Category
Capability build
Industry
Consumer · Field service · Applied vision-language R&D
Role
Architect, lead engineer (concept-validation)
Scope
Agentic vision-language pipeline: photo in → identification → web retrieval → diagnosis → adapted visual guidance
Duration
Internal R&D — concept-validation phase
Capabilities
Agentic AI Vision-Language Models Web Retrieval-Augmented Generation Depth & Segmentation On-Demand 3D Part Generation

What this is

An internal R&D capability build that demonstrates how we approach applied vision-language work end to end. The user uploads a photo of a broken household or automotive item; the system identifies the make and model, retrieves real repair information from the open web, diagnoses the issue conversationally, and presents the fix as a visualization adapted to the user’s actual unit.
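The end-to-end flow described above can be sketched as a plain orchestration pipeline. Every function body below is a stub standing in for a model or API call, and all names, fields, and return values are illustrative assumptions, not the actual system:

```python
from dataclasses import dataclass

@dataclass
class RepairGuidance:
    item: str
    diagnosis: str
    steps: list[str]

def identify_item(photo_bytes: bytes) -> str:
    # Stub for a vision-language model call that returns make and model.
    return "Acme DW-200 dishwasher"

def retrieve_repair_docs(item: str) -> list[str]:
    # Stub for a commercial web-retrieval API returning relevant repair pages.
    return [f"Service notes for {item}"]

def diagnose(item: str, docs: list[str], user_description: str) -> RepairGuidance:
    # Stub for an LLM call that grounds its diagnosis in the retrieved docs.
    return RepairGuidance(
        item=item,
        diagnosis="Worn door latch",
        steps=["Unplug the unit", "Remove the latch cover", "Replace the latch"],
    )

def repair_pipeline(photo_bytes: bytes, user_description: str) -> RepairGuidance:
    # Orchestration: identification -> retrieval -> diagnosis, per query,
    # with no hand-authored guide library in the loop.
    item = identify_item(photo_bytes)
    docs = retrieve_repair_docs(item)
    return diagnose(item, docs, user_description)

guidance = repair_pipeline(b"...", "door won't close")
print(guidance.diagnosis)  # Worn door latch
```

The point of the shape, rather than the stubs, is that each stage is resolved at query time, which is what lets the agent handle items it has never seen before.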

This is a concept-validation R&D system — not a shipped product, and not a delivered client engagement.

Why we built it

Applied vision-language reasoning is one of the most over-promised areas in AI right now. We wanted to validate, on a problem we cared about, that the agent could handle the long tail — items the system has never seen before — by orchestrating retrieval and reasoning per query, rather than depending on a hand-authored library of canned guides.

What it does

The user uploads a photo of the broken item and describes the problem. The agent identifies the make and model, retrieves real repair documentation from the open web, diagnoses the fault conversationally, and presents step-by-step visual guidance adapted to the user's actual unit, escalating to on-demand generated 3D part views when an annotated photo alone is not enough.
Built with

Vision identification: Leading commercial vision-language model
Orchestration & dialogue: Leading frontier LLM
Web retrieval: Commercial web-retrieval APIs
Document parsing: Layout-aware document parsers
Depth & segmentation: Open-source depth and segmentation models
Generative 3D (escalation): Leading generative-3D models
3D rendering: Browser-native 3D rendering
Backend: Python + FastAPI
Frontend: Next.js + React
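The "(escalation)" in the generative-3D layer refers to falling back from a flat annotated view to a generated 3D part view. A minimal sketch of how such a decision gate might look, where the field names and thresholds are assumptions rather than the actual system's logic:

```python
from dataclasses import dataclass

@dataclass
class SceneAnalysis:
    # Illustrative outputs from open-source depth and segmentation models.
    part_visible_fraction: float  # how much of the target part the photo shows
    depth_confidence: float       # mean confidence of the depth estimate

def needs_3d_escalation(scene: SceneAnalysis,
                        visibility_threshold: float = 0.6,
                        depth_threshold: float = 0.5) -> bool:
    """Escalate to on-demand generative 3D when a flat annotated overlay
    would not show the repair clearly enough."""
    return (scene.part_visible_fraction < visibility_threshold
            or scene.depth_confidence < depth_threshold)

# A partly occluded part triggers 3D generation even with good depth.
print(needs_3d_escalation(SceneAnalysis(0.3, 0.8)))  # True
```

Keeping the gate explicit like this means the expensive generative-3D call only runs when the cheaper 2D overlay path has demonstrably failed.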

What this means for you

If you have a workflow where a user provides a photo, scan, or video frame and you need an AI agent to identify, diagnose, and produce visual guidance — field service, claims adjusting, equipment maintenance, retail returns triage, anything in that family — we can build it. Vision-language reasoning is a capability we have validated end to end.

Want to discuss applied vision-language work in your domain? Contact us.
