Capability build Consumer · Field service · Applied vision-language R&D

Agentic vision-language repair guidance: an R&D capability build

An internal R&D agent: a photo of a broken item goes in; a diagnosis and a visually guided repair come out. It combines vision-language reasoning, web retrieval, and on-demand 3D part visualization.

Category
Capability build
Industry
Consumer · Field service · Applied vision-language R&D
Role
Architect, lead engineer (concept-validation)
Scope
Agentic vision-language pipeline: photo in → identification → web retrieval → diagnosis → adapted visual guidance
Duration
Internal R&D, concept-validation phase
Capabilities
Agentic AI · Vision-Language Models · Web Retrieval-Augmented Generation · Depth & Segmentation · On-Demand 3D Part Generation

What this is

An internal R&D capability build that demonstrates how we approach applied vision-language work end to end. The user uploads a photo of a broken household or automotive item; the system identifies the make and model, retrieves real repair information from the open web, diagnoses the issue conversationally, and presents the fix as a visualization adapted to the user’s actual unit.
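
The flow described above (identify, retrieve, diagnose, visualize) can be sketched as a chain of stages that each update a shared session object. This is a minimal illustration, not the system's actual code: the names `RepairSession`, `identify`, `retrieve`, `diagnose`, `visualize`, and `run` are hypothetical, and every stage body is a placeholder standing in for a model or retrieval call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepairSession:
    """State carried through the pipeline for one uploaded photo (hypothetical)."""
    photo_path: str
    make_model: str = ""
    sources: List[str] = field(default_factory=list)
    diagnosis: str = ""
    guidance: str = ""
    trace: List[str] = field(default_factory=list)


# Each stage takes the session and returns it, updated.
Stage = Callable[[RepairSession], RepairSession]


def identify(s: RepairSession) -> RepairSession:
    # Placeholder for a vision-language model call on the photo.
    s.make_model = "ExampleCo Kettle K-100"  # made-up result for illustration
    return s


def retrieve(s: RepairSession) -> RepairSession:
    # Placeholder for web retrieval keyed on the identified make and model.
    s.sources = [f"repair notes for {s.make_model}"]
    return s


def diagnose(s: RepairSession) -> RepairSession:
    # Placeholder for the conversational diagnosis step.
    s.diagnosis = f"likely fault inferred from {len(s.sources)} source(s)"
    return s


def visualize(s: RepairSession) -> RepairSession:
    # Placeholder for guidance adapted to the user's actual unit.
    s.guidance = f"annotated steps for {s.make_model}: {s.diagnosis}"
    return s


PIPELINE: List[Stage] = [identify, retrieve, diagnose, visualize]


def run(photo_path: str, stages: List[Stage] = PIPELINE) -> RepairSession:
    """Run the stages in order, recording which ones executed."""
    session = RepairSession(photo_path=photo_path)
    for stage in stages:
        session = stage(session)
        session.trace.append(stage.__name__)
    return session


if __name__ == "__main__":
    result = run("broken_kettle.jpg")
    print(result.trace)
    print(result.guidance)
```

Keeping the stages as plain functions over one session object means a stage can be swapped or reordered per query without touching the others, which is one way to read "orchestrating per query"; a real agent would likely choose stages dynamically rather than follow a fixed list.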

This is a concept-validation R&D system, not a shipped product, and not a delivered client engagement.

Why we built it

Applied vision-language reasoning is one of the most over-promised areas in AI right now. We wanted to validate, on a problem we cared about, that an agent could handle the long tail (items the system has never seen before) by orchestrating retrieval and reasoning per query rather than depending on a hand-authored library of canned guides.

What it does

Built with

Layer | Stack
Vision identification | Leading commercial vision-language model
Orchestration & dialogue | Leading frontier LLM
Web retrieval | Commercial web-retrieval APIs
Document parsing | Layout-aware document parsers
Depth & segmentation | Open-source depth and segmentation models
Generative 3D (escalation) | Leading generative-3D models
3D rendering | Browser-native 3D rendering
Backend | Python + FastAPI
Frontend | Next.js + React

What this means for you

If you have a workflow where a user provides a photo (or an image, scan, or video frame) and you need an AI agent to identify, diagnose, and produce visual guidance (field service, claims adjusting, equipment maintenance, retail returns triage, anything in that family), we can build it. Vision-language reasoning is a capability we have validated end to end.

Want to discuss applied vision-language work in your domain? Contact us.

