For decades, the audio guide has been the default answer when a museum asks how to add a digital layer to a visit. Hand a visitor a handset, press a number, listen to a script. It is familiar, it is proven, and for many procurement teams it still feels like the safe choice.
But walk the floor of any major gallery in 2026 and the picture tells a different story. A significant share of visitors never pick up a device at all. Those who do often skip large sections of the tour. Front-of-house staff spend time explaining how the handset works rather than facilitating conversation. And when a temporary exhibition opens, the content pipeline starts again from scratch: new scripts, new numbers, new translations.
If you are a curator, interpretation lead, or visitor services manager, you already know the friction. The question is no longer whether digital interpretation matters. It is whether the audio guide paradigm (linear, device-centric, broadcast-style) still matches what visitors expect and what your institution can sustain.
Why audio guides still dominate
Audio guides persist for understandable reasons. They are a known quantity in capital budgets. Trustees understand them. Visitor research from the 1990s and 2000s validated their impact, and many institutions still reference that evidence in funding applications.
Procurement is another factor. A handset contract with maintenance and sanitisation feels tangible. Boards can point to a line item and a device count. By contrast, browser-based interpretation, where visitors use their own phones, can sound intangible until you see it working on the gallery floor.
There is also a curatorial comfort with the format. Audio tours let you control the narrative sequence. You decide what object 47 says and when the visitor hears it. That control is real, and for certain exhibition types (single-path storytelling, blockbuster retrospectives with a defined arc) it still has merit.
The problem is that most visits are not single-path. Visitors linger at one case, skip another, return with children, or arrive with specific questions a pre-recorded script cannot anticipate.
The hidden costs nobody puts in the business case
The upfront device cost is only the beginning. Consider what a typical audio guide programme actually consumes over a three-year cycle:
- Hardware lifecycle: Handsets break, batteries degrade, charging banks need staff time, and replacement cycles are rarely synchronised with exhibition schedules.
- Content refresh: Every label rewrite, object rotation, or new acquisition can orphan an audio stop. Updating scripts, re-recording voice talent, and re-numbering stops is slow and expensive.
- Multilingual duplication: Each language is not a toggle; it is a parallel production pipeline. For institutions with statutory language obligations (Welsh in Wales, for example), the cost multiplier is structural, not optional.
- Visitor drop-off: Linear tours assume visitors will follow the prescribed sequence. Analytics from handset systems, where they exist at all, often show steep fall-off after the first few stops.
- Accessibility gaps: Audio-only formats exclude D/deaf visitors unless separate provision is made. Text alternatives are rarely integrated into the same experience.
- No dialogue: A visitor who wants to ask "Why is this sword bent?" or "How does this compare to the Viking display upstairs?" gets silence.
These costs rarely appear together in one spreadsheet. They are distributed across interpretation, IT, front-of-house, and external contractors. That fragmentation is part of why the format survives: no single team owns the full picture.
What visitors actually want now
Visitor expectations have shifted with smartphone behaviour. People are accustomed to on-demand answers, visual interfaces, and self-paced exploration. They do not want to learn a new device; they want to use the one in their pocket, provided the experience is fast, trustworthy, and worth the battery.
Research and floor observation consistently point to the same priorities:
- Self-paced exploration: Start anywhere, leave anytime, revisit what matters.
- Answers to their questions: Not only the narrative you wrote, but the questions they bring.
- No app install: App store friction eliminates a large fraction of potential users before they begin.
- Multilingual access: Switch language without collecting a different handset.
- Accessibility by design: Text, audio, and readable interfaces in one coherent experience.
- Trust: In an age of misinformation, visitors (and curators) care whether the answer comes from the institution or from the open web.
This is where the gap between legacy audio guides and contemporary behaviour is widest. The handset model was built for broadcast. Visitors now expect conversation.
From "press 47" to point-and-discover
The next generation of museum interpretation is not a better audio guide. It is a different interaction model entirely.
Point-and-discover means a visitor indicates what they are looking at, typically by pointing their phone camera at an exhibit, and receives interpretation grounded in that object. Combined with conversational AI under institutional guardrails, the experience becomes a dialogue rather than a monologue.
You can see how this works in practice in our interactive product preview: scan a QR code, open the experience in your browser, point your camera at an object, receive a short on-brand snippet, then explore deeper with facts mode, Q&A, and line-level source citations. No app store. No numbered keypad.
For visitor experience teams, the shift unlocks several capabilities audio guides never offered:
- Non-linear paths: Visitors engage with what interests them, in the order they choose.
- Multilingual interpretation: Language switching without separate hardware or duplicate recording schedules.
- Accessibility integrated: Readable text, adjustable presentation, and inclusive design as part of the core product rather than an add-on.
- Questions the script did not anticipate: Governed AI can answer from approved collection knowledge, so you no longer have to predict every possible question in a pre-written script.
Critically, this is not consumer ChatGPT on a personal phone. The institutional proposition is different: answers grounded in approved collections, governed by your tone, reading level, safety policies, and accessibility requirements, not whatever the open web returns.
What to evaluate when you move on
If you are assessing what comes after, or alongside, your current audio infrastructure, these criteria help separate durable solutions from novelty:
Visitor friction
How many steps stand between entering the gallery and receiving useful interpretation? QR-to-browser should be two steps, not five.
Curatorial control
Can interpretation staff approve, edit, and govern what the system says? Can you trace an answer back to a specific approved source?
Content operations
When a label changes, how long until the digital layer reflects it? Hours, not months, should be the target for minor updates.
Multilingual and accessibility
Is language switching built in? Are accessibility features part of the default experience or a separate workflow?
Analytics
Can you report anonymised engagement data to leadership and funders: which galleries draw attention, where visitors ask questions, how satisfaction trends over time?
Total cost of ownership
Include content refresh, language expansion, staff time, and device logistics, not just year-one capital expenditure.
The future is not louder; it is more responsive
Audio guides made museums louder. They amplified a single institutional voice through a speaker pressed to the ear. The next phase is not about volume. It is about responsiveness: meeting visitors where their curiosity actually is, in the language they prefer, with answers they can trust.
That does not mean abandoning everything audio guides did well. Narrative craft, scholarly rigour, and emotional storytelling still matter. The delivery mechanism is what has expired.
Institutions that treat this as a hardware refresh (newer handsets, shinier cases) will get incremental improvement. Institutions that treat it as an interpretation paradigm shift (browser-based, visual, conversational, governed) will fundamentally reshape what visitors expect from cultural venues.
