How fast does Hi Agent answer?

Under one second. The voice model is kept warm, so callers are never put on hold. There is no IVR and no "press 1 for…" menu: the call connects to a natural-voice agent immediately, 24 hours a day, including holidays.

Does it actually sound human?

Listen for yourself — call the demo number on this page. It is a current-generation conversational voice model trained for trade-services dispatch, with your brand name, your service catalog, and your pricing tiers loaded in.

Will Hi Agent integrate with my field-service software?

Yes. Direct integration with ServiceTitan, Housecall Pro, Jobber, and Service Fusion. Bookings land in your dispatch calendar in real time, with the full call summary, customer details and recording attached. We can also write to a Google Sheet or your CRM via webhook.
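As a rough illustration of the webhook path, the sketch below builds the kind of JSON body a booking webhook might carry. The event name, the field names and the `build_booking_payload` helper are assumptions made for illustration, not Hi Agent's documented schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Booking:
    # All field names are hypothetical; the real webhook schema may differ.
    customer_name: str
    customer_phone: str
    service: str
    slot_start: str      # ISO 8601 timestamp of the booked slot
    summary: str         # short call summary
    recording_url: str   # link to the call recording


def build_booking_payload(booking: Booking) -> str:
    """Serialize a booking into a JSON string suitable for an HTTP POST body."""
    return json.dumps({"event": "booking.created", "booking": asdict(booking)})


example = Booking(
    customer_name="Pat Doe",
    customer_phone="+15555550100",
    service="water heater repair",
    slot_start="2026-06-15T09:00:00-05:00",
    summary="No hot water since last night; tank is 12 years old.",
    recording_url="https://example.com/recordings/abc123",
)
payload = build_booking_payload(example)
print(payload)
```

A receiving CRM or Google Sheet script would parse this body and map the fields onto its own columns.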

What happens when a real emergency comes in?

Hi Agent distinguishes emergencies (gas leak, water damage, no heat in winter) from routine intake and pages your on-call technician with the full call context. Routine calls get booked into the next available slot.
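To make the routing idea concrete, here is a deliberately simplified sketch that triages a call transcript by keyword. It is not how Hi Agent classifies calls: a production system would more plausibly rely on the language model plus dispatch rules. The keyword list, the `Routing` type and the `route_call` function are all hypothetical, and the "in winter" condition is ignored for brevity.

```python
from dataclasses import dataclass

# Hypothetical keyword list, loosely based on the examples in the text.
EMERGENCY_KEYWORDS = (
    "gas leak",
    "smell gas",
    "water damage",
    "flooding",
    "burst pipe",
    "no heat",
)


@dataclass
class Routing:
    action: str  # "page_on_call" or "book_next_slot"
    reason: str


def route_call(transcript: str) -> Routing:
    """Page the on-call technician for emergencies, otherwise book a slot."""
    text = transcript.lower()
    for keyword in EMERGENCY_KEYWORDS:
        if keyword in text:
            return Routing("page_on_call", f"matched emergency keyword: {keyword}")
    return Routing("book_next_slot", "no emergency keyword matched")


print(route_call("I think there's a gas leak in the kitchen"))
print(route_call("I'd like to schedule a furnace tune-up"))
```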

How does pricing work?

Flat monthly subscription based on inbound call volume, with no per-minute surprises. We benchmark against the cost of one full-time phone staffer (~$41,600 / year at $20 / hour, which is 2,080 hours) and price at roughly 20% of that. Use the calculator on this page for your specific number.
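The benchmark arithmetic can be checked in a few lines. The hourly rate and the 20% fraction come from the answer above; the 40-hour, 52-week schedule is an assumption chosen because it reproduces the $41,600 figure. The result is only an approximation of the benchmark, not a quoted price.

```python
HOURLY_RATE = 20          # dollars per hour, from the text
HOURS_PER_YEAR = 40 * 52  # 2,080 hours; assumed schedule that yields $41,600
PRICE_FRACTION = 0.20     # "roughly 20%", from the text

staffer_cost = HOURLY_RATE * HOURS_PER_YEAR
annual_price = staffer_cost * PRICE_FRACTION
monthly_price = annual_price / 12

print(f"Staffer benchmark: ${staffer_cost:,} / year")
print(f"Approximate price: ${annual_price:,.0f} / year (${monthly_price:,.0f} / month)")
```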

Can I keep my existing phone number?

Yes. Hi Agent attaches to your current number via a SIP forward or Twilio number swap — your customers keep dialing the same number, and you keep ownership of it. We document the porting process in onboarding.

What about call recordings and data privacy?

All call audio and transcripts are encrypted at rest, retained 90 days by default (configurable), and never used to train shared models. Your customer data stays in your account. SOC 2 Type II audit in progress.
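A configurable retention window like the one described above reduces to a simple date comparison. The sketch below is illustrative only: `DEFAULT_RETENTION_DAYS` mirrors the 90-day default from the text, while the `is_expired` helper and its parameters are hypothetical names, not part of any Hi Agent API.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 90  # default from the text; configurable per account


def is_expired(recorded_at: datetime, now: datetime,
               retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """Return True once a recording is older than the retention window."""
    return now - recorded_at > timedelta(days=retention_days)


recorded = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(is_expired(recorded, recorded + timedelta(days=91)))                     # past the default window
print(is_expired(recorded, recorded + timedelta(days=31), retention_days=30))  # shorter custom window
```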

How long does setup take?

7 to 14 days end-to-end. We collect your service catalog, pricing tiers, dispatch rules and CRM credentials in a 60-minute kickoff call, then mirror your existing intake process before we go live.


Glossary

Voice AI Agent

A voice AI agent is a software system that holds spoken conversations in real time using a combination of speech recognition, a large language model, and speech synthesis. It is typically used for customer-facing phone calls.

Also known as: conversational voice AI · voice agent · AI phone agent

The technical stack

A voice AI agent stitches together four components in real time:

ASR (Automatic Speech Recognition) — converts the caller's speech to text. Industry leaders in 2026: Deepgram Nova-3, OpenAI Whisper, Google Speech-to-Text. Streaming ASR is mandatory — non-streaming adds 500-1500ms of perceptible delay.
LLM (Large Language Model) — decides what to say back. Production deployments typically use GPT-5, Claude Opus 4.7, or Claude Sonnet 4.6 depending on latency-vs-quality tradeoffs.
TTS (Text-to-Speech) — converts the response into spoken audio. ElevenLabs, Cartesia, and OpenAI's voice models are the 2026 incumbents.
Orchestration — manages the call lifecycle, interruption handling, function calls (book appointment, transfer to human), and CRM integrations.
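The four components above can be sketched as a single conversational turn. This is a minimal, non-streaming outline under stated assumptions: `handle_turn` and the `fake_*` stages are hypothetical stand-ins, no vendor SDK is called, and a production orchestrator would stream partial results between stages and handle interruptions, which this sketch omits.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def handle_turn(
    audio_in: bytes,
    history: List[Message],
    transcribe: Callable[[bytes], str],       # ASR: caller audio -> text
    respond: Callable[[List[Message]], str],  # LLM: conversation so far -> reply text
    synthesize: Callable[[str], bytes],       # TTS: reply text -> audio
) -> bytes:
    """Orchestrate one caller turn through ASR -> LLM -> TTS."""
    caller_text = transcribe(audio_in)
    history.append({"role": "caller", "text": caller_text})
    reply_text = respond(history)
    history.append({"role": "agent", "text": reply_text})
    return synthesize(reply_text)


# Stand-in stages so the sketch runs without any external service.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Message]) -> str:
    return f"You said: {history[-1]['text']}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


history: List[Message] = []
audio_out = handle_turn(b"my furnace is out", history, fake_asr, fake_llm, fake_tts)
print(audio_out)
```

Keeping each stage behind a plain callable makes it straightforward to swap one vendor for another without touching the turn logic.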

How voice AI agents differ from chatbots

Chatbots are text-first and turn-based: you type, it answers. Voice AI agents are speech-first and continuous: both sides speak, sometimes at once, and the caller may interrupt the agent mid-sentence ("barge-in"). The engineering is harder: the model has to detect when the caller has finished speaking, decide whether to interrupt, and recover gracefully when the conversation goes off the rails.

The hardest technical problem in voice AI is end-of-turn detection — knowing when the caller has finished a sentence vs paused mid-thought. Most production systems use a combination of voice activity detection (VAD), semantic completion estimation, and silence thresholds (typically 600-1200ms).
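One way to combine those three signals is to let the semantic estimate shorten the silence the agent waits for. The sketch below is an assumption-laden illustration, not a description of any particular vendor's detector: `end_of_turn` and its parameters are hypothetical, `silence_ms` is taken to come from a VAD, and `completion_prob` from some upstream model that scores how complete the utterance sounds.

```python
def end_of_turn(
    silence_ms: float,
    completion_prob: float,
    min_silence_ms: float = 600.0,   # lower end of the range in the text
    max_silence_ms: float = 1200.0,  # upper end of the range in the text
) -> bool:
    """Decide whether the caller has finished speaking.

    silence_ms: trailing silence measured by voice activity detection.
    completion_prob: estimated probability (0..1) that the utterance is complete.
    """
    p = min(max(completion_prob, 0.0), 1.0)  # clamp to a valid probability
    # The more complete the sentence sounds, the less silence is required.
    required_ms = max_silence_ms - p * (max_silence_ms - min_silence_ms)
    return silence_ms >= required_ms


print(end_of_turn(700, 0.9))  # sentence sounds complete, short pause is enough
print(end_of_turn(700, 0.2))  # likely mid-thought, keep waiting
```

With these defaults, a confident completion estimate ends the turn after about 600ms of silence, while an uncertain one waits closer to 1200ms.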

Latency budget

End-to-end latency, measured from the moment the caller stops speaking to the moment the agent starts speaking, is the single most-measured metric in voice AI. The industry target is under 800ms. Above 1500ms, callers tend to assume the agent has hung up.

Typical 2026 breakdown:

ASR streaming partial: 100-200ms
LLM time-to-first-token: 250-450ms
TTS time-to-first-byte: 100-200ms
Network round-trips: 100-300ms
Total budget: 550-1150ms
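The totals in the breakdown above can be verified by summing the per-stage ranges and comparing them with the 800ms target. The figures are taken from the list; the variable names are illustrative.

```python
TARGET_MS = 800  # industry target from the text

# (best case, worst case) in milliseconds, copied from the breakdown.
BUDGET_MS = {
    "ASR streaming partial": (100, 200),
    "LLM time-to-first-token": (250, 450),
    "TTS time-to-first-byte": (100, 200),
    "Network round-trips": (100, 300),
}

best_ms = sum(low for low, _ in BUDGET_MS.values())
worst_ms = sum(high for _, high in BUDGET_MS.values())

print(f"Total budget: {best_ms}-{worst_ms}ms")
print(f"Headroom at best case: {TARGET_MS - best_ms}ms")
print(f"Overshoot at worst case: {worst_ms - TARGET_MS}ms")
```

Because the worst-case total exceeds the target, a deployment cannot sit at the slow end of every stage and still answer within 800ms.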

Common deployments

Inbound receptionist — answer + qualify + book or dispatch. Largest commercial deployment in 2026.
Outbound sales / appointment-setting — high-volume cold outbound (regulatory friction in the US; check the federal TCPA and any state telemarketing rules).
In-call assist — coach a human agent in real time with prompts during the call.
IVR replacement — replace touch-tone menus with natural language ("press 1 for…" becomes "what can I help you with?").

Limitations in 2026

Multi-party conferences — most agents handle 1:1 calls only.
Code-switching — switching languages mid-conversation works but accents reduce ASR accuracy.
Emotional escalation — agents detect frustration but don't always defuse it. Routing to a human is still the standard pattern.
Complex authentication — voice biometric KYC is improving but not yet trusted for high-stakes verification.

Vendors by category (June 2026)

Vertical receptionists — Hi Agent (home services), Smith.ai (pro services), Goodcall (general SMB).
Dev platforms — Vapi, Bland.ai, Retell, Synthflow.
Enterprise CX — Sierra, Avoca AI, ServiceTitan Voice.
Voice infrastructure — ElevenLabs, Cartesia, Deepgram, OpenAI Realtime.

Related terms

AI receptionist · AI dispatcher
