Question 1

How is a Hebrew-specific STT engine better than a generic model like Whisper?

Accepted Answer

Generic multilingual models such as Whisper are trained mostly on English audio, with Hebrew making up only a small share of the training data. An engine trained specifically on Hebrew goes deeper: an acoustic model that knows the sounds of spoken Hebrew, a language model that understands Hebrew grammar, and handling tuned for call-center audio in particular. The result: higher accuracy on real calls, especially those with Israeli slang.

Question 2

What is your accuracy on telephony audio (low sample rate)?

Accepted Answer

Phone-call audio is typically sampled at 8 kHz, well below the 44.1 kHz or 48 kHz used for studio recordings. Our engine is trained specifically on telephony audio, so accuracy remains high even at low sample rates, unlike models trained only on high-quality audio.
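To make the sample-rate point concrete, here is a minimal sketch that inspects a WAV recording using only Python's standard `wave` module. It is not part of the Nivision product. The helper names (`make_tone_wav`, `describe_wav`) and the synthetic tone standing in for a call recording are assumptions made for illustration. The sketch relies on the fact that audio sampled at a given rate can only represent frequencies up to half that rate, so an 8 kHz call carries nothing above 4 kHz.

```python
import io
import math
import struct
import wave


def make_tone_wav(sample_rate: int, n_frames: int = 800, freq: float = 440.0) -> bytes:
    """Build a short mono 16-bit WAV in memory (a stand-in for a call recording)."""
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Report sample rate, highest representable frequency, and duration."""
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate_hz": rate,
            "max_frequency_hz": rate / 2,          # Nyquist limit
            "duration_s": w.getnframes() / rate,
            "is_narrowband": rate <= 8000,         # typical telephony audio
        }


info = describe_wav(make_tone_wav(8000))
print(info)
```

A check like this can be useful before sending recordings to any STT engine, since upsampling an 8 kHz file does not restore the frequencies that were never captured.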

Question 3

Does the system provide word-level timestamps?

Accepted Answer

Yes. Every transcript comes with word-level timestamps - useful for precise search inside a call, syncing playback to specific points, and analyzing speaking pace for both agent and customer.
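As an illustration of what word-level timestamps make possible, the sketch below searches a transcript for a term and estimates speaking pace per speaker. The `Word` record and its fields (`text`, `start`, `end`, `speaker`) are a hypothetical shape chosen for this example, not Nivision's actual output format. Pace is computed over time spent speaking (the sum of word durations), which is one of several reasonable definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    # Hypothetical transcript record; field names are assumptions.
    text: str
    start: float   # seconds from the start of the call
    end: float
    speaker: str


def find_term(words, term):
    """Return the start time (seconds) of every occurrence of `term`."""
    return [w.start for w in words if w.text == term]


def words_per_minute(words):
    """Speaking pace per speaker: word count over time spent speaking."""
    counts = defaultdict(int)
    spoken = defaultdict(float)
    for w in words:
        counts[w.speaker] += 1
        spoken[w.speaker] += w.end - w.start
    return {s: counts[s] * 60.0 / spoken[s] for s in counts if spoken[s] > 0}


words = [
    Word("שלום", 0.0, 0.5, "agent"),
    Word("מה", 0.5, 0.75, "agent"),
    Word("שלומך", 0.75, 1.25, "agent"),
    Word("שלום", 2.0, 2.5, "customer"),
    Word("תודה", 2.5, 3.0, "customer"),
]

hits = find_term(words, "שלום")   # playback positions to jump to
pace = words_per_minute(words)
print(hits, pace)
```

The start times returned by `find_term` can be used directly as seek positions when syncing playback to a specific point in the call.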

Question 4

Is your STT available as a standalone API?

Accepted Answer

Nivision is sold as a full platform - transcription is one component of it. If you need only an STT layer, please reach out and describe the scenario so we can suggest the right fit.

Question 5

How fast is a transcript available after a call ends?

Accepted Answer

Nivision processes transcription post-call: the transcript is available within minutes of the call ending. In-call real-time streaming (sub-second latency) is on the roadmap; for most call-center use cases, the post-call transcript is fast enough to power alerts, the dashboard, summaries and CRM sync.

Question 6

What about overlapping voices or calls with more than two speakers?

Accepted Answer

The system supports speaker separation in calls with 3-4 participants (for example: a sales call with two reps and one customer). Separation accuracy drops slightly with very similar voices or heavy overlap, but remains usable for analysis.
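To show how overlap can be measured once speakers are separated, here is a rough sketch that totals the time during which two or more participants talk at once. The input format, a list of `(speaker, start, end)` tuples, is an assumption for this example and not the platform's documented output. It also assumes that a single speaker's own segments do not overlap each other.

```python
def overlap_seconds(segments):
    """Total seconds during which at least two speakers are talking at once.

    `segments` is a list of (speaker, start, end) tuples, a hypothetical
    diarization output. Uses a sweep over start/end events.
    """
    events = []
    for _speaker, start, end in segments:
        events.append((start, 1))    # someone starts talking
        events.append((end, -1))     # someone stops talking
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back turns are not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0      # number of speakers currently talking
    prev = 0.0      # time of the previous event
    total = 0.0
    for t, delta in events:
        if active >= 2:
            total += t - prev
        active += delta
        prev = t
    return total


# A sales call with two reps and one customer.
call = [
    ("rep_a", 0.0, 4.0),
    ("rep_b", 3.0, 6.0),
    ("customer", 5.0, 8.0),
]
overlap = overlap_seconds(call)
print(overlap)
```

Dividing the result by the call duration gives an overlap ratio, which could be used to flag calls where separation quality is more likely to drop.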

Question 7

Does the model learn the unique terminology of my company?

Accepted Answer

Yes. You can load a custom term dictionary (product names, acronyms, professional terms) that improves accuracy on expressions uncommon in general Hebrew. The system also gradually learns from your calls.

Question 8

What about the security of the audio and transcripts?

Accepted Answer

All audio and transcripts are encrypted at rest and in transit, and retention policies are configurable. Nivision supports the security and compliance standards required by Israeli regulated industries. Full details: /security.

STT engine category	Accuracy on Hebrew call audio	Speaker separation	Telephony audio handling	Operational integration	Local support	Recommended fit
Generic open-source models (Whisper-class)	Moderate, version-dependent	Not built-in	Limited	Requires custom build	Community	R&D projects and self-built pipelines
International cloud STT services	Good on clean audio, drops on call audio	Add-on	Limited	API only	Generic, not local	Projects that need a generic speech recognition API
Hebrew call-center STT (like Nivision)	High, built for call audio	Built-in	Yes	Full platform	Israeli, call-center focused	Call centers that need a full operational solution

A Speech-to-Text engine trained specifically on Hebrew.

The technology that turns Hebrew audio into clean text - the foundation of every call analysis.

Four factors that produce operational-grade Hebrew STT:

Acoustic + language model trained on Hebrew

Coverage across accents and speakers

Robustness to noise and compressed audio

Fast post-call transcription at scale

Who needs a Hebrew-specific Speech-to-Text engine?

Three categories of Hebrew STT engines - which fits call centers?

The outcome

FAQ
