IraVoice

A Hyper-Scalable and Secure Media Server for VoiceAI Applications

IraVoice is a bidirectional media server that brings together Dialer, Recorder, and BotStream modules to enable seamless telephony services for both legacy CX systems and modern conversational AI applications.

It delivers enterprise-grade capabilities including API-driven call control, real-time recording, conferencing, media streaming, trunk management, and QoS monitoring. Designed for flexibility, IraVoice integrates with PSTN or any PBX over E1 or SIP, while connecting to VoiceBots and AI platforms through secure WebSocket interfaces.

Built on the proven FreeSWITCH telephony platform and deployed on Kubernetes, IraVoice offers high performance, horizontal scalability, and fault-tolerant operation. With low-latency messaging powered by NATS, it ensures reliable communication across distributed data centers.

By unifying dialer functions, compliance-grade recording, and real-time AI streaming in a single platform, IraVoice provides a versatile, future-ready backbone for mission-critical voice applications.

Use Cases

Load Testing Suite

Telephony Control, Compliance Recording, and AI Streaming in One Platform.

Unified Architecture

Combines Dialer, Recorder, and BotStream modules in a single media server.

Media Streaming

Real-time access to raw audio streams for AI/ML enrichment.

PSTN & PBX Integration

Seamless inbound/outbound connectivity over E1 or SIP with any PBX.

NATS Messaging

High-speed, low-latency communication between telephony switches and VoiceAI engines.

Auto-Scaling

Kubernetes-native scaling to handle fluctuating traffic loads.

Answering Machine Detection

Improve efficiency by distinguishing between live voices, fax machines, and answering machines.

Lightweight Deployment

Run on-premise or in cloud containers with minimal resource usage.

Secure Audio

Encrypted media streams with TLS and Secure RTP.

Enterprise-Grade Reliability

99.999% availability and compliance with industry security standards.

Built-in Telephony Features

Conferencing, DTMF, barge-in, audio playback, and more out of the box.

Real-Time Analytics Ready

Streamlined for monitoring, reporting, and AI-driven insights.

Future-Ready Platform

Modular design enabling seamless integration with next-gen CX and AI systems.

IraVoice FAQs

VoiceAPI is a set of tools that lets you add calling capabilities (making and receiving calls) to your applications through an Application Programming Interface (API). Our telephony layer handles all the telephony functions, so you can focus on your own area of expertise.

Our products are used across industry verticals such as lead generation, debt collection, voicebots, consent collection, election campaigns, and outbound campaign management.

Businesses can add value to their existing customer engagement channels by adding voice. This is especially useful for teams running legacy software, or for those who do not want to enter the telephony domain themselves: they can leverage our years of domain knowledge to improve their customer experience.

Yes. Along with VoiceAPIs, our CPaaS platform provides APIs for WhatsApp, SMS, and other channels, which enterprise applications can consume to create an omnichannel solution.

No. Our solution handles all SIP trunking, gateway management, and any troubleshooting required at the telephony end. Your application can use the published APIs to add the features it needs.

Yes. The media of established calls can be sent to the required application over a secure WebSocket connection.

IraVoice, when hosted on an 8-core, 8 GB RAM server, can handle up to 800 simultaneous voicebot sessions with recording enabled, and up to 1,000 simultaneous sessions without recording.

Besides CPU resource constraints, call capacity also depends on the SIP trunk configuration, including the number of channels allocated and the CPS (calls per second) limit defined by the telecom provider.
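As a rough planning aid, the limits above can be combined: usable concurrency is whichever is lower, the server's session ceiling or the trunk's channel count, and the CPS limit sets how quickly you can reach it. This is a sketch under simplifying assumptions (every dialed call connects and none end during ramp-up); the class and function names are hypothetical and not part of any IraVoice API.

```python
from dataclasses import dataclass


@dataclass
class CapacityInputs:
    server_sessions: int  # sessions the host sustains, e.g. 800 with recording on 8 cores / 8 GB
    trunk_channels: int   # concurrent channels allocated by the telecom provider
    trunk_cps: float      # calls-per-second limit defined by the provider


def effective_concurrency(c: CapacityInputs) -> int:
    """The usable ceiling is whichever limit is hit first."""
    return min(c.server_sessions, c.trunk_channels)


def ramp_up_seconds(c: CapacityInputs) -> float:
    """Minimum time to reach the ceiling when new calls are paced by the CPS limit.

    Assumes every call connects and no call ends during the ramp-up.
    """
    if c.trunk_cps <= 0:
        raise ValueError("trunk_cps must be positive")
    return effective_concurrency(c) / c.trunk_cps


if __name__ == "__main__":
    inputs = CapacityInputs(server_sessions=800, trunk_channels=600, trunk_cps=10)
    print(effective_concurrency(inputs))  # 600
    print(ramp_up_seconds(inputs))        # 60.0
```

In this example the 600-channel trunk, not the 800-session server, is the bottleneck, and at 10 CPS it takes at least 60 seconds to fill it.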

Yes, IraVoice supports WhatsApp Business Calling.

Dial limits can be managed through the IraVoice Trunk Manager, either via the HTTP API or directly in the Trunk Manager interface.

IraVoice supports streaming at both 8 kHz and 16 kHz.

Inbound calls in IraVoice are configured through dialplans within the setup. These dialplans are typically implemented by the Epicode team based on your inbound routing requirements.

The following parameters can be set within call_params:
VAD Mode ("enable_vad": true)
      – Audio is delivered as complete utterances whenever the user finishes speaking.
Non-VAD Mode ("enable_vad": false)
      – Audio is streamed in chunks.
      – chunk_size can be configured under call_params.
      – Default: 3200 bytes (200 ms of audio).
Silence Threshold ("silence_threshold")
      – Audio level threshold used to consider a segment as silence.
      – Can range from 1 to 20, with a default of 5.
      – Silence duration: calculated as threshold value × 250 ms to determine when a segment is considered silent (the default of 5 corresponds to 1,250 ms).
Speech Threshold ("speech_threshold")
      – Minimum amplitude level required to classify a segment as speech.
      – Works best when set between 500 and 600. Default: 800.
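The sketch below assembles a call_params object from the options above and checks the documented silence_threshold range. It also shows where the 3200-byte default comes from, under an assumption this page does not state: that streamed audio is mono 16-bit linear PCM. At 8 kHz that is 16,000 bytes per second, so 3200 bytes is 200 ms; at 16 kHz the same chunk would hold only 100 ms. The helper names are hypothetical, and whether both thresholds apply in both modes is also an assumption.

```python
import json

SILENCE_UNIT_MS = 250  # silence duration = silence_threshold x 250 ms (from the list above)


def chunk_duration_ms(chunk_size: int, sample_rate_hz: int, bytes_per_sample: int = 2) -> float:
    """Milliseconds of audio in one chunk, assuming mono linear PCM (16-bit by default)."""
    bytes_per_second = sample_rate_hz * bytes_per_sample
    return chunk_size * 1000 / bytes_per_second


def silence_duration_ms(silence_threshold: int) -> int:
    return silence_threshold * SILENCE_UNIT_MS


def build_call_params(enable_vad: bool, chunk_size: int = 3200,
                      silence_threshold: int = 5, speech_threshold: int = 800) -> dict:
    """Assemble a call_params dict using the documented defaults and ranges."""
    if not 1 <= silence_threshold <= 20:
        raise ValueError("silence_threshold must be between 1 and 20")
    params = {
        "enable_vad": enable_vad,
        "silence_threshold": silence_threshold,
        "speech_threshold": speech_threshold,
    }
    if not enable_vad:
        # chunk_size only matters when audio is streamed in fixed-size chunks
        params["chunk_size"] = chunk_size
    return params


if __name__ == "__main__":
    print(chunk_duration_ms(3200, 8000))   # 200.0
    print(chunk_duration_ms(3200, 16000))  # 100.0
    print(silence_duration_ms(5))          # 1250
    print(json.dumps(build_call_params(enable_vad=False, speech_threshold=550)))
```

If you stream at 16 kHz and still want 200 ms chunks, the same arithmetic suggests a chunk_size of 6400 bytes, again assuming 16-bit mono PCM.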

Custom SIP headers for IraVoice outbound calls can be configured using the channel_vars parameter in the make_call API: "channel_vars": { "<sip-header-name>": "value string" }
For inbound calls, custom SIP headers can be defined within the IraVoice dialplan configuration.
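A minimal sketch of an outbound request body that carries custom SIP headers through channel_vars. Only channel_vars comes from this page; the "to" and "from" field names, the X-Campaign-Id header, and the helper itself are hypothetical placeholders, so check the make_call API reference for the actual request schema.

```python
from __future__ import annotations

import json


def make_call_payload(to_number: str, from_number: str, sip_headers: dict[str, str]) -> str:
    """Build a hypothetical make_call body that forwards custom SIP headers via channel_vars."""
    for name, value in sip_headers.items():
        # SIP header names cannot be empty or contain colons or whitespace.
        if not name or any(ch in name for ch in ": \t\r\n"):
            raise ValueError(f"invalid SIP header name: {name!r}")
        if not isinstance(value, str):
            raise TypeError(f"value for {name!r} must be a string")
    body = {
        "to": to_number,                    # hypothetical field name
        "from": from_number,                # hypothetical field name
        "channel_vars": dict(sip_headers),  # parameter documented above
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(make_call_payload("+15550100", "+15550199", {"X-Campaign-Id": "spring-launch"}))
```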

Possible causes of latency and voice distortion in IraVoice include:
• Network issues: high jitter, packet loss, or unstable bandwidth between SIP trunks, media servers, and VoiceAI endpoints.
• Server resource constraints: CPU or memory saturation on the host machine.
• Inconsistent streaming configurations: mismatched sample rates or chunk sizes between endpoints, leading to distorted audio.
• Media routing complexity: long network paths or multiple proxy hops that introduce transmission delays.
• VoiceBot response time: slow responses from the AI models or APIs used by VoiceBots, adding to overall call latency.

To minimize or avoid these issues:
• Use dedicated bandwidth and maintain network jitter below 30 ms.
• Allocate adequate CPU and memory resources based on expected concurrency.
• Ensure consistent audio streaming configurations across all endpoints.
• Optimize VoiceBot applications to minimize response delays between streaming chunks.
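This page does not say how jitter is measured. One standard way to check the 30 ms target against your own packet captures is the RTP interarrival jitter estimator defined in RFC 3550 (section 6.4.1), sketched here with send and receive times in milliseconds instead of RTP timestamp units. The function names are hypothetical.

```python
from __future__ import annotations

JITTER_TARGET_MS = 30.0  # target from the guidance above


def interarrival_jitter_ms(send_ms: list[float], recv_ms: list[float]) -> float:
    """RFC 3550 interarrival jitter, computed over packet send/receive times in ms."""
    if len(send_ms) != len(recv_ms):
        raise ValueError("send_ms and recv_ms must have the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D(i-1, i): change in one-way transit time between consecutive packets
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        # Running estimate smoothed with gain 1/16, as specified by RFC 3550
        jitter += (abs(d) - jitter) / 16
    return jitter


def meets_jitter_target(send_ms: list[float], recv_ms: list[float]) -> bool:
    return interarrival_jitter_ms(send_ms, recv_ms) < JITTER_TARGET_MS


if __name__ == "__main__":
    send = [0, 20, 40, 60, 80]      # 20 ms packetization
    recv = [50, 72, 88, 115, 130]   # arrival times with some variation
    print(round(interarrival_jitter_ms(send, recv), 3))
    print(meets_jitter_target(send, recv))
```

Because the estimate is smoothed, it reacts gradually; evaluate it over a reasonably long capture rather than a handful of packets.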

We support recording uploads to your AWS S3 buckets, Azure Storage, Google Cloud Storage, or directly to your webhook endpoint.

To upload call recordings to your cloud, we will require the necessary credentials for your cloud storage. Recordings will be uploaded as calls are completed.

If you prefer to use non-cloud storage, please provide a server with adequate storage capacity. Epicode will upload the recordings to this server and share a secure HTTP endpoint for download access.

If you prefer to use your own cloud storage bucket, you can share the necessary credentials with us so that we can upload the recordings. The recording files follow a standardized naming convention that includes the call UUID.

Example: e88e852e-da28-4e1f-bc97-fc00469faebf.mp3
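Because the call UUID is the file name, a downstream job can map each uploaded recording back to its call. This sketch assumes every file is named exactly <call-uuid>.mp3, as in the example above; other extensions or prefixes would need adjusting, and the function name is hypothetical.

```python
import uuid
from pathlib import PurePosixPath


def call_uuid_from_recording(filename: str) -> uuid.UUID:
    """Return the call UUID encoded in a recording file name like '<uuid>.mp3'."""
    path = PurePosixPath(filename)
    if path.suffix.lower() != ".mp3":
        raise ValueError(f"unexpected recording extension: {path.suffix!r}")
    # uuid.UUID raises ValueError if the stem is not a well-formed UUID
    return uuid.UUID(path.stem)


if __name__ == "__main__":
    print(call_uuid_from_recording("e88e852e-da28-4e1f-bc97-fc00469faebf.mp3"))  # e88e852e-da28-4e1f-bc97-fc00469faebf
```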
