
Hold Music

VoiceRail plays hold music during long reasoning operations and call transfers, and while connecting calls. You can configure custom hold music to match your brand.

Overview

Hold music is configured at the organization level. When set, all assistants in your organization use the same hold music. If it is not configured, VoiceRail plays a built-in default track.

When hold music plays:

  • During MCP tool calls that take longer than 3 seconds
  • During webhook reasoning calls that take longer than 3 seconds
  • While connecting outbound calls (ringing phase)
  • During call transfers

Supported Formats

VoiceRail supports two audio formats optimized for telephony:

MP3 (recommended; smaller file size)
  • 16kHz sample rate
  • Mono channel
  • 64-128kbps bitrate
  • Must have an ID3v2 tag

WAV (larger files, but simpler encoding)
  • 16kHz sample rate
  • Mono channel
  • 16-bit signed PCM
  • Little-endian byte order
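
To confirm a converted file meets these requirements before uploading, you can inspect it with ffprobe, which ships with FFmpeg. The sketch below assumes a hypothetical helper named check_hold_music_format that reads ffprobe's key=value output on stdin and compares the codec, sample rate, and channel count against the list above. It does not check the MP3 bitrate or the ID3v2 tag.

```shell
# Hypothetical helper: validate ffprobe stream output against the hold
# music requirements (16 kHz, mono, MP3 or 16-bit little-endian PCM).
# Usage:
#   ffprobe -v error -select_streams a:0 \
#     -show_entries stream=codec_name,sample_rate,channels \
#     -of default=noprint_wrappers=1 hold-music.mp3 | check_hold_music_format
check_hold_music_format() {
  codec="" rate="" channels=""
  # Parse lines such as "sample_rate=16000"; order does not matter
  while IFS='=' read -r key value; do
    case "$key" in
      codec_name)  codec="$value" ;;
      sample_rate) rate="$value" ;;
      channels)    channels="$value" ;;
    esac
  done

  status=0
  if [ "$codec" != "mp3" ] && [ "$codec" != "pcm_s16le" ]; then
    echo "unsupported codec: ${codec:-none} (expected mp3 or pcm_s16le)" >&2
    status=1
  fi
  if [ "$rate" != "16000" ]; then
    echo "sample rate is ${rate:-unknown}, expected 16000" >&2
    status=1
  fi
  if [ "$channels" != "1" ]; then
    echo "channel count is ${channels:-unknown}, expected 1 (mono)" >&2
    status=1
  fi
  return "$status"
}
```

A non-zero exit status means at least one requirement failed; the reasons are printed to stderr.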

Important: ID3v2 Tag Requirement

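MP3 files must include an ID3v2 tag header; files without this tag will fail to play. Most audio software adds this automatically. If you're using ffmpeg, include -id3v2_version 3 so the encoder writes an ID3v2.3 tag (ffmpeg writes ID3v2.4 by default, and -id3v2_version 0 disables the tag entirely).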
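
A quick way to check for the tag is to look at the first bytes of the file: an ID3v2 tag begins with the three bytes "ID3" at the very start, followed by a major-version byte (3 for ID3v2.3, 4 for ID3v2.4). The helper below, id3v2_version_of, is a hypothetical sketch of that check using only standard command-line tools.

```shell
# Hypothetical helper: print the ID3v2 version of a file ("2.3", "2.4", ...)
# or fail if the file does not start with an ID3v2 header.
id3v2_version_of() {
  file="$1"
  magic=$(head -c 3 "$file" 2>/dev/null) || return 1
  if [ "$magic" != "ID3" ]; then
    echo "no ID3v2 tag found at start of $file" >&2
    return 1
  fi
  # The 4th byte is the ID3v2 major version; print it as a decimal number
  major=$(head -c 4 "$file" | tail -c 1 | od -An -tu1 | tr -d ' ')
  echo "2.$major"
}
```

For example, id3v2_version_of output.mp3 should print 2.3 for a file encoded with -id3v2_version 3.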

Converting Audio with FFmpeg

Use FFmpeg to convert any audio file to a compatible format:

Convert to MP3 (Recommended)

Terminal
# Convert any audio file to VoiceRail-compatible MP3
ffmpeg -i input.wav \
  -c:a libmp3lame \
  -b:a 128k \
  -ar 16000 \
  -ac 1 \
  -id3v2_version 3 \
  output.mp3

# Explanation:
# -c:a libmp3lame   Use the LAME MP3 encoder
# -b:a 128k         128kbps bitrate (sufficient for voice/music)
# -ar 16000         16kHz sample rate (wideband telephony rate)
# -ac 1             Mono audio (required for telephony)
# -id3v2_version 3  Write an ID3v2.3 tag header (required)

Convert to WAV

Terminal
# Convert to VoiceRail-compatible WAV
ffmpeg -i input.mp3 \
  -c:a pcm_s16le \
  -ar 16000 \
  -ac 1 \
  output.wav

# Explanation:
# -c:a pcm_s16le    16-bit signed little-endian PCM
# -ar 16000         16kHz sample rate (wideband telephony rate)
# -ac 1             Mono audio (required for telephony)

File Size Guidelines

Keep your hold music files small for fast loading:

Duration | MP3 (128kbps) | WAV (16-bit)
30 seconds | ~480 KB | ~960 KB
1 minute | ~960 KB | ~1.9 MB
2 minutes | ~1.9 MB | ~3.8 MB

Recommended: 30-60 seconds, under 2 MB.
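
These sizes follow from simple arithmetic: a constant-bitrate MP3 uses bitrate / 8 bytes per second (128kbps = 16,000 bytes/s), while 16kHz, 16-bit mono PCM uses 16,000 samples × 2 bytes = 32,000 bytes/s. The sketch below reproduces the estimates with two hypothetical helper functions; it ignores container headers and ID3 tags.

```shell
# Hypothetical helpers: estimate hold music file sizes in bytes
# (headers and tags are not counted).

# MP3 at a constant bitrate: seconds * kbps * 1000 / 8
mp3_size_bytes() {
  seconds="$1" kbps="$2"
  echo $(( seconds * kbps * 1000 / 8 ))
}

# WAV at 16 kHz, 16-bit (2 bytes per sample), mono: seconds * 16000 * 2
wav_size_bytes() {
  seconds="$1"
  echo $(( seconds * 16000 * 2 ))
}
```

For example, mp3_size_bytes 30 128 prints 480000 (about 480 KB), and wav_size_bytes 60 prints 1920000 (about 1.9 MB).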

Tip: Hold music loops automatically, so a 30-second track is usually sufficient. Longer tracks increase load time without improving the caller experience.

Hosting Your Audio

Your hold music URL must either be publicly accessible or carry its own authentication in the URL, such as an Azure SAS token or an S3 pre-signed URL. We recommend Azure Blob Storage or AWS S3.

Azure Blob Storage

Terminal
# Upload to Azure Blob Storage
az storage blob upload \
  --account-name yourstorageaccount \
  --container-name audio \
  --name hold-music.mp3 \
  --file output.mp3 \
  --content-type audio/mpeg

# Generate a read-only SAS URL valid for 1 year
# (date -d is GNU syntax; on macOS use: date -u -v+1y +%Y-%m-%dT%H:%MZ)
az storage blob generate-sas \
  --account-name yourstorageaccount \
  --container-name audio \
  --name hold-music.mp3 \
  --permissions r \
  --expiry "$(date -u -d "1 year" +%Y-%m-%dT%H:%MZ)" \
  --https-only \
  --full-uri \
  --output tsv

AWS S3

Terminal
# Upload to AWS S3
aws s3 cp output.mp3 s3://your-bucket/audio/hold-music.mp3 \
  --content-type audio/mpeg

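# Generate a pre-signed URL (604800 seconds = 7 days, the maximum for
# SigV4 pre-signed URLs; URLs signed with temporary credentials expire
# sooner, when those credentials expire)
aws s3 presign s3://your-bucket/audio/hold-music.mp3 \
  --expires-in 604800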

# For longer access, make the object public or use CloudFront

URL Requirements

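  • HTTPS required: HTTP URLs are not supported.
  • Direct link: no redirects or login pages.
  • Stable URL: the URL must stay valid for as long as it is configured. Give Azure SAS tokens a long expiry (1+ year); on S3, where pre-signed URLs last at most 7 days, use a public object or CloudFront instead.
  • CORS not required: VoiceRail fetches the file server-side.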
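
Before saving the URL, it can help to confirm it meets these requirements from your own machine. The sketch below uses a hypothetical helper, check_hold_music_url, built on curl. It rejects non-HTTPS URLs and fetches the file without following redirects, so a 3xx response shows up as a failure. It sends a GET rather than a HEAD request because S3 pre-signed URLs are signed for a specific HTTP method (GET, for aws s3 presign). A passing check does not guarantee VoiceRail's servers can reach the URL; it only catches common mistakes.

```shell
# Hypothetical helper: sanity-check a hold music URL. Requires curl.
check_hold_music_url() {
  url="$1"
  case "$url" in
    https://*) ;;
    *) echo "URL must use HTTPS" >&2; return 1 ;;
  esac
  # No -L: redirects are reported instead of followed.
  result=$(curl -sS -o /dev/null -w '%{http_code} %{content_type}' "$url") || return 1
  code="${result%% *}"
  ctype="${result#* }"
  if [ "$code" != "200" ]; then
    echo "expected HTTP 200, got $code" >&2
    return 1
  fi
  echo "OK: HTTP $code, Content-Type: ${ctype:-unset}"
}
```

Quote the URL when calling it (check_hold_music_url "https://...") so the shell does not interpret the & characters in a SAS token or pre-signed query string.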

Configuring Hold Music

Update your organization settings with your hold music URL:

Terminal
curl -X PATCH "https://api.voicerail.ai/v1/organizations/$ORG_ID" \
  -H "Authorization: Bearer $VOICERAIL_KEY" \
  -H "X-Organization-Id: $ORG_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "holdMusicUrl": "https://your-storage.blob.core.windows.net/audio/hold-music.mp3?sv=..."
  }'

To remove custom hold music and return to the default, set holdMusicUrl to null.
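
A minimal sketch of that reset request follows. It reuses the organization endpoint and headers from the PATCH request above and assumes that the org_id in the path is the same value as $ORG_ID; the reset_hold_music function name is hypothetical.

```shell
# Hypothetical helper: revert to VoiceRail's default hold music by
# sending holdMusicUrl: null. Assumes VOICERAIL_KEY and ORG_ID are set
# and that the org_id path segment equals ORG_ID.
reset_hold_music() {
  curl -X PATCH "https://api.voicerail.ai/v1/organizations/$ORG_ID" \
    -H "Authorization: Bearer $VOICERAIL_KEY" \
    -H "X-Organization-Id: $ORG_ID" \
    -H "Content-Type: application/json" \
    -d '{"holdMusicUrl": null}'
}
```

Note that the JSON literal null (unquoted) is sent, not the string "null".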

Best Practices

Choose appropriate music

Select calm, non-distracting music without lyrics. Instrumental tracks work best. Avoid music that might clash with your brand or caller expectations.

Ensure seamless looping

Edit your audio so it loops cleanly without clicks or jarring transitions. The last note should flow naturally into the first.

Normalize volume levels

Match the volume of your hold music to the assistant's voice. Callers shouldn't need to adjust their volume when switching between hold and conversation.
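
One way to do this is FFmpeg's loudnorm filter, applied while encoding to the MP3 settings above. The sketch below targets -16 LUFS integrated loudness; the true-peak (TP=-1.5) and loudness-range (LRA=11) values are illustrative choices, not VoiceRail requirements, and normalize_hold_music is a hypothetical name. This is loudnorm's single-pass (dynamic) mode; FFmpeg also documents a two-pass mode that measures the file first for more precise results.

```shell
# Hypothetical helper: normalize loudness and encode to VoiceRail MP3.
#   I=-16    integrated loudness target, in LUFS
#   TP=-1.5  true-peak ceiling, in dBTP (illustrative value)
#   LRA=11   loudness range target, in LU (illustrative value)
# loudnorm upsamples internally, so -ar 16000 restores the required rate.
normalize_hold_music() {
  input="$1" output="$2"
  ffmpeg -i "$input" \
    -af loudnorm=I=-16:TP=-1.5:LRA=11 \
    -c:a libmp3lame \
    -b:a 128k \
    -ar 16000 \
    -ac 1 \
    -id3v2_version 3 \
    "$output"
}
```

For example: normalize_hold_music input.wav hold-music.mp3. Listen to the result alongside a sample of your assistant's voice, since a loudness target alone does not guarantee the two sound equally loud to callers.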

Test on actual phones

Telephony audio sounds different from computer speakers. Test your hold music by making actual calls to ensure it sounds good over the phone network.

Respect copyright

Ensure you have rights to use your chosen music. Consider royalty-free music libraries or commissioning original compositions.

Troubleshooting

Problem | Solution
No audio plays | Check that the URL is accessible (try opening it in a browser). Verify the SAS token hasn't expired.
Audio sounds distorted | Re-encode at a 16kHz sample rate. Ensure the audio is mono.
MP3 fails to load | Verify the ID3v2 tag is present. Re-encode with -id3v2_version 3.
Audio too quiet or too loud | Normalize to -16 LUFS (a common loudness target for streaming and podcast audio) using ffmpeg's loudnorm filter.
Long load time | Reduce the file size. Use MP3 instead of WAV. Host the file in the same region as VoiceRail (East US 2).
