# Voice Features

{% hint style="info" %}
All Voice Features require [iTranslator PRO](https://itranslator.app/premium) and the **Manage Server** permission.
{% endhint %}

## Overview

The Voice system bundles two related capabilities under a single `/voice` command:

* **Text-To-Speech (TTS)** — turn any text into spoken audio, either played live in your voice channel or delivered as an audio file.
* **Speech-To-Text (Transcription)** — let iTranslator join your voice channel and produce a live transcript in a dedicated thread, with pause/resume controls and exportable files.

All subcommands live under the same base command:

| Command             | What it does                                                             |
| ------------------- | ------------------------------------------------------------------------ |
| `/voice help`       | Display your remaining TTS/transcription quotas and the next reset date. |
| `/voice tts`        | Generate speech from text and play it live in your voice channel.        |
| `/voice tts-file`   | Generate speech from text and upload it as an audio file.                |
| `/voice transcribe` | Start a live transcription session in your voice channel.                |

## Monthly Quotas

Every Pro user draws from two pooled monthly quotas, shared across all servers where they activate Pro:

| Resource           | Monthly allowance      |
| ------------------ | ---------------------- |
| TTS characters     | **100,000 characters** |
| Transcription time | **24 hours** of audio  |

`/voice help` shows your current remaining balance and the exact reset timestamp. **Quotas reset automatically on the 1st of each month.**
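
To make the quota arithmetic concrete, here is a minimal sketch of how a remaining balance and the next reset could be computed from the figures above. The function names are hypothetical, and the use of UTC midnight for the reset is an assumption; `/voice help` remains the authoritative source for your actual reset timestamp.

```python
from datetime import datetime, timezone

# Allowances taken from the table above.
TTS_CHARS_PER_MONTH = 100_000
TRANSCRIPTION_SECONDS_PER_MONTH = 24 * 60 * 60


def remaining(allowance: int, used: int) -> int:
    """Remaining balance, never below zero."""
    return max(allowance - used, 0)


def next_reset(now: datetime) -> datetime:
    """Midnight on the 1st of the following month (UTC is an assumption)."""
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("TTS characters left:", remaining(TTS_CHARS_PER_MONTH, 4_096))
    print("Next reset:", next_reset(now).isoformat())
```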

***

## Text-To-Speech (Live) — `/voice tts`

iTranslator joins your current voice channel and plays the synthesized audio.

**How to use it:**

1. Join a voice channel iTranslator has access to.
2. Run `/voice tts` with a voice and your text.
3. iTranslator connects, plays the generated audio, then disconnects when finished.

**Parameters**

| Parameter | Description                                                                       |
| --------- | --------------------------------------------------------------------------------- |
| `voice`   | The voice used to read the text. See [Available Voices](#available-voices) below. |
| `text`    | The text to convert to speech (up to 4,096 characters per call).                  |

{% hint style="warning" %}
iTranslator can only handle one voice session at a time per server. If it is already connected (e.g. running a transcription, or playing another TTS), the command will be rejected.
{% endhint %}
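
If your text is longer than the 4,096-character limit, you need to split it across several calls. The sketch below shows one possible way to prepare the chunks before pasting them into the command; `split_for_tts` is a hypothetical helper, not something iTranslator provides.

```python
MAX_TTS_CHARS = 4096  # per-call limit from the parameter table above


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space that fits; falls back to a hard cut
    when a single run of characters exceeds the limit.
    """
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_for_tts("word " * 2000), start=1):
        print(f"chunk {i}: {len(chunk)} characters")
```

With `/voice tts`, wait for each chunk to finish playing before sending the next one, since only one voice session per server can run at a time.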

## Text-To-Speech (File) — `/voice tts-file`

Works like `/voice tts`, except that the audio is uploaded as a file in the format you choose, so you do not need to be in a voice channel.

**Parameters**

| Parameter         | Description                                                                 |
| ----------------- | --------------------------------------------------------------------------- |
| `voice`           | The voice used to read the text. See [Available Voices](#available-voices). |
| `response-format` | Output audio format: `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm`.         |
| `text`            | The text to convert to speech (up to 4,096 characters per call).            |

### Available Voices

iTranslator currently supports the following voices:

* `alloy`
* `echo`
* `fable`
* `nova`
* `onyx`
* `shimmer`
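
As an illustration of the constraints listed above, the following sketch checks a `/voice tts-file` request against the documented voices, formats, and text limit. It is a hypothetical pre-check written for this page, not the validation iTranslator itself performs.

```python
# Values copied from the parameter table and voice list above.
VOICES = {"alloy", "echo", "fable", "nova", "onyx", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_TTS_CHARS = 4096


def validate_tts_file(voice: str, response_format: str, text: str) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    if voice not in VOICES:
        errors.append(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        errors.append(f"unknown format: {response_format}")
    if not text:
        errors.append("text is empty")
    elif len(text) > MAX_TTS_CHARS:
        errors.append(f"text is {len(text)} characters, limit is {MAX_TTS_CHARS}")
    return errors


if __name__ == "__main__":
    print(validate_tts_file("nova", "mp3", "Hello there"))
    print(validate_tts_file("nova", "ogg", "x" * 5000))
```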

***

## Live Transcription — `/voice transcribe`

iTranslator joins your voice channel and writes everything spoken into a dedicated **thread** attached to the channel where you ran the command. The transcript updates in near real time and can be paused, resumed, stopped, or exported.

**How to use it:**

1. Join a voice channel iTranslator has access to.
2. Run `/voice transcribe` from a text channel where the bot can create public threads.
3. A status embed and a transcription thread are created — speak normally and watch the transcript flow in.

**Parameters**

| Parameter     | Required | Description                                                                                                                       |
| ------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `language`    | No       | Forces the transcription language. If omitted, the language is auto-detected per chunk.                                           |
| `combined`    | No       | If `true`, all speakers are merged into a single feed. If `false` (default), each speaker gets their own webhook-styled messages. |
| `ignore-bots` | No       | If `true` (default), audio coming from other bots is skipped.                                                                     |
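
The defaults in the table can be summarized as a small data structure. This is only a sketch of how the options relate to each other; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscribeOptions:
    """Options of `/voice transcribe` with the defaults documented above."""
    language: Optional[str] = None  # None means auto-detect per chunk
    combined: bool = False          # False means per-user mode
    ignore_bots: bool = True        # True means audio from other bots is skipped


def should_capture(options: TranscribeOptions, speaker_is_bot: bool) -> bool:
    """Decide whether a speaker's audio would be transcribed."""
    return not (options.ignore_bots and speaker_is_bot)


if __name__ == "__main__":
    defaults = TranscribeOptions()
    print(defaults)
    print(should_capture(defaults, speaker_is_bot=True))
```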

### How the session looks

After running the command, iTranslator posts a status embed with:

* The current **state**: 🔴 Recording / ⏸️ Paused / ✅ Ended
* The **initiator** of the session
* The active **duration** versus the session limit
* Three control buttons:
  * **Pause / Resume** — temporarily stop or resume recording.
  * **Stop** — end the session immediately.
  * **Generate file** — export the transcript as a `TXT` or `SRT` file.

Transcribed messages are then delivered into the thread, either with each speaker’s name and avatar (per-user mode) or as a single timestamped feed (combined mode).

{% hint style="info" %}
Only the **initiator** of the session, or members with the **Manage Server** permission, can use the Pause / Stop / Generate-file buttons.
{% endhint %}
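
The states and button rules above can be pictured as a small state machine. The sketch below is an illustration under assumptions: the class and method names are hypothetical, and the choice to leave an ended session untouched when Pause / Resume is pressed is a guess, not documented behavior.

```python
from enum import Enum


class State(Enum):
    RECORDING = "🔴 Recording"
    PAUSED = "⏸️ Paused"
    ENDED = "✅ Ended"


class Session:
    def __init__(self, initiator_id: int):
        self.initiator_id = initiator_id
        self.state = State.RECORDING

    def _authorize(self, member_id: int, has_manage_server: bool) -> None:
        # Only the initiator or a member with Manage Server may use the buttons.
        if member_id != self.initiator_id and not has_manage_server:
            raise PermissionError("only the initiator or Manage Server members")

    def toggle_pause(self, member_id: int, has_manage_server: bool = False) -> State:
        self._authorize(member_id, has_manage_server)
        if self.state is State.RECORDING:
            self.state = State.PAUSED
        elif self.state is State.PAUSED:
            self.state = State.RECORDING
        return self.state  # an ended session stays ended (assumption)

    def stop(self, member_id: int, has_manage_server: bool = False) -> State:
        self._authorize(member_id, has_manage_server)
        self.state = State.ENDED
        return self.state


if __name__ == "__main__":
    session = Session(initiator_id=1)
    print(session.toggle_pause(1).value)
    print(session.stop(2, has_manage_server=True).value)
```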

### Session limits

| Limit                         | Value                                      |
| ----------------------------- | ------------------------------------------ |
| Maximum session duration      | **30 minutes** of active recording         |
| Simultaneous speakers tracked | **5 users** per session                    |
| Update interval               | New transcript chunks every **10 seconds** |

If more than 5 users speak at the same time, iTranslator transcribes the first 5 it heard and posts a one-time notice in the thread for each user it had to skip.
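
The speaker cap and its one-time notice can be sketched as follows. This is a simplified model with hypothetical names: it assumes the first 5 speakers keep their slots for the whole session, which this page does not state explicitly.

```python
MAX_SPEAKERS = 5  # simultaneous speakers tracked per session


class SpeakerTracker:
    def __init__(self, limit: int = MAX_SPEAKERS):
        self.limit = limit
        self.tracked: list[int] = []   # users being transcribed, in order heard
        self.notified: set[int] = set()  # skipped users already told once

    def admit(self, user_id: int) -> tuple[bool, bool]:
        """Return (transcribe, post_notice) for a user who starts speaking."""
        if user_id in self.tracked:
            return True, False
        if len(self.tracked) < self.limit:
            self.tracked.append(user_id)
            return True, False
        first_skip = user_id not in self.notified
        self.notified.add(user_id)
        return False, first_skip  # notice is posted only on the first skip


if __name__ == "__main__":
    tracker = SpeakerTracker()
    for uid in [1, 2, 3, 4, 5, 6, 6]:
        print(uid, tracker.admit(uid))
```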

### Why does iTranslator stop on its own?

A transcription session can end for several reasons:

* **Manual** — someone pressed the **Stop** button.
* **Automatic** — the 30-minute duration limit was reached.
* **Quota** — the Pro user’s monthly transcription time ran out mid-session.
* **Disconnected** — the bot was kicked, lost permissions, or the voice channel was deleted.

In every case, a status message is posted in the thread to explain what happened, and the embed is updated to `✅ Ended`.
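
The four end reasons can be expressed as a single decision function. The sketch below is illustrative only: the names are hypothetical, and the order in which the conditions are checked is an assumption, since this page does not say which reason wins when several apply at once.

```python
from enum import Enum
from typing import Optional

MAX_ACTIVE_SECONDS = 30 * 60  # 30 minutes of active recording


class EndReason(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"
    QUOTA = "quota"
    DISCONNECTED = "disconnected"


def end_reason(
    stop_pressed: bool,
    connected: bool,
    active_seconds: float,
    quota_seconds_left: float,
) -> Optional[EndReason]:
    """Return why the session should end, or None if it can continue."""
    if not connected:
        return EndReason.DISCONNECTED
    if stop_pressed:
        return EndReason.MANUAL
    if quota_seconds_left <= 0:
        return EndReason.QUOTA
    if active_seconds >= MAX_ACTIVE_SECONDS:
        return EndReason.AUTOMATIC
    return None


if __name__ == "__main__":
    print(end_reason(False, True, 1800, 3600))
    print(end_reason(False, True, 120, 3600))
```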

### Exporting the transcript

Click **Generate file** at any time during or after the session to download the transcript:

| Format | Content                                                                     |
| ------ | --------------------------------------------------------------------------- |
| `TXT`  | Plain text with timestamps and speaker names.                               |
| `SRT`  | Subtitle file (timestamps + text) ready to be loaded by most video players. |

The file is generated from the current state of the session, so you can export at any point — not only at the end.
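
For readers unfamiliar with the `SRT` format, the sketch below shows how timestamped segments map onto it: a running index, a `start --> end` line in `HH:MM:SS,mmm` form, then the text. The segment tuples and function names are hypothetical; how iTranslator derives segment boundaries is not documented here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) segments."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)  # a blank line separates consecutive cues


if __name__ == "__main__":
    print(to_srt([(0, 10, "Welcome, everyone."), (10, 20, "Let's get started.")]))
```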

### Combined vs Per-User mode

|              | Per-User (default)                                                            | Combined (`combined: true`)                                                    |
| ------------ | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| **Delivery** | One message per speaker, with their name & avatar (via webhook).              | A single stream of messages where each line is prefixed with the speaker name. |
| **Best for** | Multi-speaker conversations where you want to clearly identify who said what. | Note-taking, meeting summaries, or single-speaker sessions.                    |

{% hint style="warning" %}
Per-user mode uses Discord **webhooks** to keep each speaker’s identity. If iTranslator cannot create or use a webhook in the parent channel (e.g. missing permissions), it falls back to plain messages and posts a warning in the thread.
{% endhint %}

### Required permissions for transcription

For `/voice transcribe` to succeed, iTranslator needs to be able to:

* **Connect** to the voice channel you’re in.
* **Send messages** in the text channel where you ran the command.
* **Create public threads** in that text channel.
* **Manage webhooks** in the parent channel (optional — needed for per-user mode; otherwise the bot falls back to plain messages).
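
Server administrators who want to verify these permissions programmatically can test the bot's permission bitfield. The sketch below uses the bit positions from Discord's permission flags; the function name is hypothetical, and it assumes you already have the bot's effective permissions for each channel (resolving role and channel overwrites is out of scope here).

```python
# Bit positions from Discord's documented permission flags.
SEND_MESSAGES = 1 << 11
CONNECT = 1 << 20
MANAGE_WEBHOOKS = 1 << 29
CREATE_PUBLIC_THREADS = 1 << 35

REQUIRED_VOICE = {"Connect": CONNECT}
REQUIRED_TEXT = {
    "Send messages": SEND_MESSAGES,
    "Create public threads": CREATE_PUBLIC_THREADS,
}


def check_transcribe_permissions(voice_perms: int, text_perms: int) -> tuple[list[str], bool]:
    """Return (missing required permissions, whether per-user mode can use webhooks)."""
    missing = [name for name, bit in REQUIRED_VOICE.items() if not voice_perms & bit]
    missing += [name for name, bit in REQUIRED_TEXT.items() if not text_perms & bit]
    per_user_available = bool(text_perms & MANAGE_WEBHOOKS)
    return missing, per_user_available


if __name__ == "__main__":
    missing, per_user = check_transcribe_permissions(
        voice_perms=CONNECT,
        text_perms=SEND_MESSAGES | CREATE_PUBLIC_THREADS,
    )
    print("Missing:", missing or "nothing")
    print("Per-user mode via webhooks:", per_user)
```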

***

## Tips

* Run `/voice help` first to make sure you still have quota for the month.
* Want a quick voice memo without joining a channel? Use `/voice tts-file` and pick `mp3` or `opus`.
* For meetings, run `/voice transcribe` from a dedicated text channel — the thread keeps everything organized and easy to export at the end.
* The `language` parameter is optional but recommended for multilingual servers: it locks the transcription language and prevents auto-detection from switching mid-conversation.

