Open JTalk

To translate the text in the array into Japanese and create an MP3 file for each entry, follow these steps. Open JTalk synthesizes the speech as WAV, and ffmpeg converts it to MP3:

Install

  • Ubuntu/Debian (WSL2)

    sudo apt update
    sudo apt install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001

  • Mac

    brew install open-jtalk

1. Translate the Text into Japanese

Here is the text translated into Japanese. Each entry is a (start, end, text) tuple; the times are presumably in seconds:

text_with_timestamps = [
    (0, 1.6, "待って。誰だった?"),
    (2.6, 4.2, "その男は誰?"),
    (5.2, 9.6, "彼女は身を乗り出し、唇がほとんど彼の耳に触れるほど近づいて囁く。"),
    (10.6, 11, "囁く。"),
    (41.8, 42.2, "しまったしまったしまった。")
]
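
Before synthesizing anything, it can help to sanity-check the array. The following is a minimal sketch, not part of the original workflow: validate_segments is a hypothetical helper name, and it assumes the tuples are meant to be in chronological order without overlaps.

```python
def validate_segments(segments):
    """Return a list of problems found in (start, end, text) tuples."""
    problems = []
    previous_end = None
    for i, (start, end, text) in enumerate(segments):
        # Each segment needs a positive duration
        if start >= end:
            problems.append(f"entry {i}: start {start} is not before end {end}")
        # Open JTalk has nothing to synthesize from empty text
        if not text.strip():
            problems.append(f"entry {i}: text is empty")
        # Assumption: segments are chronological and do not overlap
        if previous_end is not None and start < previous_end:
            problems.append(f"entry {i}: starts before entry {i - 1} ends")
        previous_end = end
    return problems


segments = [
    (0, 1.6, "待って。誰だった?"),
    (2.6, 4.2, "その男は誰?"),
]
for problem in validate_segments(segments):
    print(problem)
```

An empty result means none of these checks failed; it says nothing about the quality of the translation itself.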

2. Install Open JTalk

If you haven’t already installed Open JTalk, follow these steps:

On Ubuntu/Debian:

sudo apt update
sudo apt install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001

On macOS (using Homebrew):

brew install open-jtalk

3. Prepare the Script

Save the following Python script to generate MP3 files using Open JTalk:

import os
import subprocess

# Translated text with timestamps: (start, end, text).
# The timestamps are carried along but not used for synthesis.
text_with_timestamps = [
    (0, 1.6, "待って。誰だった?"),
    (2.6, 4.2, "その男は誰?"),
    (5.2, 9.6, "彼女は身を乗り出し、唇がほとんど彼の耳に触れるほど近づいて囁く。"),
    (10.6, 11, "囁く。"),
    (41.8, 42.2, "しまったしまったしまった。")
]

# Paths to the Open JTalk dictionary and voice file
# (Ubuntu/Debian package locations; adjust as needed)
dict_path = "/var/lib/mecab/dic/open-jtalk/naist-jdic"
voice_path = "/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice"

# Output directory for MP3 files
output_dir = "output_audio"
os.makedirs(output_dir, exist_ok=True)

# Generate MP3 files
for i, (start, end, text) in enumerate(text_with_timestamps):
    # Save text to a temporary file
    text_file = f"temp_{i}.txt"
    with open(text_file, "w", encoding="utf-8") as f:
        f.write(text)

    # Output WAV file
    wav_file = os.path.join(output_dir, f"output_{i}.wav")

    # Run Open JTalk. An argument list avoids shell quoting problems,
    # and check=True stops the script if synthesis fails.
    subprocess.run(
        ["open_jtalk", "-x", dict_path, "-m", voice_path, "-ow", wav_file, text_file],
        check=True,
    )

    # Convert WAV to MP3 using ffmpeg (-y overwrites files from earlier runs)
    mp3_file = os.path.join(output_dir, f"output_{i}.mp3")
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_file, "-vn", "-ar", "44100", "-ac", "2", "-b:a", "192k", mp3_file],
        check=True,
    )

    # Clean up temporary files
    os.remove(text_file)
    os.remove(wav_file)

    print(f"Generated: {mp3_file}")

4. Run the Script

  1. Save the script to a file, e.g., generate_audio.py.
  2. Run the script:
    python3 generate_audio.py

5. Output

The script generates one MP3 file per text entry in the output_audio directory, named by entry index: output_0.mp3, output_1.mp3, and so on. The start and end timestamps are not used, so the clips are not aligned to the original timing.
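
If the clips need to fit their timestamp slots, one option is to measure each WAV before it is deleted and compare its length with end - start. The sketch below is an illustration, not part of the script above: wav_duration_seconds and speed_rate_to_fit are hypothetical helper names, and it assumes the times are in seconds and that the WAV is still on disk (that is, measured before os.remove(wav_file) runs).

```python
import wave


def wav_duration_seconds(path):
    """Return the playing time of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def speed_rate_to_fit(duration, start, end):
    """Return how much faster the speech must be to fit the slot.

    Values above 1.0 mean the clip is longer than the slot;
    values below 1.0 mean there is time to spare.
    """
    slot = end - start
    if slot <= 0:
        raise ValueError("end must be greater than start")
    return duration / slot


# Example with made-up numbers: a 2.4 s clip for the slot 2.6 to 4.2
rate = speed_rate_to_fit(2.4, 2.6, 4.2)
print(f"Required speed-up factor: {rate:.2f}")
```

Open JTalk has a -r option for the speech speed rate (default 1.0), so a factor above 1.0 could be passed to a second synthesis run. Whether that gives an exact fit is an assumption; measure the new file again to confirm.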


Notes:

  • Ensure ffmpeg is installed for WAV-to-MP3 conversion:
    sudo apt install ffmpeg  # On Ubuntu/Debian
    brew install ffmpeg      # On macOS
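  • Adjust the voice_path variable if you’re using a different voice file, and the dictionary path passed to -x if your files are installed elsewhere (the paths in the script are the Ubuntu/Debian package locations, so a Homebrew install on macOS will likely differ).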
  • Open JTalk’s output may sound robotic. For more natural-sounding Japanese TTS, consider using Google Text-to-Speech or VoiceText.
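
The notes above can be turned into a quick pre-flight check. This is a minimal sketch under stated assumptions: missing_prerequisites is a hypothetical helper name, and the two paths are the Ubuntu/Debian locations used in the script, so substitute your own if they differ.

```python
import os
import shutil


def missing_prerequisites(commands, paths):
    """Return commands not found on PATH and paths that do not exist."""
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    missing += [p for p in paths if not os.path.exists(p)]
    return missing


problems = missing_prerequisites(
    commands=["open_jtalk", "ffmpeg"],
    paths=[
        # Assumed Ubuntu/Debian package locations; adjust for your system
        "/var/lib/mecab/dic/open-jtalk/naist-jdic",
        "/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice",
    ],
)
for item in problems:
    print(f"Missing: {item}")
```

Running this before generate_audio.py shows which tool or file is missing, rather than leaving you to work it out from a failed synthesis run.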