Open JTalk
To convert the text in the array into Japanese and use Open JTalk to create MP3 files for each entry, follow these steps:
Install
Ubuntu/Debian (WSL2)
sudo apt update
sudo apt install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001
Mac
brew install open-jtalk
1. Translate the Text into Japanese
Here’s the translated text in Japanese:
text_with_timestamps = [
    (0, 1.6, "待って。誰だった?"),
    (2.6, 4.2, "その男は誰?"),
    (5.2, 9.6, "彼女は身を乗り出し、唇がほとんど彼の耳に触れるほど近づいて囁く。"),
    (10.6, 11, "囁く。"),
    (41.8, 42.2, "しまったしまったしまった。"),
]
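The generation script never uses the first two values in each tuple. Assuming they are start and end times in seconds (the array does not say so explicitly), a small sketch like the one below shows how much time each line has available, which helps spot entries whose synthesized speech may run long. The helper name slot_lengths is hypothetical.

```python
# Assumption: each tuple is (start_seconds, end_seconds, text).
text_with_timestamps = [
    (0, 1.6, "待って。誰だった?"),
    (2.6, 4.2, "その男は誰?"),
    (5.2, 9.6, "彼女は身を乗り出し、唇がほとんど彼の耳に触れるほど近づいて囁く。"),
    (10.6, 11, "囁く。"),
    (41.8, 42.2, "しまったしまったしまった。"),
]


def slot_lengths(entries):
    """Return (text, seconds available) for each entry."""
    # Rounding hides floating-point noise such as 0.40000000000000036.
    return [(text, round(end - start, 3)) for start, end, text in entries]


for i, (text, seconds) in enumerate(slot_lengths(text_with_timestamps)):
    print(f"entry {i}: {seconds:.1f}s available")
```

The last two entries leave well under a second each, so they are the ones most likely to need a faster speaking rate or a shorter translation.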
2. Install Open JTalk
If you haven’t already installed Open JTalk, follow these steps:
On Ubuntu/Debian:
sudo apt update
sudo apt install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001
On macOS (using Homebrew):
brew install open-jtalk
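Before running anything else, it can help to confirm that the command-line tools are actually on your PATH. The sketch below is one way to do that with the standard library. The helper name missing_tools is hypothetical, and it assumes the executables are called open_jtalk and ffmpeg, as they are in the script in step 3.

```python
import shutil


def missing_tools(tools=("open_jtalk", "ffmpeg")):
    """Return the names of required executables that are not on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("open_jtalk and ffmpeg were both found.")
```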
3. Prepare the Script
Save the following Python script to generate MP3 files using Open JTalk:
import os
import subprocess

# Translated text with timestamps: (start, end, text)
text_with_timestamps = [
    (0, 1.6, "待って。誰だった?"),
    (2.6, 4.2, "その男は誰?"),
    (5.2, 9.6, "彼女は身を乗り出し、唇がほとんど彼の耳に触れるほど近づいて囁く。"),
    (10.6, 11, "囁く。"),
    (41.8, 42.2, "しまったしまったしまった。"),
]

# Path to Open JTalk voice file (adjust as needed)
voice_path = "/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice"
# Path to the Open JTalk dictionary directory (adjust as needed)
dic_path = "/var/lib/mecab/dic/open-jtalk/naist-jdic"

# Output directory for MP3 files
output_dir = "output_audio"
os.makedirs(output_dir, exist_ok=True)

# Generate MP3 files
for i, (start, end, text) in enumerate(text_with_timestamps):
    # Save text to a temporary file
    text_file = f"temp_{i}.txt"
    with open(text_file, "w", encoding="utf-8") as f:
        f.write(text)

    # Output WAV file
    wav_file = os.path.join(output_dir, f"output_{i}.wav")

    # Run Open JTalk; an argument list avoids shell quoting problems,
    # and check=True stops the script if the command fails
    subprocess.run(
        ["open_jtalk", "-x", dic_path, "-m", voice_path, "-ow", wav_file, text_file],
        check=True,
    )

    # Convert WAV to MP3 using ffmpeg (-y overwrites an existing file)
    mp3_file = os.path.join(output_dir, f"output_{i}.mp3")
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_file, "-vn", "-ar", "44100", "-ac", "2", "-b:a", "192k", mp3_file],
        check=True,
    )

    # Clean up temporary files
    os.remove(text_file)
    os.remove(wav_file)
    print(f"Generated: {mp3_file}")
4. Run the Script
- Save the script to a file, e.g., generate_audio.py.
- Run the script:
python3 generate_audio.py
5. Output
The script will generate an MP3 file for each text entry in the output_audio directory. The files are named output_0.mp3, output_1.mp3, etc.
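To confirm that every entry produced a file, you could compare the directory contents against the expected names. This is a minimal sketch, assuming the output_audio directory and output_N.mp3 naming described here. The helper names expected_outputs and missing_outputs are hypothetical.

```python
import os


def expected_outputs(count, output_dir="output_audio"):
    """Build the MP3 paths the script should have written for `count` entries."""
    return [os.path.join(output_dir, f"output_{i}.mp3") for i in range(count)]


def missing_outputs(count, output_dir="output_audio"):
    """Return the expected MP3 paths that do not exist on disk."""
    return [p for p in expected_outputs(count, output_dir) if not os.path.isfile(p)]


if __name__ == "__main__":
    # The array in this walkthrough has five entries.
    missing = missing_outputs(5)
    print("All files present." if not missing else f"Missing: {missing}")
```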
Notes:
- Ensure ffmpeg is installed for WAV-to-MP3 conversion:
sudo apt install ffmpeg # On Ubuntu/Debian
brew install ffmpeg # On macOS
- Adjust the voice_path variable if you’re using a different voice file.
- Open JTalk’s output may sound robotic. For more natural-sounding Japanese TTS, consider using Google Text-to-Speech or VoiceText.
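Because the voice file location differs between systems, one option is to try a few candidate paths and use the first that exists. The sketch below assumes the Debian path used in the script plus two Homebrew locations. The Homebrew paths and the mei_normal.htsvoice file name are assumptions, so verify them on your machine. The helper name first_existing is hypothetical.

```python
from pathlib import Path

# Candidate voice files. Only the first path comes from the script in this
# walkthrough; the Homebrew paths are assumptions to verify locally.
VOICE_CANDIDATES = [
    "/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice",
    "/opt/homebrew/opt/open-jtalk/voice/mei/mei_normal.htsvoice",
    "/usr/local/opt/open-jtalk/voice/mei/mei_normal.htsvoice",
]


def first_existing(paths):
    """Return the first path that is an existing file, or None if none exist."""
    for candidate in paths:
        if Path(candidate).is_file():
            return candidate
    return None


if __name__ == "__main__":
    voice_path = first_existing(VOICE_CANDIDATES)
    if voice_path is None:
        print("No voice file found; set voice_path manually.")
    else:
        print("Using voice file:", voice_path)
```

The same approach could be applied to the dictionary directory passed with -x, using Path.is_dir() instead of is_file().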