Reference pronunciation (espeak-ng): synthesized from the spelling by espeak-ng's own rules, not from the model's phonemes.
Attention heatmap
Each cell shows how strongly the model attended to an input letter (column) while producing an output phoneme (row) — darker means more attention.
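As a rough illustration of how to read such a grid, the sketch below renders a small attention matrix as text, using darker characters for larger weights. The function names (`render_heatmap`, `strongest_letter`), the shade ramp, and the weights for "cat" are all hypothetical and made up for this example; they are not taken from the model or the page's own rendering code.

```python
# Text rendering of an attention grid: rows are output phonemes,
# columns are input letters. All names and values here are illustrative.
SHADES = " .:-=+*#%@"  # ordered from lightest to darkest


def render_heatmap(letters, phonemes, weights):
    """Return a text heatmap with one row per phoneme and one column per letter."""
    lines = ["    " + " ".join(f"{ch:>2}" for ch in letters)]
    for ph, row in zip(phonemes, weights):
        cells = []
        for w in row:
            # Clamp to [0, 1], then pick a shade: larger weight -> darker character.
            w = min(max(w, 0.0), 1.0)
            idx = min(int(w * len(SHADES)), len(SHADES) - 1)
            cells.append(f"{SHADES[idx]:>2}")
        lines.append(f"{ph:>3} " + " ".join(cells))
    return "\n".join(lines)


def strongest_letter(letters, row):
    """Return the input letter that received the most attention in one row."""
    return letters[max(range(len(row)), key=row.__getitem__)]


# Made-up weights for the word "cat"; each row sums to 1.
letters = list("cat")
phonemes = ["k", "æ", "t"]
weights = [
    [0.90, 0.08, 0.02],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
]

print(render_heatmap(letters, phonemes, weights))
```

In this toy case the darkest cell in each row sits on the diagonal, meaning each phoneme attended mostly to the letter in the same position. For real words the pattern is often less tidy, for instance when several letters map to one phoneme.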