You handle everything yourself:
Your App → [Your IPA parser] → [Your frame builder] → speechPlayer_queueFrame/Ex() → DSP → Audio
What you do:
• Parse IPA text into phonemes
• Look up formant values from your own phoneme table
• Build speechPlayer_frame_t structs (47 parameters)
• Build speechPlayer_frameEx_t structs (5 parameters) if you want voice quality
• Calculate timing/duration yourself
• Call speechPlayer_queueFrame() or speechPlayer_queueFrameEx() directly
• Mix per-phoneme FrameEx with user settings yourself (if desired)
Pros: Full control, no frontend dependency, smaller footprint.
Cons: You reimplement all the phoneme logic, coarticulation, prosody, etc.
A rough sketch of this direct path is below.
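Very roughly, Path A could look like this. The sketch assumes the C header that ships with NV Speech Player (speechPlayer.h); exact field names, and whether the duration arguments are milliseconds or samples, depend on your copy of that header, and the Ex struct/call are the ones discussed in this thread rather than anything shipped, so treat all of it as illustration.

#include <string.h>
#include <stdbool.h>
#include "speechPlayer.h"

/* Path A: your app fills a frame from its own phoneme table and queues it
 * straight into the DSP. The formant values are made-up placeholders. */
void speakVowelSketch(speechPlayer_handle_t player) {
    speechPlayer_frame_t frame;
    memset(&frame, 0, sizeof(frame));     /* leave the parameters we don't set at 0 */

    frame.voicePitch = 120;               /* Hz, from your own prosody calculation */
    frame.voiceAmplitude = 1.0;
    frame.cf1 = 800;  frame.cb1 = 80;     /* formant centre frequencies / bandwidths */
    frame.cf2 = 1200; frame.cb2 = 90;
    frame.cf3 = 2500; frame.cb3 = 120;

    /* Timing is also yours: minimum frame duration plus the fade used to
     * interpolate from the previous frame. */
    speechPlayer_queueFrame(player, &frame,
                            150 /* min duration */, 30 /* fade */,
                            -1 /* userIndex */, false /* purgeQueue */);

    /* The Ex path discussed in this thread would pass a second, 5-parameter
     * voice-quality struct alongside the frame:
     * speechPlayer_queueFrameEx(player, &frame, &frameEx, ...); */
}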
Path B: Frontend + DSP (nvspFrontend.dll → speechPlayer.dll)
Frontend does the heavy lifting:
Your App → nvspFrontend_queueIPA[_Ex]() → [Frontend magic] → Your Callback → speechPlayer_queueFrame/Ex() → DSP → Audio
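In code, the driver side of Path B could be as small as the sketch below. The nvspFrontend_* names are the ones used in this thread, the header name and the callback signature are my guesses (the callback might equally be registered once at init rather than passed per call); the point is that parsing, phoneme lookup, prosody and timing never appear in the driver.

#include <stdbool.h>
#include "speechPlayer.h"
#include "nvspFrontend.h"   /* hypothetical header for the frontend DLL */

typedef struct {
    speechPlayer_handle_t player;
} DriverState;

/* Guessed callback shape: the frontend hands over one finished frame at a
 * time, and the driver's only job is to push it into the DSP. */
static void onFrame(void* userData, const speechPlayer_frame_t* frame,
                    unsigned int duration, unsigned int fade) {
    DriverState* st = (DriverState*)userData;
    speechPlayer_frame_t copy = *frame;   /* the frontend keeps ownership of its buffer */
    speechPlayer_queueFrame(st->player, &copy, duration, fade, -1, false);
}

void speakIpaSketch(DriverState* st, const char* ipa) {
    /* One call: everything linguistic happens inside the frontend before
     * onFrame() fires. */
    nvspFrontend_queueIPA(ipa, onFrame, st);
}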
You might be thinking: "But that's more layers!"
The alternative would be pushing the mixing (per-phoneme FrameEx values against user settings) into the DSP, but then:
• DSP needs phoneme awareness (wrong layer!)
• Or every driver reimplements mixing (inconsistency, bugs)
The "extra layer" is actually the frontend doing its job - keeping linguistic smarts out of the DSP and out of every driver.