Tracking character boundary using speech synthesis

I am trying to use the Speech Synthesis API:



https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis



What I want to do is highlight a given word as it is being read out loud. Note that I don't want to highlight just a single character, but the entire word: essentially the element matched by the .char selector.



The problem is that my use-case is pretty complex.



I need a system that can handle CJK characters (specifically Chinese), both traditional and simplified, as well as interspersed untranslated English words. The Chinese words all need to have Pinyin (Romanization) and a translation associated with them in a structured way.



Currently I have the following structure. Be warned, it is large: I have had to add a ridiculous amount of markup to achieve what I want, including a span for literally every single character to allow character indexing (I am desperately hoping for ways to simplify it):



<div class="block-rawhtml normal-font"><ruby class="pinyined-char"><span class="char traditional speak selected-text" style=""><span class="ind-char">新</span><span class="ind-char">年</span><span class="ind-char">快</span><span class="ind-char">樂</span></span><span class="char simplified"><span class="ind-char">新</span><span class="ind-char">年</span><span class="ind-char">快</span><span class="ind-char">乐</span></span><rp>(</rp><rt>xīn nián kuài lè</rt><rp>)</rp><span class="translation">Happy New Year!</span></ruby> </div>


The markup above renders a sentence like this:



新年快樂


The problem is that when I use the Speech Synthesis API, the onboundary event fires after the word has been read, which makes the highlight lag. It also does not fire for each and every character index, as I would expect: it only fires on word boundaries, which seem to be a bit fuzzier for CJK characters.



Speech Synthesis can only take text, so what I am currently doing is grabbing the text inside the markup and mapping the character index reported by Speech Synthesis onto my hacked-together per-character span indexing. I then grab the parent of the individual character span, which should theoretically be the whole Chinese word. The mapping works fine if the onboundary character index hits all of the characters, but it doesn't.
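One way the mapping described above could be made tolerant of skipped indexes is to map ranges of character offsets to words rather than relying on one span per character. The following is only a minimal sketch: `buildWordRanges` and `wordIndexAt` are hypothetical helper names, the word list is made up for illustration, and it assumes `event.charIndex` is an offset into the same JavaScript string that was passed to the utterance.

```javascript
// Hypothetical helper: compute [start, end) offsets for each word in the
// string formed by concatenating `words` (the text given to the utterance).
function buildWordRanges(words) {
  const ranges = [];
  let offset = 0;
  words.forEach((word, wordIndex) => {
    // String length is measured in UTF-16 code units, matching string indexing.
    ranges.push({ wordIndex, start: offset, end: offset + word.length });
    offset += word.length;
  });
  return ranges;
}

// Hypothetical helper: find which word contains a given character offset.
// Returns -1 when the offset falls outside every range.
function wordIndexAt(ranges, charIndex) {
  const hit = ranges.find((r) => charIndex >= r.start && charIndex < r.end);
  return hit ? hit.wordIndex : -1;
}

// Illustrative data only: one Chinese word, a space, one English word.
const words = ['新年快樂', ' ', 'everyone'];
const ranges = buildWordRanges(words);

// In the browser this could be wired up roughly as follows (not run here):
// msg.onboundary = (event) => highlightWord(wordIndexAt(ranges, event.charIndex));
console.log(wordIndexAt(ranges, 2));
```

With this approach any offset inside a word resolves to that word, so it does not matter which character of the word the boundary event reports. It does not help with events that never fire or that fire late.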



I'm wondering if there's a way to either extend Speech Synthesis so that I can grab the correct character as it is read, or possibly to use SSML. Supposedly SSML documents are supported and you can add mark tags (so theoretically, instead of just a span, I could add a mark tag for every character):



https://developer.mozilla.org/en-US/docs/Web/Events/mark
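For illustration, the SSML idea could look something like the sketch below, which places one mark tag before each word instead of each character. The helper names (`escapeXml`, `buildSsml`, `wordIndexFromMark`) and the `w<index>` naming scheme are assumptions made up for this example, and whether a given browser and speech engine actually interpret SSML and fire mark events is not guaranteed.

```javascript
// Escape characters that are special in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: build an SSML document with a named mark before each
// word. Mark names follow the made-up pattern "w<index>".
function buildSsml(words, lang) {
  const body = words
    .map((word, i) => `<mark name="w${i}"/>${escapeXml(word)}`)
    .join('');
  return (
    '<?xml version="1.0"?>' +
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    body +
    '</speak>'
  );
}

// Hypothetical helper: recover the word index from a mark name such as "w3".
function wordIndexFromMark(name) {
  const match = /^w(\d+)$/.exec(name);
  return match ? Number(match[1]) : -1;
}

const ssml = buildSsml(['新年快樂', '!'], 'zh-TW');
console.log(ssml);

// In the browser this might be used roughly as follows (not run here):
// const msg = new SpeechSynthesisUtterance(ssml);
// msg.onmark = (event) => highlightWord(wordIndexFromMark(event.name));
```

Marking whole words rather than single characters keeps the SSML small and maps directly onto the word-level highlight that is wanted here.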



I am pretty lost on this, and even while adding a ton of extra complexity, it seems like what I want to do is out of reach.



All I want to be able to do is highlight the whole Chinese word when an individual character is read, and for all the characters to hit the onboundary (or another) event.



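msg.onboundary = function (event) {
    // Clear the inline highlight styles from the previously highlighted word
    $('.selected-text').removeAttr('style');
    highlight(event.charIndex);
};

function highlight(index) {
    // Find the character span at this index, then style its parent word span
    $('.traditional .ind-char, .all .ind-char')
        .eq(index)
        .parent()
        .css('color', '#FFF')
        .css('background-color', '#337ab7')
        .addClass('selected-text');
}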


My apologies if this is a little confusing. I am happy to elaborate on anything that is unclear. It's a fairly complex problem, and I am probably not going about it the right way.

  • I have since discovered that SSML works; however, there is a bug that prevents correct processing on macOS.
    – Andrew Alexander
    Dec 27 '18 at 23:02

javascript speech-synthesis

asked Dec 27 '18 at 21:58
Andrew Alexander
