Re: For readers
On Sat, 28 May 2005 19:45:57 -0700, "Alan"
<info@optioncity.REMOVETHIS.net> wrote:
>First, book author names, titles, isbns, etc are from large, well-organized
>*finite* collections of terms,
>essentially "books in print". Next, academic authors of preprints or
>journal articles are, in principle, collectible
>into similar finite collections.
I think you are misinterpreting the term "limited vocabulary," which
has a fairly specific meaning in the speech-recognition field. Any
speech recognition system recognizes a vocabulary that is "limited" in
the sense of being finite. A limited-vocabulary system recognizes only
a few thousand words -- perhaps 30,000 at most, and the larger the
vocabulary, the poorer the performance. The practical limit for
speaker-independent recognition, without substantial user training,
and under somewhat adverse conditions, is much lower than that.
Books in Print (the American edition) is currently an eight-volume
set. I don't know how many distinct words and names are in the
listings, but I suspect the number far exceeds 20,000.
I think you are also underestimating the importance of word-sequence
clues, which I mentioned in an earlier message. Without these clues,
speech recognition is frankly just awful, even under conditions
approaching ideal. If you say an isolated word, like "a," there is no
telling whether the software will hear you correctly, or hear "any,"
"and," "AA," "they," or something even more far-fetched. Recognition
tends to be much better for longer words, like "parliamentary" or
"idiopathic," but that does not help with connecting words, or with
proper names, which tend to be highly irregular and homonymic. (Am I
Jonathan Sachs, Sachse, Sacks, or Saks, or, if your pronunciation is
not perfect, Sack, Stack, or Satz? Without contextual information, the
software hasn't a clue.)
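To show how word-sequence clues break a near-tie between sound-alike names, here is a toy sketch. It is not how any particular recognizer works. It assumes a simple scheme in which each candidate word gets an acoustic score and a bigram (previous-word) probability, and the candidate with the highest combined log score wins. The names ACOUSTIC, BIGRAM_COUNTS, bigram_prob and best_candidate are hypothetical, and every score and count is invented for illustration.

```python
import math

# Hypothetical acoustic scores (how well each word matches the audio alone).
# Invented values: the candidates are near-ties, as described in the post.
ACOUSTIC = {"Sachs": 0.20, "Sacks": 0.22, "Saks": 0.21,
            "Sachse": 0.12, "Stack": 0.15, "Satz": 0.10}

# Hypothetical bigram counts from a toy corpus: (previous word, word) -> count.
BIGRAM_COUNTS = {("Jonathan", "Sachs"): 8, ("Jonathan", "Sacks"): 1,
                 ("at", "Saks"): 5, ("the", "Stack"): 3}

def bigram_prob(prev, word, vocab, counts):
    """Add-one smoothed estimate of P(word | prev) over a small vocabulary."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return (counts.get((prev, word), 0) + 1) / (total + len(vocab))

def best_candidate(prev, acoustic, counts):
    """Pick the word maximizing log P(audio | word) + log P(word | prev)."""
    vocab = list(acoustic)  # simplification: only the candidates form the vocabulary

    def score(word):
        return (math.log(acoustic[word])
                + math.log(bigram_prob(prev, word, vocab, counts)))

    return max(vocab, key=score)

# No context: every bigram probability is equal, so the acoustics decide.
print(best_candidate(None, ACOUSTIC, BIGRAM_COUNTS))        # Sacks
# With a preceding word, the context overrides the small acoustic differences.
print(best_candidate("Jonathan", ACOUSTIC, BIGRAM_COUNTS))  # Sachs
print(best_candidate("at", ACOUSTIC, BIGRAM_COUNTS))        # Saks
```

With no preceding word, the sketch simply takes the acoustically strongest guess. Once a preceding word is available, the context shifts the choice even though the audio is unchanged.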
My email address is LLM041103 at earthlink dot net.