Frederick Jelinek, Electrical and Computer Engineering Department, Johns Hopkins University, Title TBA [Joint meeting with the Natural Language and Speech Processing Colloquium (NLaSP)]
ABSTRACT:
Automatic Speech Recognition is based on several components: a signal
processor, an acoustic model, a language model, and search. In this talk, we
explore the use of Random Forests (RFs) in language modeling, the problem
of predicting the next word based on words already seen. The goal is to
develop a new language model smoothing technique based on randomly grown
Decision Trees (DTs). This new technique is complementary to many of the
existing techniques dealing with data sparseness.
Random forests were studied by Breiman in the context of classification into
a relatively small number of classes. We study their application to n-gram
language modeling, which can be thought of as classification into a very
large number of classes. Unlike regular n-gram language models, RF language
models have the potential to generalize well to unseen data, even when
histories are long (more than four words). We show that our RF language
models are superior to regular n-gram language models in reducing both the
perplexity (PPL) and word error rate (WER) in a large vocabulary speech
recognizer.
The new technique developed in this work is general. We will show that it
works well when combined with other techniques, including word clustering
and the structured language model (SLM).
BIO:
Professor Fred Jelinek is one of the world's pre-eminent speech recognition
scientists. His past work includes fundamental contributions to
information theory and coding. From 1972 to 1993 he headed the large
Continuous Speech Recognition group of the IBM T.J. Watson Research Center.
There, together with his colleagues, he pioneered the statistical methods that
are the basis of current state-of-the-art speech recognizers. Prof. Jelinek's
special interest is language modeling, that is, the prediction of future
words given preceding text or speech. He is also interested in novel
methods of automatic parsing, of text understanding, and of machine
translation.