Book Review
Roland Hausser, Foundations of Computational Linguistics: Man-Machine
Communication in Natural Language. XII, 534pp. ISBN 3-540-66015-1.
Springer-Verlag, Berlin, 1999. DM 89.00; USD 54.00
Reviewed by Geoffrey Sampson,
University of Sussex
ELSNET, Dec 1999
This is a rather different book from what the word `Foundations'
in Hausser's title might lead one to expect. It is not a book
like Manning & Schütze's well-received `Foundations of Statistical
Natural Language Processing,' which surveys various basic concepts
and techniques that any newcomer to the field needs to master
before going deeper into some particular area. Hausser's book
is more personal. Hausser aims to expound a novel linguistic
theory of his own; although he explicitly contrasts this with
various strands of established computational linguistics where
these are relevant, many aspects of the latter are barely or not
at all mentioned.
Hausser's version of computational linguistics has several components,
but the core (at least as presented in this book) is a new theory of
grammar, "left-associative (LA) grammar". The fundamental insight
behind left-associative grammar is that people produce sentences
linearly, one word at a time, always retaining the option to continue
in any way compatible with the words already uttered, so it is
misleading to group words into phrase-structure trees whose nonterminal
labels imply that valid options are foreclosed as soon as a constituent
is begun. For Hausser, a sentence like `Mary took a break' divides
into `Mary took a' and `break'.
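The left-associative idea can be illustrated with a toy sketch. This is
not Hausser's formalism (his rules also carry category information and
decide which continuations are valid); it only shows the shape of the
derivation, in which everything uttered so far forms a single
"sentence start" that combines with the next word. The function names
below are invented for this illustration.

```python
from functools import reduce


def left_associative_steps(words):
    """List each combination step as (sentence start, next word).

    Toy illustration only: a real LA grammar would also check that the
    next word is a valid continuation of the sentence start.
    """
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


def left_associative_bracketing(words):
    """Show the grouping implied by always combining leftwards."""
    return reduce(lambda start, nxt: f"({start} {nxt})", words)


words = "Mary took a break".split()
steps = left_associative_steps(words)
bracketing = left_associative_bracketing(words)

for start, nxt in steps:
    print(f"{start!r} + {nxt!r}")
print(bracketing)
```

The final step pairs `Mary took a' with `break', matching the division
described above, and the bracketing printed is (((Mary took) a) break)
rather than a phrase-structure grouping such as (Mary (took (a break))).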
He defines left-associative grammar formally, and gives a proof that
its generative capacity corresponds exactly to the class of recursive
languages, before specifying constraints on the formalism which
reduce that capacity to something like the class of natural languages.
A leading merit of left-associative grammar is that parsing is handled
by the same set of mechanisms as generation.
Left-associative grammar is embedded within a wider theory of
language, the "SLIM theory", which in turn is related to a computational
model realized as a robot, "CURIOUS", which accepts and produces
descriptive utterances about coloured shapes scattered across a
two-dimensional surface. Although this is highly simplified, Hausser
suggests that a scenario of this sort can be seen as the "basic
prototype of natural communication". It is reminiscent of Terry
Winograd's SHRDLU system of the early 1970s, though Hausser describes
CURIOUS as an "open" and SHRDLU as a "closed" system.
The idea of making linear processing fundamental is interesting, but
it raises questions that are not really answered. Hausser recognizes
that linguists have always intuitively seen `a break' (and not `Mary
took a') as a syntagm within the example quoted earlier; where do
these feelings come from, if they do not correspond to grammatical
reality?
Hausser shows little sympathy with the 1990s trend towards making
computational linguistics engage with real-life corpus data. He has
one chapter discussing linguistic corpora, but the sole purpose seems
to be to argue that statistical word-tagging algorithms are not 100
per cent successful.
Alluding to the inventions of the aeroplane and space travel, Hausser
holds out the prospect that cutting loose from traditional nonlinear
computational linguistics and developing the subject in a new style
"will soon change everyday life more profoundly than any of the previous
achievements of science". Some readers may be sceptical.
FOR INFORMATION
Geoffrey Sampson (geoffs@cogs.susx.ac.uk) is professor
in Computer Science & Artificial Intelligence at the
University of Sussex, UK, and member of the ELSNET
Executive Board.