Computerlinguistik Uni Erlangen-Nürnberg: Reviews R.R. Hausser

Book Review
Roland Hausser, Foundations of Computational Linguistics: Man-Machine Communication in Natural Language. XII, 534pp. ISBN 3-540-66015-1. Springer-Verlag, Berlin, 1999. DM 89.00; USD 54.00

Reviewed by Geoffrey Sampson,
University of Sussex
ELSNET, Dec 1999

This is a rather different book from what the word `Foundations' in Hausser's title might lead one to expect. It is not a book like Manning & Schütze's well-received `Foundations of Statistical Natural Language Processing,' which surveys various basic concepts and techniques that any newcomer to the field needs to master before going deeper into some particular area. Hausser's book is more personal. Hausser aims to expound a novel linguistic theory of his own; although he explicitly contrasts this with various strands of established computational linguistics where these are relevant, many aspects of the latter are barely mentioned, if at all.

Hausser's version of computational linguistics has several components, but the core (at least as presented in this book) is a new theory of grammar, "left-associative (LA) grammar". The fundamental insight behind left-associative grammar is that people produce sentences linearly, one word at a time, always retaining the option to continue in any way compatible with the words already uttered, so it is misleading to group words into phrase-structure trees whose nonterminal labels imply that valid options are foreclosed as soon as a constituent is begun. For Hausser, a sentence like `Mary took a break' divides into `Mary took a' and `break'.
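The contrast with phrase-structure grouping can be made concrete with a small sketch. The Python below is only an illustration of the strictly leftward bracketing described above; it is not Hausser's formalism, which the book defines formally and which this toy omits entirely. The function names are hypothetical and chosen for this example.

```python
from functools import reduce


def left_associative_steps(words):
    """List each combination step as (sentence start, next word).

    Illustrative only: the sentence start always absorbs exactly one
    further word, so no constituent is ever built to its right.
    """
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


def left_associative_bracketing(words):
    """Nest the words strictly to the left: (((w1 w2) w3) w4)."""
    return reduce(lambda start, nxt: f"({start} {nxt})", words)


words = "Mary took a break".split()
steps = left_associative_steps(words)
bracketing = left_associative_bracketing(words)

for start, nxt in steps:
    print(f"{start!r} + {nxt!r}")
print(bracketing)
```

The final step pairs `Mary took a` with `break`, which is the division cited above; a conventional phrase-structure analysis would instead bracket `a break` together before attaching it to the verb.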

He defines left-associative grammar formally, and gives a proof that its generative capacity is equivalent to the class of recursive languages, before specifying constraints on the formalism which reduce that capacity to something like the class of natural languages. A leading merit of left-associative grammar is that parsing is handled by the same set of mechanisms as generation.

Left-associative grammar is embedded within a wider theory of language, the "SLIM theory", which in turn is related to a computational model realized as a robot, "CURIOUS", which accepts and produces descriptive utterances about coloured shapes scattered across a two-dimensional surface. Although this is highly simplified, Hausser suggests that a scenario of this sort can be seen as the "basic prototype of natural communication". It is reminiscent of Terry Winograd's SHRDLU system of the early 1970s, though Hausser describes CURIOUS as an "open" and SHRDLU as a "closed" system.

The idea of making linear processing fundamental is interesting, but it raises questions, which are not really answered. Hausser recognizes that linguists have always intuitively seen `a break' (and not `Mary took a') as a syntagm within the example quoted earlier; where do these feelings come from, if they do not correspond to grammatical reality?

Hausser shows little sympathy with the 1990s trend towards making computational linguistics engage with real-life corpus data. He has one chapter discussing linguistic corpora, but the sole purpose seems to be to argue that statistical word-tagging algorithms are not 100 per cent successful.

Alluding to the inventions of the aeroplane and space travel, Hausser holds out the prospect that cutting loose from traditional nonlinear computational linguistics and developing the subject in a new style "will soon change everyday life more profoundly than any of the previous achievements of science". Some readers may be sceptical.

FOR INFORMATION
Geoffrey Sampson (geoffs@cogs.susx.ac.uk) is professor in Computer Science & Artificial Intelligence at the University of Sussex, UK, and member of the ELSNET Executive Board.