Roland Hausser

Foundations of Computational Linguistics.
Man-Machine Communication in Natural Language.

Berlin, Heidelberg, New York. Springer 1999, xii + 534 pp.
ISBN 3-540-66015-1.

Reviewed by Petr Sgall

The book under review constitutes an extremely comprehensive treatment of the domain of computational linguistics - a domain that has been developing very quickly and has been extended to cover computerized approaches to the most diverse instances of language and speech. The author sees the goal of computational linguistics as reproducing the natural transmission of information "by modelling the speaker's production and the hearer's interpretation on a suitable type of computer... (which) amounts to the construction of autonomous cognitive machines (robots) which can communicate freely in natural language" (p. 1). Before discussing what "freely" means here, let us have a look at the structure of this very thoroughly equipped book (which includes exercises for the individual chapters, a bibliography of 18 large-format pages, a name index of 4 pp. and a subject index of 14 pp.).
The book is organized into four main parts, each of which comprises six chapters. In Part I, Theory of Language, the methods and applications of computational linguistics are first illustrated by the robot CURIOUS, the functioning of which instantiates the author's theory of Surface Compositional Linear Internal Matching (SLIM). SLIM contains a cognitive characterization of semantic primitives, as well as a theory of signs and a delineation of syntax, semantics and pragmatics with their integration in the linguistic interaction of the speaker and the hearer. Comments on the development of theories of language from G. Frege to C. E. Shannon and W. Weaver and from F. de Saussure to N. Chomsky are included. Among these, Hausser's remark on Frege's principle (according to which the meaning of a complex expression is a function of the meanings of its parts and of the mode of composition), for example, claims Frege's formulation to be defensible if applied "to syntactically analyzed surfaces" (p. 79); however, it is open to discussion whether such an analysis brings more than what is covered by the "meanings" of the parts and by their "modes of composition".
Part II, Theory of Grammar, concentrates, first of all, on formal grammar and its role in the description of natural languages, pointing out that the time-linear nature of language can be captured by an LA-grammar, based on testing possible continuations, rather than possible substitutions. Natural languages, which parse in linear time, belong to the lowest complexity class of a hierarchy defined on this basis. Let us just remark that the author's characterization of reference (without a background anchored in psychology) might be seen as not fully exhausting the problem, although his illustrations (pp. 111-114) are clear enough to point to the need to work with a notion such as a (multistratal) stock of shared information. Such a notion would also be useful in connection with his view of personal pronouns (including those of the 3rd person) as indexical, rather than symbolic, which is well substantiated.
Two basic parts of grammar are then scrutinized in Part III, Morphology and Syntax, where, after specifying the basic notions and processes, the author characterizes the issues of automatic word form recognition and presents a procedure for a morphological analysis of English. Within the left-associative approach, the syntactic notions of valency and agreement are then discussed, along with (syntactic functions of) word order. A small subsystem of English grammar is formulated and later extended, step by step, to cover e.g. cases of free word order (in a comparison of English with German, Chapter IV, pp. 310ff), complex noun phrases, complex verbs (pp. 328ff), interrogatives, and so on.
In Part IV, Semantics and Pragmatics, the author compares the semantics of languages of three kinds - logical, programming and natural. Different types of semantics are then examined (from Tarskian logical semantics to systems accounting specifically for intensional contexts, propositional attitudes and vagueness), and their relationships to different ontologies are found to yield different empirical results. After having found that programming languages cannot be based on a metalanguage-dependent Tarski semantics, the author concludes that Tarski was right when claiming that a complete analysis of natural languages is in principle impossible within logical semantics (pp. 383ff). Tarski's claim was founded on his analysis of the classical Liar paradox, based on self-reference. However, the word "complete" appears to be of specific significance here. As the present reviewer discussed elsewhere (Sgall 1994), it is possible for theoretical linguistics to handle the issues connected with the Liar sentence in such a way that the restrictions on the relationship between a sentence and a (Carnapian) proposition are reexamined. A systematic analysis of this problem from the viewpoint of intensional logic can be found in Tichý (1988:227-233).
The last two chapters of the book (23 and 24 in Part IV) show how the author surmounts the difficulties of truth-conditional semantics just mentioned by means of his SLIM-based theory. These issues are illustrated here with the SLIM machines in the role of both an interpreter and a producer of messages. Their states of cognition are characterized as ten activation states, classified as recognition (contextual, commented, and language-controlled), action (contextual, language-controlled and commented), inference, interpretation, production of language and cognitive stillstand. Although the account of different cognitive layers is very rich, systematic and convincing, the guarantees of this classification still remain to be checked, and thus also the author's claim that these machines can communicate freely in natural language. If they are to communicate as freely as human beings can (even if emotionally marked speech and issues of stylistics are not directly taken into consideration), then issues of the psychological background of speech (the contents of memory, its structure, and so on) should also be systematically described.
However, computational linguistics and/or computer modelling of natural language, although offering extremely rich possibilities of analyzing and illustrating properties of language, still differs from theoretical linguistics itself and from a theory of the semantico-pragmatic interpretation of discourse. A possible handling of the Liar paradox in the context of truth-conditional semantics was already mentioned. The related problem of "propositional" attitudes can be accounted for, in a similar context, if linguistic meaning (underlying structure) is distinguished not only from truth conditions, but also from intension (cf. Peregrin and Sgall 1998). Further difficulties connected with the application of Tarskian semantics to natural language might be overcome if the role of the topic-focus articulation (as based on contextual boundness or on the opposition between "given" and "new" information, cf. Hajičová, Partee and Sgall 1998) is duly reflected as pertinent for linguistic meaning (rather than just for contextual combinability of sentences), and if the indistinctness of linguistic meaning is acknowledged as one of its basic properties, see e.g. Novák (1993). Truth-conditional semantics certainly has to be relativized to different contexts, possible worlds or situations, as can be done if an enriched version of H. Kamp's Discourse Representation Theory is properly connected with an account of topic and focus. In such a way the Tarskian foundations of semantics, safely based on truth conditions, can still be useful for theoretical linguistics.
These remarks on the complex relationships between computational and theoretical linguistics should not be understood as denying that the book under review presents an extremely rich and highly useful system of a computer-based approach to semantics, which allows for much more than just experiments concerning the role of computers in understanding and producing discourse. The very broad picture of the structure of natural language, which Hausser's book offers in a pedagogically appropriate way, makes it possible to analyze the most diverse layers of natural language in a way diversified enough to cover phenomena of any aspect of language structure and to check the results of these analyses in a fully effective way.


