The information conveyed by words in sentences

John Hale
Johns Hopkins University

hale@cogsci.jhu.edu


Locality in various forms (Kimball, 1973; Frazier, 1978; Gibson, 1998) has been advanced as an important determinant of reading time patterns in both ambiguous and unambiguous sentences.  The conceptual basis of locality is often described in terms of the reactivation of linguistic representations in short-term memory.

This talk presents an alternative basis for such reading time patterns: the amount of information processed.  In particular, a method is presented for calculating the amount of information a hearer receives from a speaker producing a sentence generated by a grammar known to both parties.  On the cognitivist view of reading as an information-processing task, this value is a natural predictor of reading time.

The method identifies each intermediate state of a left-to-right parser with the set of partial derivations that generate the prefix observed at that state.  These partial derivations look like trees with "unexpanded" nonterminal symbols at some of their leaves.  In the case of a top-down parser with a non-left-recursive grammar, the set of such partial parse trees is finite, since the set of leftmost derivations of any given prefix is finite.  This entails no loss of generality, since various methods are available for removing left recursion.  Then, applying the foundational work of Grenander (1967), one computes the conditional entropy of the start symbol given the prefix, taking the set of partial derivations as the set of possible derivations.  Unexpanded nonterminals are assigned their expected entropy; any tree structure that is completely certain contributes zero entropy.  In this way the conditional entropy at each word is obtained, and subtracting successive values gives the information conveyed by each word.  If semantic rules mirror syntactic rules in a one-to-one fashion (as proposed, e.g., by Steedman, 2000), then this method measures the information-processing work performed by a completely efficient sentence comprehender.
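To make the calculation concrete, here is a minimal sketch in Python, assuming a proper, epsilon-free, non-left-recursive probabilistic context-free grammar.  The grammar and all names (GRAMMAR, nonterminal_entropies, prefix_states, conditional_entropy, information_profile) are illustrative inventions, not the talk's actual implementation.  Following Grenander, the vector h of per-nonterminal derivation entropies satisfies the linear system h = c + Ah, where c holds each nonterminal's rule-choice entropy and A holds the expected number of occurrences of each child nonterminal, so h = (I - A)^{-1} c.

    import math
    import numpy as np

    # A toy probabilistic context-free grammar (hypothetical, for
    # illustration only).  Nonterminals map to lists of (right-hand
    # side, probability) pairs; symbols absent from the dict are
    # terminals.  The grammar is proper, epsilon-free, and not
    # left-recursive, as the method requires.
    GRAMMAR = {
        "S":   [(("NP", "VP"), 1.0)],
        "NP":  [(("Det", "N"), 0.7),
                (("Det", "N", "RC"), 0.3)],
        "VP":  [(("V", "NP"), 1.0)],
        "RC":  [(("who", "VP"), 0.5),        # subject relative clause
                (("who", "NP", "V"), 0.5)],  # object relative clause
        "Det": [(("the",), 1.0)],
        "N":   [(("reporter",), 0.5), (("senator",), 0.5)],
        "V":   [(("attacked",), 0.5), (("admired",), 0.5)],
    }

    def nonterminal_entropies(grammar):
        """Entropy (bits) of the complete derivations rooted in each
        nonterminal, via Grenander's linear system h = c + A h."""
        nts = list(grammar)
        idx = {nt: i for i, nt in enumerate(nts)}
        c = np.zeros(len(nts))              # rule-choice entropy per nonterminal
        A = np.zeros((len(nts), len(nts)))  # expected child-nonterminal counts
        for nt, rules in grammar.items():
            for rhs, p in rules:
                c[idx[nt]] -= p * math.log2(p)
                for sym in rhs:
                    if sym in grammar:
                        A[idx[nt], idx[sym]] += p
        return dict(zip(nts, np.linalg.solve(np.eye(len(nts)) - A, c)))

    def prefix_states(grammar, prefix, start="S"):
        """Enumerate the finitely many partial leftmost derivations that
        generate exactly `prefix`, as (remaining stack, probability)."""
        states, agenda = [], [((start,), 0, 1.0)]
        while agenda:
            stack, pos, prob = agenda.pop()
            if pos == len(prefix):           # prefix consumed: stop expanding
                states.append((stack, prob))
            elif not stack:
                pass                         # sentence ended inside the prefix
            elif stack[0] in grammar:        # expand the leftmost nonterminal
                for rhs, p in grammar[stack[0]]:
                    agenda.append((rhs + stack[1:], pos, prob * p))
            elif stack[0] == prefix[pos]:    # scan a matching terminal
                agenda.append((stack[1:], pos + 1, prob))
            # a mismatched terminal kills the derivation: drop it
        return states

    def conditional_entropy(grammar, prefix, h):
        """Conditional entropy of the start symbol given a prefix:
        uncertainty over which partial derivation generated the prefix,
        plus the expected entropy h[X] of each unexpanded nonterminal."""
        states = prefix_states(grammar, prefix)
        z = sum(prob for _, prob in states)  # prefix probability
        total = 0.0
        for stack, prob in states:
            q = prob / z
            total -= q * math.log2(q)
            total += q * sum(h[s] for s in stack if s in grammar)
        return total

    def information_profile(grammar, words):
        """Bits conveyed by each word: the drop in conditional entropy."""
        h = nonterminal_entropies(grammar)
        ents = [conditional_entropy(grammar, tuple(words[:i]), h)
                for i in range(len(words) + 1)]
        return [(w, ents[i] - ents[i + 1]) for i, w in enumerate(words)]

Because the grammar is not left-recursive, the enumeration of partial derivations always terminates, and successive differences of the conditional entropy yield a per-word information profile.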

This method differs from other ways of measuring information content (e.g., Shannon, 1951) in that the assumed model of grammar has context-free derivations, rather than just n-grams.

Using explicit grammars for English stimuli, the method is evaluated against experimentally observed reading times on a variety of linguistic constructions, including the subject and object relative clause data collected by Grodner et al. (2000).
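As an illustration of that comparison, the toy grammar from the sketch above can be run on a subject relative and an object relative.  The sentences below are constructed examples under the hypothetical grammar, not Grodner et al.'s actual stimuli.

    SRC = "the reporter who attacked the senator admired the senator".split()
    ORC = "the reporter who the senator attacked admired the senator".split()
    for words in (SRC, ORC):
        print(" ".join(words))
        for w, bits in information_profile(GRAMMAR, words):
            print(f"  {w:10s} {bits:7.3f} bits")

A positive value at a word means the word reduced the comprehender's uncertainty about the derivation; a negative value means the word opened up new derivational possibilities.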


References

Frazier, Lyn (1978).  On Comprehending Sentences: Syntactic Parsing Strategies.  Ph.D. dissertation, University of Massachusetts, Amherst, MA.

Gibson, Edward (1998).  Linguistic complexity: Locality of syntactic dependencies.  Cognition, 68:1-76.

Grenander, Ulf (1967).  Syntax-Controlled Probabilities.  Technical report, Brown University Division of Applied Mathematics.

Grodner, Daniel, Watson, Duane, & Gibson, Edward (2000).  Locality effects on sentence processing.  Talk presented at CUNY 2000.

Kimball, John (1973).  Seven principles of surface structure parsing in natural language.  Cognition, 2:15-47.

Shannon, Claude (1951).  Prediction and entropy of printed English.  Bell System Technical Journal, 30:50-64.

Steedman, Mark (2000).  The Syntactic Process.  MIT Press.