Predicting verb subcategorization from the semantic context preceding the verb

Douglas Roland
University of Colorado, Boulder

rolandd@babel.colorado.edu

 

A key problem for the field of sentence processing is extending our models of syntactic comprehension to deal with semantic issues and with the interaction between syntax and semantics.  Most current models of sentence processing that deal with the role of subcategorization in comprehension treat all senses of a verb as having the same subcategorization biases.  However, recent experimental and corpus-based work has shown that different senses of a verb have different subcategorization probabilities.  For example, Roland and Jurafsky (1998, in press) and Roland et al. (2000) showed that verb sense differences are a significant cause of cross-corpus subcategorization variation.  Hare, McRae, & Elman (2000, 2001) showed that the semantic context preceding the verb affects parsing decisions.  These results suggest that models of sentence processing need to incorporate verb sense and subcategorization information, but they leave open the question of how to integrate this information into such models.

This paper presents a computational model for predicting verb subcategorization based on the semantic context preceding the verb.  The context preceding the target verb is compared with the contexts preceding previously experienced examples of the same verb.  The subcategorization predictions are based on a combination of the prior probabilities of the subcategorizations and the subcategorizations of the most semantically similar previous examples of the verb, as measured by Latent Semantic Analysis.
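The prediction step described above can be illustrated with a minimal sketch.  It is not the implementation evaluated in this paper: the abstract does not specify how the priors and the nearest-neighbor evidence are combined, so the linear interpolation, the parameters `k` and `prior_weight`, the function names, and the two-dimensional toy vectors (standing in for Latent Semantic Analysis context vectors) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_subcat(target_vec, examples, k=3, prior_weight=0.5):
    """Score each subcategorization for a new context of a verb.

    examples: list of (context_vector, subcat_label) pairs for previously
    seen uses of the same verb. The vectors are assumed to be precomputed
    semantic vectors for the text preceding the verb.
    k and prior_weight are hypothetical parameters, not taken from the paper.
    """
    if not examples:
        raise ValueError("at least one previous example is required")

    # Prior probability of each subcategorization over all previous examples.
    counts = Counter(label for _, label in examples)
    priors = {label: n / len(examples) for label, n in counts.items()}

    # The k previous examples whose preceding contexts are most similar.
    ranked = sorted(examples, key=lambda ex: cosine(target_vec, ex[0]),
                    reverse=True)
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)

    # Assumed combination: linear interpolation of prior and neighbor share.
    return {
        label: prior_weight * priors[label]
        + (1 - prior_weight) * votes[label] / len(neighbors)
        for label in priors
    }


# Toy data: "DO" = direct object, "SC" = sentential complement.
examples = [
    ([1.0, 0.1], "DO"),
    ([0.9, 0.2], "DO"),
    ([0.8, 0.0], "DO"),
    ([0.1, 1.0], "SC"),
    ([0.2, 0.9], "SC"),
]

scores = predict_subcat([0.0, 1.0], examples, k=2)
prediction = max(scores, key=scores.get)
```

In this toy data the prior favors the direct object reading, but the two previous contexts most similar to the target both took a sentential complement, so the combined score favors the sentential complement.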

The model is evaluated in three ways.  First, the model is shown to make the same subcategorization predictions as humans given the same contextual information:  it made the same direct object vs. sentential complement predictions as the subjects in Hare et al. (2001).  Second, because these biasing contexts were artificially created, the model was also tested on predicting verb subcategorization from the naturally occurring contexts preceding corpus examples of the verb.  The model was also successful at this task, indicating that semantic context is a valid cue to verb subcategorization in a larger set of naturally occurring data.  Finally, because these results could be achieved by a model capable only of making broad sense distinctions, we also demonstrate that the model can predict verb subcategorization from much more subtle distinctions in verb sense.

The success of this model provides further evidence of the need to incorporate verb sense and semantic information into models of sentence processing.  Additionally, the algorithm used in the model provides a method for integrating such information into processing models without the need to define verb senses explicitly.

 

References

Hare, M., Elman, J., & McRae, K. (2000).  Sense and structure: Meaning as a determinant of verb categorization preferences.  Presentation at CUNY Sentence Processing Conference. La Jolla, California.

Hare, M., Elman, J., & McRae, K. (2001).  Sense and structure: Meaning as a determinant of verb categorization preferences.  Manuscript submitted for publication.

Roland, D., & Jurafsky, D. (1998).  How verb subcategorization frequencies are affected by corpus choice.  Proceedings of COLING-ACL 1998 (pp. 1117-1121), Montreal, Canada.

Roland, D., & Jurafsky, D. (in press).  Verb sense and verb subcategorization probabilities.  In P. Merlo & S. Stevenson (Eds.), The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues.  John Benjamins.

Roland, D., Jurafsky, D., Menn, L., Gahl, S., Elder, E., & Riddoch, C. (2000).  Verb subcategorization frequency differences between business-news and balanced corpora: The role of verb sense.  Proceedings of the Workshop on Comparing Corpora (pp. 28-34), Hong Kong, October 2000.