CHAPTER 4

LEARNING RESEARCH: MECHANISMS OF TRANSFER

The main function of a corpus-based lexical tutor is to present new words in multiple contexts. The goal of the tutor is to help learners gain transferable knowledge of new words, as signaled by their ability to comprehend the words in novel contexts. But what is the connection between multiple contexts and novel contexts?

A good deal of instruction is designed without a clear theory of the learning mechanisms involved. Such instruction may be successful, but even so it is an example of what Brown, Bransford, Ferrara, and Campione (1983) call "blind training." The case for a corpus tutor thus far is a blind argument, since it is based only on a putative analogy with reading. The logic is as follows: words appear in numerous contexts in reading, reading produces rich word knowledge, therefore numerous contexts produce rich word knowledge. But any number of things about reading other than numerous contexts might be the cause of rich learning. A word may simply have to be seen several times to be remembered, not necessarily in any context; or several times in the same context; or several times separated by appropriate intervals; or in just one personally meaningful context which may take several instances to locate; and so on. If any of these were the main source of rich learning, then the road to efficiency might be something other than a concordance program that can assemble a random collection of contexts.

However, the notion that transferable learning takes place through meeting the to-be-learned material in a variety of contexts has been extensively validated in psychological and cognitive science studies.

What is transfer?

A common idea of the relationship between multi-contextuality and transfer to novel contexts is that the more variants of a task met in training, the greater the likelihood that any future variant will be one previously met. While this is no doubt true in some cases, the claim for multi-contextuality goes beyond increasing the probability of a surface match. The claim is that training will transfer to truly novel tasks.

Schmidt and Bjork (1992) cite several experimental demonstrations of how transfer works. One involves teaching children to play "beanbag," a game that involves throwing a beanbag through a hole in the wall. One group was trained on a single version of the task, with the hole always 3 feet away from them, while the other was trained with the distance randomly varied between 2 and 4 feet. The random group took longer to reach criterion performance, but then were significantly more accurate in their shots, not only at their training distances but also at novel distances they were never trained for, including their competitors' 3-foot distance.

Another demonstration of the transfer power of contexts is a set of studies performed by Gick and Holyoak (1980, 1983) on "analogical problem solving" in which subjects learned a problem solution and then tried to solve a novel but analogous problem. They read about a general who wanted to capture a fortress; all roads to the fortress were mined, but the mines would not be detonated by small groups of soldiers, so the general broke his army into small groups to converge on the fortress from different angles. With this story in memory, subjects were presented with the problem of a surgeon who wanted to apply radiation to a tumor in an organ but feared the organ would be damaged. The solution was to apply several smaller doses from different angles, but subjects were surprisingly poor at seeing it. However, in a second experiment they were given two analogous problem-solutions before being asked to solve a third: the soldier story plus one about Red Adair putting out an oil fire, not having a hose powerful enough to reach the fire, and using several small hoses from different directions. With two solutions behind them, subjects easily transferred the solution to a third.

Two-for-transfer seems to be a replicable law of learning. These studies tell us what transfer means, but what is the mechanism?

Schema induction

Both Gick and Holyoak and Schmidt and Bjork explain the two-for-transfer effect in terms of a cognitive process called "schema induction" (Schmidt's version of the theory is discussed in Shapiro and Schmidt, 1982). A schema is simply the elements shared between two or more related concepts, situations, or motor activities, seen apart from their unshared elements. Without a minimum of two concepts, situations, or activities, no schema is induced because no elements are shared. For example, a divide-and-converge schema is common to the radiation doses, soldier groups, and hoses, but the radiation, soldiers, and hoses themselves are surface elements irrelevant to the schema. The surface elements are transformed into variables, possibly with default values.

A schema is thus smaller than any of the concrete surface configurations it participates in, and so "fits" a larger number of novel situations (just as a smaller car fits more parking spaces). A match is more likely with fewer features to match (echoing Thorndike's [1923] common elements theory of transfer). And of course schema induction proceeds with further experience of analogous examples, the schema possibly becoming even smaller as even more common features show themselves really to be variables. Schema induction has been replicated many times, and successfully modeled as a computer program by, among others, Anderson (1983) and Hintzman (1986).

The relevance of schema induction to word-learning is clear. Suppose an intelligent Martian meets a dog and learns that it is called "dog"; her initial understanding of the concept is simply everything about that particular dog, its size, fur-length, colour, etc. If the first dog was a Dalmatian, then dogs are tall, short-haired, and white with spots. When a spaniel later appears and is also named "dog" by the Martian's hosts, she sees that size, fur-length and colour are not constants but rather variables of doghood, so that the core of invariance must be at a more abstract level, in whatever features the animals still have in common (such as meat-eating). Then, when a chihuahua is entered into the induction engine, a dog-concept comes out with a small body of semantic invariance and a long tail of variables. This schema should be quite adaptable to whatever dogs are met in the future, that is, to novel dogs. The point is, one instance would not have produced transferable knowledge because no abstraction process would have been initiated.
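The Martian's induction process can be sketched in a few lines of Python. This is only an illustration of the idea, not a reconstruction of any model cited in this chapter: the function names induce_schema and fits, the attribute names, and the attribute values are all invented for the example, and treating a concept as a flat set of attribute-value pairs is a simplifying assumption.

```python
def induce_schema(instances):
    """Split attributes into a constant core and a list of variables.

    An attribute stays in the core only if every instance seen so far
    has the same value for it; all other attributes become variables.
    (Hypothetical helper, invented for this illustration.)
    """
    first, *rest = instances
    core = {k: v for k, v in first.items()
            if all(inst.get(k) == v for inst in rest)}
    variables = sorted({k for inst in instances for k in inst} - set(core))
    return core, variables


def fits(core, instance):
    """A novel instance fits the schema if it agrees with every core feature."""
    return all(instance.get(k) == v for k, v in core.items())


# Invented feature values for the three dogs in the text.
dalmatian = {"size": "tall", "fur": "short", "colour": "white with spots", "diet": "meat"}
spaniel = {"size": "medium", "fur": "long", "colour": "brown", "diet": "meat"}
chihuahua = {"size": "small", "fur": "short", "colour": "tan", "diet": "meat"}

# One instance: nothing is abstracted, so the "concept" is the whole Dalmatian.
core_one, vars_one = induce_schema([dalmatian])

# Three instances: only the shared feature survives as the core.
core_three, vars_three = induce_schema([dalmatian, spaniel, chihuahua])
print(core_three)   # {'diet': 'meat'}
print(vars_three)   # ['colour', 'fur', 'size']

# A novel dog matches the small core but not the single-instance "schema".
novel_dog = {"size": "huge", "fur": "curly", "colour": "black", "diet": "meat"}
print(fits(core_three, novel_dog))  # True
print(fits(core_one, novel_dog))    # False
```

With a single instance every attribute remains in the core, so the schema fits only dogs identical to the first one; it is the second and third instances that turn size, fur, and colour into variables.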

Schema induction and verbal learning

The Dalmatians and chihuahuas conveniently introduce an interesting point about learning words from examples. It appears that the broader the disparity between instances, the more flexible and transferable the schema induced. This is shown in a study by Nitsch (1978), used by both Schmidt and Bjork and Gick and Holyoak to elaborate their schema theories and extend them to verbal learning.

Nitsch tested subjects' ability to learn new vocabulary items from either a single example repeated several times, or several different examples. In line with the discussion above, she found that items could be learned faster and more easily from several repetitions of a single example (like the 3-foot beanbaggers) but with better transfer from encounters with several examples. But her finding went beyond that in an interesting way. She found that there was a further distinction between learning a word in several contexts, and learning a word not only in several contexts, but also in several different situations.

Two groups of subjects tried to infer the meaning of the word "crinch" (roughly meaning "offend") by meeting it in four context sentences. One group met the word in four contexts within the same situation, an incident in a restaurant (a waitress was "crinched" four times--when a diner failed to leave a tip, another argued about the prices, another knocked the ketchup on the floor, and another complained about the service). The other group also met "crinch" in four contexts, but four contexts that were also four different situations (churchgoers were crinched by a cowboy not removing his hat in church; spectators at a dog race when a man jumped on his seat and blocked their view; an antiques dealer when a customer flicked ashes on an antique chest; and a waitress when a diner complained). The outcome was that the greater the disparity in situations, the greater the transferability of word knowledge. The mechanism is once again the same: the more disparate the contexts, the smaller the core of common features.
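The effect of disparity on the size of the common core can be illustrated with set intersection. The feature labels below are invented for the illustration and are not Nitsch's materials or coding scheme; the only assumption is that each context contributes a set of features and that the learner keeps what all contexts share.

```python
def common_core(contexts):
    """Features present in every context (plain set intersection)."""
    return set.intersection(*contexts)


# Group 1: four contexts inside one situation (invented feature labels).
same_situation = [
    {"restaurant", "waitress", "diner", "rude act", "someone offended", "no tip"},
    {"restaurant", "waitress", "diner", "rude act", "someone offended", "argues over prices"},
    {"restaurant", "waitress", "diner", "rude act", "someone offended", "spills ketchup"},
    {"restaurant", "waitress", "diner", "rude act", "someone offended", "complains"},
]

# Group 2: four contexts in four different situations.
varied_situations = [
    {"church", "churchgoers", "cowboy", "rude act", "someone offended"},
    {"dog race", "spectators", "man on seat", "rude act", "someone offended"},
    {"antique shop", "dealer", "customer", "rude act", "someone offended"},
    {"restaurant", "waitress", "diner", "rude act", "someone offended"},
]

core_same = common_core(same_situation)
core_varied = common_core(varied_situations)
print(sorted(core_same))
# ['diner', 'restaurant', 'rude act', 'someone offended', 'waitress']
print(sorted(core_varied))
# ['rude act', 'someone offended']

# A novel context: the smaller core is fully contained in it, the larger is not.
novel = {"library", "reader", "loud talker", "rude act", "someone offended"}
print(core_varied <= novel)  # True
print(core_same <= novel)    # False
```

On this toy model the same-situation group ends up with a core that still includes the restaurant and the waitress, so it fails to match a context set anywhere else.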

Variability and corpora

A concordance accessing a large corpus replicates on demand the kind of learning materials presented to the second group in Nitsch's experiment. A corpus contains many individual texts, where words are likely to be used in many different types of situations. By contrast, meeting words in natural texts is likely to mean meeting them within a smaller number of situations. In this way, a concordance may not only simulate vocabulary acquisition from natural reading, but improve upon it.
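A minimal keyword-in-context sketch shows how a concordance might favour situational variety. It is not the design of any existing tutor: the function name concordance, the one_per_text option, the window width, and the two-text toy corpus are all assumptions made for the example, and taking at most one context per text is only a rough proxy for "different situations."

```python
import re


def concordance(corpus, keyword, width=30, one_per_text=True):
    """Return (text_id, line) keyword-in-context pairs for `keyword`.

    `corpus` maps a text identifier to the full text. With
    one_per_text=True, at most one context is taken from each text,
    on the assumption that different texts describe different situations.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text_id, text in corpus.items():
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append((text_id, f"{left:>{width}}[{match.group()}]{right}"))
            if one_per_text:
                break
    return lines


# Toy corpus, invented for the example.
corpus = {
    "restaurant": "The diner left no tip and the waitress was offended. "
                  "She was offended again when he complained.",
    "church": "The churchgoers were offended by the cowboy's hat.",
}

varied = concordance(corpus, "offended")
everything = concordance(corpus, "offended", one_per_text=False)
for text_id, line in varied:
    print(text_id, "|", line)
```

Here varied holds one context from each text, while everything also includes the second restaurant context, which adds repetition but no new situation.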

Schemas and prototypes

Schema induction entails a view of word learning rather different from some others, for example definitional learning. According to induction theory, concept meaning is dynamic, with semantic cores shrinking over time, more and more features revealed as variables, and semantic boundaries increasingly "fuzzy." According to definition theory, concept boundaries are all-or-none and fixed. These issues are reviewed in Smith and Medin (1981) and Lakoff (1987) under the heading of classical vs. prototype theories of concept meaning.

Briefly, classical theory is definitional theory (originating in Aristotle), and prototype theory is the theory that concepts consist of a very small core feature-set (down to none in the view of Wittgenstein, 1958) and a very large variable-set. Prototype theory is so called because it allows for conceptual gradation depending on the number of variables set to default--there are "good" or prototype birds (robins) or fruits (apples) as well as "less good" ones (ostriches and figs). The psychological evidence (Rosch and Mervis, 1975; Rosch, 1978) and the philosophical evidence (Fodor, Garrett, Walker and Parkes, 1980) are almost exclusively in favour of prototype theory. Armstrong, Gleitman, and Gleitman (1983) wrote that "it is widely agreed today in philosophy, linguistics, and psychology, that the definitional program for everyday categories has been defeated" (p. 268).

And yet for some reason prototype theory does not find its way into vocabulary instruction. Anderson and Nagy (1991) complain that word-learning theory and practice carry on with a mainly definitional basis ignoring the research findings about its inadequacies. Aitchison (1992) makes a similar observation in a second-language context:

Prototype theory has been regarded as a minor revolution within cognitive psychology. Yet so far, its findings have barely been considered within applied linguistics, even though they are likely to have important consequences for vocabulary teaching and learning (p. 71).

Perhaps a pedagogy of prototypes is not simple to work out. A first draft, following some ideas from Carrell (1988), might be that if words have fixed and independent meanings, then learning words is learning these meanings, presumably from dictionaries, and reading is assembling these fixed meanings into texts. Or, if words have small cores and many variables and optional features, then learning words is a bit about learning meanings and a lot about learning the many ways meanings can be instantiated in texts. In other words, a prototypes approach to vocabulary acquisition is the massive reading approach discussed in Chapter 3, already described as impractical for a second language, which is a possible reason that prototype theory has barely been considered.

Conclusion

The mechanisms by which multiple contexts produce transferable, decontextualized knowledge are well known, and a corpus tutor is particularly well suited to exploiting this knowledge in a principled way. Further, the consideration of transfer mechanisms leads to a prototypes model of word meaning, which would be compatible with a corpus approach to vocabulary acquisition especially in a second language. Once again, these considerations suggest design parameters for a corpus-based tutor, as will the review of some other approaches to lexical tutoring in the next chapter.


