The role of perception
"He saw everything!" is invariably the complaint of the chess player who loses a game. Other variants to this lament are: "I completely missed (seeing) his move" or "How could I overlook that move?" It is no accident that the operation "seeing" is an element in all those statements. In the final analysis, perception seems to be the key to skill in chess.
It is not usually the case that one player calculates so many variations that he generates the correct one where his opponent, who has searched less completely, does not. As is obvious from the previous analysis, both players are restricted to looking at a mere handful of possible positions. The difference between two players is usually that one looks at the promising moves, and the other spends his time going down blind alleys. This, in a nutshell, is what de Groot discovered in his research into the determinants of skill in chess in the early 1930's and 1940's.
At first de Groot tried to determine why some players were better than others by examining their thought processes. To do this, he showed chess players unfamiliar positions and told them to choose a move and think out loud while doing so. He recorded their verbal statements by hand and attempted to estimate their statistics of search. His subjects ranged from the world's best chess players—Grandmasters like Alekhine, Keres, Euwe—to club players who would be rated as Class A to Class C on the USCF rating scale.4 Table 2.1, derived from de Groot, gives the results for position A (see Figure 2.1) by five Grandmasters and five experts.
The results are rather surprising and serve as a convincing refutation of the myths mentioned earlier. The only measure which clearly differentiates the Grandmasters from the experts is the one giving the estimated value of the chosen move. Four of the five Grandmasters chose the objectively correct move. None of the experts picked that move. Furthermore, as de Groot shows elsewhere [31, p. 128], all of the Grandmasters mentioned the correct move at some time in their analysis. Only 2 of the 5 experts ever mentioned the correct move in their analysis.
4 Arpad Elo [36] has derived a method for rating players by comparing their performance against each other in tournaments. In the USCF (United States Chess Federation) system, the standard deviation is 200 rating points and the scale runs as follows: Class E: below 1200, Class D: 1200-1399, Class C: 1400-1599, Class B: 1600-1799, Class A: 1800-1999, Expert: 2000-2199, Master: 2200 and above. Titles such as international Grandmaster (approximately 2500) are awarded by FIDE, the world chess body.
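The USCF class boundaries in note 4 amount to a simple lookup table. The sketch below (the function name `uscf_class` is hypothetical) maps a rating to the class labels listed there; it encodes only the scale given in the note, not Elo's method of computing the ratings themselves, and it does not cover FIDE titles.

```python
def uscf_class(rating: int) -> str:
    """Return the USCF class for a rating, using the scale given in note 4."""
    # (lower bound, label) pairs, checked from the top down; classes are 200 points wide.
    boundaries = [
        (2200, "Master"),
        (2000, "Expert"),
        (1800, "Class A"),
        (1600, "Class B"),
        (1400, "Class C"),
        (1200, "Class D"),
    ]
    for lower, label in boundaries:
        if rating >= lower:
            return label
    return "Class E"  # everything below 1200


print(uscf_class(1850))  # Class A
```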
TABLE 2.1 Choose-a-move task, position A, five Grandmasters and five experts (after de Groot [31])
Protocol structural variables (mean, with standard error)
1. Time to decision (minutes)
2. Number of fresh starts
3. Number of different first moves
4. Maximal depth calculated (plies)
5. Total number of moves considered
8. Value of chosen move (best move rated 9)
Figure 2.1 De Groot's position A with White to play. The best move is BxN/5!
It is also enlightening to observe that the average depth of search, for Grandmaster and expert alike, has shrunk from mythical levels such as 40 plies to a mere 7 plies. Also, the "hundreds of continuations" supposedly explored have dropped to an average of 35 moves in total (the range being 20-76 for Grandmasters).
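To see why a depth of 7 plies and a few dozen moves rule out brute force, compare them with an exhaustive search to the same depth. The sketch assumes an average of 30 legal moves per position; that figure is a commonly quoted estimate and an assumption here, not a value from de Groot's table. It counts only the positions at the deepest ply of a full-width tree.

```python
def full_width_positions(branching: int, depth_plies: int) -> int:
    """Number of positions at the last ply of a full-width search tree."""
    return branching ** depth_plies


ASSUMED_BRANCHING = 30  # assumption: typical number of legal moves per position
DEPTH = 7               # average maximal depth reported in the text

leaves = full_width_positions(ASSUMED_BRANCHING, DEPTH)
print(f"{leaves:,}")  # 21,870,000,000
```

Against roughly 22 billion positions for an exhaustive search, the few dozen moves that the players actually examined show how selective their search must have been.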
Quite clearly, there was something about position A that attracted the Grandmasters' attention to the correct move, but not the attention of the experts. An argument can be advanced that the two groups perceived the position quite differently.
The role of perception became much better defined when de Groot modified a task used in an early Russian investigation [33]. He again permitted his subjects to look at a chess position taken from an unfamiliar game but limited the exposure to a few seconds. Then, after the players had reflected upon and organized what they had seen, they were encouraged to recall the position. Stronger players did so by calling out the positions of the pieces verbally. Weaker players were permitted to place pieces on an empty board.
The results of this experiment were striking. Grandmasters and masters recalled almost all the pieces correctly, approximately 93 percent of 22 pieces. Experts recalled 72 percent correctly, and class players recalled about 51 percent correctly.5 William Chase and Herbert Simon [22, 23], at Carnegie-Mellon University, replicated and extended this finding by adding a novice to their group of subjects and carrying out a valuable control procedure. Their master scored 81 percent, the A player 49 percent, and the novice 33 percent with a 5-second exposure to "quiet" middle game positions taken from master games. But when randomized (scrambled) versions of chess positions were shown for 5 seconds, everybody, including the master, recalled only 3 or 4 pieces correctly: fewer pieces than were recalled by the novice with structured positions.
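The percentages above are easiest to read as the share of pieces placed on the correct square. A minimal sketch of that simple measure follows; it is not de Groot's weighted scoring procedure described in note 5. The positions are written as hypothetical square-to-piece dictionaries in algebraic notation, and the function name is illustrative.

```python
def percent_recalled(stimulus: dict[str, str], recall: dict[str, str]) -> float:
    """Percentage of the stimulus pieces that were put back on the right square."""
    correct = sum(1 for square, piece in stimulus.items()
                  if recall.get(square) == piece)
    return 100.0 * correct / len(stimulus)


# Toy four-piece example, not one of the experimental positions.
stimulus = {"g1": "K", "f2": "P", "g2": "P", "h2": "P"}
recall = {"g1": "K", "f2": "P", "g3": "P", "h2": "P"}  # one pawn misplaced
print(percent_recalled(stimulus, recall))  # 75.0
```

By this measure, 93 percent of a 22-piece position corresponds to about 20 pieces placed correctly.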
Chase and Simon were thus able to conclude that the superior performance of the master was not due to extraordinary visual memory capacity, but rather to a chess-specific capacity. This conclusion parallels earlier ones by Djakow, Petrovsky, and Rudik [33] and de Groot [31]. These studies in perception suggest that it is the perceptual process, rather than the thought process, that differentiates chess players according to skill level.
Because recall ability seems to be the most sensitive measure related to chess playing ability, it becomes important to discover how players perform the recall task. Simon and Gilmartin [85] have produced a model of the task that is embodied in a computer simulation called MAPP.
In this model, Simon and Gilmartin make a series of assumptions about the information processing capacities of humans. They assume, on the basis of much psychological research, that people possess two memory systems: short-term memory (STM) and long-term memory (LTM).6
Human memory
Short-term memory apparently has a limited capacity. This memory system represents what you are currently aware of, and it is therefore sometimes referred to as working memory or immediate memory. The limited capacity aspect of STM is best illustrated by the difficulty people have remembering a new, unfamiliar telephone number which they have just looked up. Unless they rehearse the 7 digits, they are likely to forget them before they can dial the number. Furthermore, if people are given 2 such numbers in quick succession, they have virtually no chance of recalling both of them accurately.
This observation does not prove that the capacity of STM is 7 digits. If the two phone numbers are familiar ones (e.g., home and office), most people have no difficulty at all remembering them, but they do run into trouble if they are given 7 familiar phone numbers. From various experiments, George Miller [72] concluded that the capacity of STM is 7 chunks, plus or minus two. A chunk is a psychological unit, that is, a unit familiar to the person. It is a shorthand code which can later be decoded to recover all the information it represents. It is essentially a label or symbol which designates, or points to, specific information in long-term memory.
5 De Groot used a rather complicated scoring procedure whereby he awarded points for pieces placed correctly, and also for remembering spatial relations between pieces and material balance. He subtracted points for misplacing, adding, or omitting pieces, as well as for interchanging pieces, shifting them over one file, and being uncertain. (See [31, p. 223].)
6 There is also good evidence for the existence of a very short-term memory. This memory holds information sent from the sensory system for about ¼ of a second.
Short-term memory is sometimes conceptualized as a storage bin with seven locations or slots. The organization of STM is temporal. Items can be entered serially as each is attended to. Once all seven slots have been filled, however, something must be lost if new information is to be processed. Usually the oldest item is replaced by the newest. Fortunately we have the ability to rehearse the contents of STM and the capacity to recode information and store it in a more permanent memory: LTM. Storing information in LTM takes time. Simon [83] has estimated that it takes about 5-10 seconds per chunk. Because of its small capacity and relatively long transfer time, STM is the major bottleneck in our ability to process information.
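The storage-bin picture of STM can be sketched with a bounded queue. This is an illustration only, under the assumptions stated in the paragraph (seven slots, filled in temporal order, with the oldest item replaced first); it ignores rehearsal and transfer to LTM, and the class name `ShortTermMemory` is hypothetical.

```python
from collections import deque


class ShortTermMemory:
    """Toy model of STM as a fixed number of slots filled in temporal order."""

    def __init__(self, slots: int = 7):
        # A deque with maxlen silently discards its oldest item when a new one arrives.
        self.items = deque(maxlen=slots)

    def attend(self, chunk: str) -> None:
        self.items.append(chunk)

    def contents(self) -> list[str]:
        return list(self.items)


stm = ShortTermMemory()
for digit in "5551234":  # a new seven-digit phone number fills every slot
    stm.attend(digit)
stm.attend("9")          # an eighth item pushes out the oldest one
print(stm.contents())    # ['5', '5', '1', '2', '3', '4', '9']
```

Because each slot holds a chunk rather than a digit, the same seven slots could hold seven familiar phone numbers' labels, but not the 49 separate digits behind seven unfamiliar ones.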
Long-term memory is the permanent storehouse of all that we know. It is virtually unlimited in capacity—barring certain neurological disorders, we can continue learning until we die and never exhaust our storage capacity. Items such as your phone number, your name, and the alphabet are stored in LTM. Information that is used frequently or which is intentionally practised or recoded resides permanently in LTM. Information in LTM is used to interpret and restructure information transmitted from the senses to STM.
A fascinating property of LTM is its organization. Information in LTM seems to be highly interconnected. This property contrasts sharply with most computer memory systems, where information is said to be location addressable. That is, one piece of information is linked to another by a location tag: e.g., "go from this cell (300) and get the contents of cell 385." Barring the existence of other pointers to cell 385, the only other way to retrieve its contents is via an exhaustive search of all memory cells —a highly inefficient way to retrieve information. The only practical way to get that information is to start from cell 300. What happens if the entry cue is changed slightly such that it no longer activates cell 300? The information in cell 385 cannot be retrieved.
Human LTM appears to be content addressable. That is, functionally similar items seem to be filed in the same location.7 Items may be grouped together on the basis of semantic similarity (meaning), phonetic similarity (sound), or other categorizations. An example of successful retrieval based on semantic similarity occurs for most English speakers when they are asked for synonyms for "speedy." In short order people often report back words like "quick," "rapid," "swift," "fast." Your ability to retrieve information via phonetic cues can be demonstrated when you are asked to name words which rhyme with "bat." Words and concepts are not cross-listed under every possible category, however. For instance, most people have considerable difficulty in responding quickly to the demand "name words whose fourth letter is 'a.'"
7 Location is not meant to refer to an exact place in the brain. Memory capacity is apparently diffused throughout the brain, with some exceptions, e.g., language information, which for most people is present primarily on the left side of the brain. Location thus refers to the functional location of information, which may mean a series of neural pathways not occupying the same physical location.
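The contrast between location-addressable and content-addressable retrieval can be sketched in a few lines. The cell numbers 300 and 385 follow the computer-memory example in the text; the word lists, category indexes, and function names are illustrative assumptions, not a model of how the brain actually files words.

```python
# Location-addressable store: an item is reachable only through its address.
cells = {300: 385, 385: "stored item"}  # cell 300 holds a pointer to cell 385


def retrieve_via(entry_cell: int):
    pointer = cells.get(entry_cell)
    return cells.get(pointer) if pointer is not None else None


# Content-addressable store: items are filed under the categories they belong to.
by_meaning = {"speedy": ["quick", "rapid", "swift", "fast"]}
by_rhyme = {"bat": ["cat", "hat", "mat", "rat"]}
vocabulary = ["quick", "rapid", "swift", "fast", "cat", "hat", "mat", "rat", "bread"]


def fourth_letter_is(letter: str) -> list[str]:
    # No index exists for this cue, so the only option is an exhaustive scan.
    return [w for w in vocabulary if len(w) >= 4 and w[3] == letter]


print(retrieve_via(300))       # stored item
print(retrieve_via(301))       # None: a slightly changed cue finds nothing
print(by_meaning["speedy"])    # ['quick', 'rapid', 'swift', 'fast']
print(fourth_letter_is("a"))   # ['bread']
```

Cues that match an existing category (meaning, rhyme) go straight to the relevant items, while a cue with no index behind it forces the slow, exhaustive search that people find so difficult.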
When the memory system is confronted with a cue which is not directly associated with the desired information, it can often search the general area where the information is stored. This type of search is really a form of problem solving. Thus there are many possible paths to get from one piece of information to another, even when the items were never activated together before: e.g., name a type of dog that rhymes with "folly."8
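Note 8 proposes a generate-and-test solution to the "folly" puzzle. A minimal sketch follows, assuming a hypothetical hand-written table of rough pronunciations; it works on sounds rather than letters because "collie" and "folly" rhyme but are spelled differently. The generator swaps the initial sound of "folly", and the tester checks whether the result names a dog. The sound labels and the small dog list are illustrative assumptions.

```python
# Hypothetical rough pronunciations; a real system would need a pronouncing dictionary.
DOG_SOUNDS = {"K-AH-L-EE": "collie", "B-EE-G-L": "beagle", "P-UH-G": "pug"}
TARGET = ["F", "AH", "L", "EE"]  # rough sound pattern of "folly"
CONSONANTS = ["B", "D", "G", "H", "K", "L", "M", "N", "P", "R", "S", "T", "W"]


def generate_and_test(target: list[str]) -> list[str]:
    found = []
    for onset in CONSONANTS:                         # generate: try a new first sound
        candidate = "-".join([onset] + target[1:])
        if candidate in DOG_SOUNDS:                  # test: does it name a kind of dog?
            found.append(DOG_SOUNDS[candidate])
    return found


print(generate_and_test(TARGET))  # ['collie']
```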
Despite the highly connected structure of LTM, there are many occasions when information may not be accessible. There is a distinction between inaccessibility and absence. This distinction often underlies the difference in sensitivity between recall and recognition. "Who was Lyndon Johnson's Vice President?" Although many readers may not know the answer immediately, they may "know" that they know the answer and this knowledge may induce them to do a prolonged memory search. If the question were rephrased as a recognition task (Was Nelson Rockefeller, Richard Nixon, Hubert Humphrey, or Henry Kissinger Lyndon Johnson's Vice President?), many more readers will select the correct answer, thereby demonstrating that the information was present in memory.
On the other hand, people often know that they do not know an answer and refuse to do much searching at all: "What is the first name of this writer's wife?" The reader, on analyzing the question, might have gone to the location in memory where this information, if it existed, would be found, and discovered that there was nothing present under that category.
Obviously, the more information there is stored in LTM, the more things can be recognized and labeled, provided that the information is stored in an organized fashion. Simon and Gilmartin postulated in their model that better chess players have more chess patterns (which they can thus recognize) stored in LTM than poorer players. In the short time available for looking at a chess position in the de Groot task (5 seconds), they assume that everybody is restricted to storing 7 chunks in STM.9
8 One way this problem might be solved is by a generate and test method. Generate a new letter for the "f" and then test the resulting sound pattern to see if it forms a word in the dog category. See Newell and Simon [75] for an excellent discussion of methods of problem solving.
9 Given the previous estimate of 5-10 seconds per chunk fixation time, this assumption is quite reasonable. The main problem with this parameter is that it was estimated from experiments in learning verbal materials. Charness [20] has obtained evidence suggesting that virtually all the information extracted during a 5-second exposure to a chess position is stored in LTM. This issue has not been satisfactorily resolved yet.
The key to the mystery of performance in the 5-second task is provided when you consider that the master's average chunk size may be 3-4 or more pieces, e.g., the castled king position in Figure 2.1, whereas the novice's chunk is a single piece on a square. Thus 7 × 3 = 21 pieces for the master, but 7 × 1 = 7 pieces for the novice. The idea that everyone has the same-sized STM is given further support by the condition of presenting scrambled chess positions. In that situation there are no longer any recognizable patterns larger than single pieces for the master, and he sinks to the level of the novice.
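The 7 × 3 versus 7 × 1 arithmetic can be turned into a toy recall model. The sketch below is not MAPP: it assumes a fixed STM of seven chunks, a hypothetical pattern library standing in for LTM, and a simple greedy rule (encode the largest familiar pattern still available, then fall back to single pieces). The piece tokens are placeholders, each standing for one piece on one square.

```python
STM_SLOTS = 7


def recall(position: set[str], library: list[frozenset[str]]) -> int:
    """Number of pieces recalled when at most STM_SLOTS chunks can be stored."""
    remaining = set(position)
    chunks = []
    # Try the largest known patterns first; each one that fits uses one slot.
    for pattern in sorted(library, key=len, reverse=True):
        if len(chunks) == STM_SLOTS:
            break
        if pattern <= remaining:
            chunks.append(pattern)
            remaining -= pattern
    # Unmatched pieces can still be stored one per slot while slots remain.
    for piece in sorted(remaining):
        if len(chunks) == STM_SLOTS:
            break
        chunks.append(frozenset({piece}))
    return sum(len(chunk) for chunk in chunks)


# Placeholder tokens: a 24-piece position made of eight familiar 3-piece patterns.
position = {f"p{i}" for i in range(24)}
master_library = [frozenset(f"p{3 * g + k}" for k in range(3)) for g in range(8)]
scrambled = {f"s{i}" for i in range(24)}  # no familiar patterns survive scrambling

print(recall(position, master_library))   # 21 (7 chunks x 3 pieces)
print(recall(position, []))               # 7  (7 chunks x 1 piece)
print(recall(scrambled, master_library))  # 7
```

The only difference between the simulated master and novice is the contents of the pattern library, which mirrors the point that STM capacity is assumed to be the same for everyone.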
An analogy to the chess board situation for master versus amateur is the case of a child learning to read versus an adult who is already a skilled reader. When the child looks at this page he "sees" a page filled with letters which he must slowly and effortfully recombine and read as words. The adult, however, quickly and effortlessly "sees" the page as a series of words and possibly phrases (which may then have to be effortfully recombined into sentences compatible with his current knowledge about human chess skill). Both the adult and the child look at the same page but they produce very different encodings or descriptions of it, based on the size of the pattern they can use for effortless recognition.
The specific model developed by Simon and Gilmartin is considerably more complex than is outlined here (it describes how patterns are originally learned and later recognized, which pieces will be attended to, and how the patterns are reproduced), but the reader can easily perceive its explanatory power.
The simulation does a very good job of imitating strong and weak players by simply altering the number of patterns which are stored in LTM. Simon and Gilmartin estimated the size of the vocabulary of patterns that a master would theoretically need to perform the recall task as well as he does. They arrived at an estimate of about 50,000 patterns—which is roughly the same size as the vocabulary of recognizable words for an adult speaker of English.
Does this research mean that all one needs to do to become a Grandmaster is to sit down and memorize 50,000 chess patterns? I would hazard a guess that if you attempted to do this you might well perform as well as the Grandmaster on the recall task, but your chess play would hardly improve. How, then, is it possible to link this vast knowledge of patterns with chess skill?
Chase and Simon have suggested that the correlation between perceptual skill and chess skill can be accounted for by assuming that many chess patterns are directly associated with appropriate plausible moves. That is, when the master looks at a chess position, his recognition of familiar configurations of pieces triggers certain "productions" into action. A production is a behavioral unit which has two components: a condition side and an action side. It can be modeled by a statement of the form: if condition X exists, do action Y. A chess production might be: if pattern X, consider move (plan) Y; or, more concretely: if there is an open file, consider moving a rook to it. This can be illustrated by way of a more complex example. Most skilled chess players will recognize the smothered mate position illustrated in Figure 2.2. White, if on move, can mate in 4 moves (or fewer) via (1) N-B7 ch., K-N1; (2) N-R6 dbl. ch., K-R1 (if K-B1, Q-B7 mate); (3) Q-N8 ch., RxQ; (4) N-B7 mate. Changing nonessential elements of the position (moving the black QRP to QR3) does not change the mate. Other changes, however, make a big difference: e.g., moving the black KRP to R3 or interchanging the black queen and rook.
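A production can be represented directly as a condition paired with an action. The sketch below encodes the text's open-file example; the board representation (a dictionary from (file, rank) to a piece letter, uppercase for White, using algebraic files) and all names in it are hypothetical simplifications rather than part of Chase and Simon's model.

```python
from typing import Callable, NamedTuple

Board = dict[tuple[str, int], str]  # (file, rank) -> piece letter; uppercase = White


class Production(NamedTuple):
    condition: Callable[[Board], list[str]]  # returns every match found on the board
    action: Callable[[str], str]             # turns one match into a move suggestion


def open_files(board: Board) -> list[str]:
    """Files that contain no pawns of either side."""
    return [f for f in "abcdefgh"
            if not any(sq[0] == f and piece in "Pp" for sq, piece in board.items())]


rook_to_open_file = Production(
    condition=open_files,
    action=lambda f: f"consider moving a rook to the {f}-file",
)


def fire(productions: list[Production], board: Board) -> list[str]:
    """Collect the actions of every production whose condition matches."""
    return [p.action(match) for p in productions for match in p.condition(board)]


board = {("g", 1): "K", ("f", 1): "R", ("g", 8): "k"}
for f in "abcfgh":  # pawns on every file except d and e
    board[(f, 2)] = "P"
    board[(f, 7)] = "p"

print(fire([rook_to_open_file], board))
# ['consider moving a rook to the d-file', 'consider moving a rook to the e-file']
```

The point of the structure is that recognition does the work: once the condition matches, the candidate move is proposed without any search.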
Humans are probably sensitive to the critical features of such a position. They do not store a copy of each possible smothered mate position or each back row mate position, etc. They probably abstract more general descriptions. In the case of this type of smothered mate the features are probably propositions like: queen on open QR2-KN8 diagonal; knight capable of reaching KB7 in one move; opponent's king on KR1 hemmed in at KN2 and KR2. Probably the latter feature together with one of the former is sufficient to trigger the plan: try to reach a smothered mate position. One can come up with many other examples of such typical tactical plans which are part of every skilled player's repertoire.
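The feature-based description in this paragraph can be sketched as a condition over propositions rather than over exact piece placements. The proposition strings below are hypothetical labels for the three features listed, supplied by hand (recognizing them from a board is the hard part and is not attempted), and the trigger follows the text's guess that the hemmed-in king plus one of the other two features suffices.

```python
QUEEN_DIAGONAL = "queen on open QR2-KN8 diagonal"
KNIGHT_REACHES_B7 = "knight can reach KB7 in one move"
KING_HEMMED_IN = "enemy king on KR1 hemmed in at KN2 and KR2"


def suggests_smothered_mate(features: set[str]) -> bool:
    """Trigger the plan when the king feature and at least one other feature hold."""
    return KING_HEMMED_IN in features and (
        QUEEN_DIAGONAL in features or KNIGHT_REACHES_B7 in features
    )


# Figure 2.2 as a set of features; moving Black's QRP leaves this set unchanged.
figure_2_2 = {QUEEN_DIAGONAL, KNIGHT_REACHES_B7, KING_HEMMED_IN}
# Moving Black's KRP to R3 frees KR2, so the king is no longer hemmed in.
krp_moved = {QUEEN_DIAGONAL, KNIGHT_REACHES_B7}

print(suggests_smothered_mate(figure_2_2))  # True
print(suggests_smothered_mate(krp_moved))   # False
```

Storing the abstract description instead of every concrete position is what lets one pattern cover the many nonessential variations of the same mate.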
Now there is a potential explanation for why, in position A (Figure 2.1), the Grandmasters all considered the correct move, but only a few experts did. There were probably certain features in the position which quite automatically, when recognized, elicited the appropriate move. Only those players who recognized the features and possessed the appropriate productions would generate the correct move for subsequent evaluation.
Parenthetically, it also becomes reasonable to speculate about questions like: why does a "highly intelligent" individual, when playing chess, miss obvious moves? Moves are only "obvious" when the patterns they spring from are recognized.10 One can also ask why masters do so well in simultaneous exhibitions,11 where they have only a few seconds to choose a move. If masters automatically generate appropriate plausible moves, these moves will usually be good enough to beat all but the best players at the exhibition. A similar explanation will probably also suffice to explain a master's superiority at speed chess, where the entire game is played in less than 10 minutes. Indeed, if the perceptual process is so important for skilled play, it becomes possible to understand the idiosyncrasies of a Bobby Fischer, who insists that the lighting in the tournament hall, the size of the board, and the size of the pieces all be optimized.
10 See Botvinnik's text [18] concerning Norbert Wiener, p. 61, for another explanation.
Figure 2.2 A smothered mate position with White to move.