Operant Conditioning and Behaviorism - an historical outline

Around the turn of the twentieth century, Edward Thorndike attempted to develop an objective experimental method for studying the mechanical problem-solving ability of cats and dogs. Thorndike devised a number of wooden crates which required various combinations of latches, levers, strings and treadles to open them. A dog or a cat would be put in one of these 'puzzle-boxes' and, sooner or later, would manage to escape from it. Thorndike's initial aim was to show that the anecdotal achievements of cats and dogs could be replicated in controlled, standardised circumstances. However, he soon realised that he could now measure animal intelligence using this equipment. His method was to set an animal the same task repeatedly, each time measuring the time it took to solve it. Thorndike could then compare these 'learning-curves' (see figure below) across different situations and different species.

Thorndike learning curve 

Thorndike was particularly interested in discovering whether his animals could learn their tasks through imitation or observation. He compared the learning curves of cats that had been given the opportunity of observing others escaping from a box with those of cats that had never seen the box being solved, and found no difference in their rate of learning. He obtained the same null result with dogs and, even when he showed the animals the methods of opening a box by placing their paws on the appropriate levers and so on, he found no improvement. He fell back on a much simpler trial-and-error explanation of learning. Occasionally, quite by chance, an animal performs an action which frees it from the box. When the animal finds itself in the same position again, it is more likely to perform the same action again. The reward of being freed from the box somehow strengthens an association between a stimulus (being in a certain position in the box) and an appropriate action. Reward acts to strengthen stimulus-response associations. The animal learns to solve the puzzle-box not by reflecting on possible actions and really puzzling its way out of it, but by a quite mechanical development of actions originally made by chance. By 1910 Thorndike had formalised this notion into a 'law' of psychology - the law of effect. In full it reads: "Of several responses made to the same situation those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections to the situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond."
Thorndike maintained that, in combination with the law of exercise (the notion that associations are strengthened by use and weakened by disuse) and the concept of instinct, the law of effect could explain all of human behavior in terms of the development of myriads of stimulus-response associations. It is worth briefly comparing trial-and-error learning with classical conditioning. In classical conditioning a neutral stimulus becomes associated with part of a reflex (either the US or the UR). In trial-and-error learning no reflex is involved. A reinforcing or punishing event (a type of stimulus) alters the strength of association between a neutral stimulus and a quite arbitrary response. The response is not part of any reflex.
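
The trial-and-error account above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Thorndike's procedure or data: the function name run_puzzle_box, the number of candidate actions, and the gain and loss values are all hypothetical, and the number of attempts per trial stands in for the escape time he measured. Each action carries a connection strength; an action followed by escape is strengthened, and one followed by failure is slightly weakened.

```python
import random


def run_puzzle_box(n_trials=30, n_actions=8, correct=0,
                   gain=1.0, loss=0.05, seed=0):
    """Simulate repeated trials in one puzzle-box.

    Returns the number of attempts needed to escape on each trial.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # One connection strength per candidate action; all start equal,
    # so the first escape happens purely by chance.
    strength = [1.0] * n_actions
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            # Choose an action with probability proportional to its strength.
            action = rng.choices(range(n_actions), weights=strength)[0]
            if action == correct:
                # Escape ("satisfaction") strengthens the connection.
                strength[action] += gain
                break
            # Failure ("discomfort") weakens the connection, with a floor
            # so that every action keeps a small positive weight.
            strength[action] = max(0.1, strength[action] - loss)
        attempts_per_trial.append(attempts)
    return attempts_per_trial


if __name__ == "__main__":
    # One simulated animal; attempts per trial tend to fall over trials.
    print(run_puzzle_box())
```

Plotting the returned list against trial number gives a curve of the same general kind as the learning curves described above: it is typically high and variable at first, then falls as the successful action comes to dominate. Larger gain or loss values speed up the change, in the spirit of the final sentence of the law, but the particular numbers carry no empirical weight.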

The behaviorist position, that human behavior could be explained entirely in terms of reflexes, stimulus-response associations, and the effects of reinforcers upon them, excluding 'mental' terms like desires, goals and so on, was taken up by John Broadus Watson in his 1914 book 'Behavior: An Introduction to Comparative Psychology'. Watson had also been involved in the introduction of the most favoured subject in comparative psychology - the laboratory rat. One of the early jobs with which he funded his Ph.D. was as a caretaker, one of whose duties was to look after laboratory rats used in studies intended to mimic 'real-life' learning tasks such as navigating complex mazes. Watson became adept at taming rats and found he could train them to open a puzzle-box like Thorndike's for a small food reward. He also studied maze-learning but simplified the task dramatically. One type of maze is simply a long straight alley with food at the end. Watson found that once the animal was well trained at running this 'maze' it did so almost automatically. Once started by the stimulus of the maze, its behavior became a series of associations between movements (or their kinaesthetic consequences) rather than stimuli in the outside world. This was made plain by shortening the alleyway - the well-trained rats then ran straight into the end wall. This was known as the kerplunk experiment. The development of well-controlled behavioral techniques also allowed Watson to explore animals' sensory abilities experimentally, for example their ability to discriminate between similar stimuli. Watson's theoretical position was even more extreme than Thorndike's - he would have no place for mentalistic concepts like pleasure or distress in his explanations of behavior. He essentially rejected the law of effect, denying that pleasure or discomfort caused stimulus-response associations to be learned.
For Watson, all that was important was the frequency of occurrence of stimulus-response pairings. Reinforcers might cause some responses to occur more often in the presence of particular stimuli, but they did not act directly to cause their learning. Watson could therefore reject the notion that some mental traces of stimuli and responses needed to be retained in an animal's mind until a reinforcer caused an association between them to be strengthened, which is a rather mentalistic consequence of the law of effect.

Publishing his second book, 'Psychology from the Standpoint of a Behaviorist', in 1919, Watson became the founder of the American school of behaviorism. His rejection of mentalism was total. He felt that thought was explicable as subvocalisation and that speech was simply another behavior which might be learned by the law of effect. In 'Psychology from the Standpoint of a Behaviorist' he addresses a number of practical human problems such as education, the development of emotional reactions and the effects of factors like alcohol or drugs on human performance. He even suggests that thought processes might be investigated by monitoring movements in the larynx. Watson believed that mental illness was the result of 'habit distortion', which might be caused by the fortuitous learning of inappropriate associations which then go on to influence a person's behavior so that it becomes ever more abnormal. Watson tested part of this hypothesis on a baby in the hospital in which he worked. The baby, 'Little Albert', apparently showed no particular fears or phobias about anything apart from sudden loud sounds. For example, when Watson placed a tame white rat in Little Albert's lap the child happily played with the animal. On a subsequent occasion Watson placed the rat in Albert's lap and his assistant made a loud noise by striking a large steel bar directly behind Albert's head. One week later Albert was subjected to the same experience. After this, when Albert was shown the rat he began to fret, appearing anxious. Similar reactions were produced by other furry objects (a fur coat). Watson was keen to use this as evidence for the behavioral basis of phobias. However, Albert's reactions to the rat were apparently quite mild. Nevertheless, one of the most widespread applications of conditioning has been in the treatment of phobias and other behavior problems, and the case of Little Albert is often cited as the first experiment in this field.

In the 1920s behaviorism began to wane somewhat in popularity. A number of studies in the Berkeley laboratory of Edward Tolman appeared both to show flaws in the law of effect and to require mental representations in their explanation. For example, rats were allowed to explore a maze in which there were three routes of different lengths between the starting position and the goal. The rats' behavior when the maze was blocked implied that they must have some sort of mental map of the maze. The rats prefer the routes according to their shortness, so when the maze is blocked at point A, stopping them using the shortest route, they choose the second shortest route. When, however, the maze is blocked at point B, the rat does not retrace its steps and use route 2, as the law of effect would predict, but rather uses route 3. The rat must be recognising that block B will stop it using route 2 by using some memory of the layout of the maze. Tolman's group also showed that animals could use knowledge gained by learning a maze while running to navigate it while swimming, and that unexpected changes in the quality of reward could weaken learning even though the animal was still rewarded. This result was developed further by Crespi who, in 1942, showed that unexpected decreases in reward quantity caused rats temporarily to run a maze more slowly than normal, while unexpected increases caused a temporary elevation in running speed.

Tolman's maze experiment 

At the same time as this work was appearing in the USA, the Polish psychologists Konorski and Miller began the first cognitive analyses of classical conditioning - the forerunners of the work of Rescorla, Wagner, Dickinson and Mackintosh. In Germany, Wolfgang Koehler was studying insight and observation as mechanisms of learning in chimpanzees. All of this work was quite problematic for behaviorism.

In 1938 Burrhus Frederic Skinner published what was arguably the most influential work on animal behavior of the century, 'The Behavior of Organisms'. In the interim it had been shown that Tolman's results were sensitive to factors like the openness of his maze - if the rats could not see stimuli outside the maze they did not make appropriate choices when it was blocked, suggesting that they may have learned many stimulus-response associations in different parts of the maze, perhaps in sequence, rather than having internalised a map of it. Skinner resurrected the law of effect in more starkly behavioral terms and provided a technology which allowed sequences of behavior produced over a long time to be studied objectively. His Skinner-Box was a great improvement on the individual learning trials of Watson and Thorndike. Skinner developed the basic concept of operant conditioning, claiming that this type of learning was not the result of stimulus-response learning. For Skinner the basic association in operant conditioning was between the operant response and the reinforcer; the discriminative stimulus served to signal when this association would be acted upon.

This document was restructured from a lecture kindly provided by R.W.Kentridge