MSW's Students (2007-2009), Christ College, Bangalore, India

Tuesday, January 8, 2008

Learning notes

LEARNING/CONDITIONING

Learning seems to be one process that many people take for granted (just assume it happens and happens basically the same way for most people) but know very little about.

So, how do we learn? How do other animals learn? Do we learn the same way? What are our limitations? Can we learn anything? Is there one right way to learn? To answer these questions, we need to first establish a definition of learning. Our definition is made up of several different components:

The 4 Factors That Form The Definition of Learning:

1) learning is inferred from a change in behavior/performance*

2) learning results in an inferred change in memory

3) learning is the result of experience

4) learning is relatively permanent

This means that behavior changes that are temporary or due to things like drugs, alcohol, etc., are not "learned".

* Behavior Potential - once something is learned, an organism can exhibit a behavior that indicates learning has occurred. Thus, once a behavior has been "learned", it can be exhibited by "performance" of a corresponding behavior.

It is the combination of these 4 factors that makes up our definition of learning. Or, you can go with a slightly less comprehensive definition that is offered in many textbooks: Learning is a relatively durable change in behavior or knowledge that is due to experience.

We are going to discuss the two main types of learning examined by researchers, classical conditioning and operant conditioning.


I. Classical Conditioning

Classical Conditioning can be defined as a type of learning in which a stimulus acquires the capacity to evoke a reflexive response that was originally evoked by a different stimulus.

A. Ivan Pavlov - Russian physiologist interested in behavior (digestion).

1) Pavlov was studying salivation in dogs - he was measuring the amount of salivation produced by the salivary glands of dogs by presenting them meat powder through a food dispenser.

The dispenser would deliver the meat powder to which the animals salivated. However, what Pavlov noticed was that the food dispenser made a sound when delivering the powder, and that the dogs salivated before the powder was delivered. He realized that the dogs associated the sound (which occurred seconds before the powder actually arrived) with the delivery of the food. Thus, the dogs had "learned" that when the sound occurred, the meat powder was going to arrive.

This is conditioning (Stimulus-Response; S-R Bonds). The stimulus (sound of food dispenser) produced a response (salivation). It is important to note that at this point, we are talking about reflexive responses (salivation is automatic).

2) Terminology (if you are still confused by these definitions, please look in the non-Psychology jargon glossary on the AlleyDog.com homepage):

a) Unconditioned Stimulus (US) - a stimulus that evokes an unconditioned response without any prior conditioning (no learning needed for the response to occur).

b) Unconditioned Response (UR) - an unlearned reaction/response to an unconditioned stimulus that occurs without prior conditioning.

c) Conditioned Stimulus (CS) - a previously neutral stimulus that has, through conditioning, acquired the capacity to evoke a conditioned response.

d) Conditioned Response (CR) - a learned reaction to a conditioned stimulus that occurs because of prior conditioning.

*These are reflexive behaviors, not the result of engaging in goal-directed behavior.

e) Trial - presentation of a stimulus or pair of stimuli.

Don't worry, we will get to some examples that make this all much more clear.

3) Basic Principles:

a) Acquisition - formation of a new CR tendency. This means that when an organism learns something new, it has been "acquired".

Pavlov believed in contiguity - temporal association between two events that occur closely together in time. The more closely in time two events occurred, the more likely they were to become associated; as time passes, association becomes less likely.

For example, when people are house training a dog -- you notice that the dog went to the bathroom on the rug. If the dog had the accident hours ago, it will not do any good to scold the dog because too much time has passed for the dog to associate your scolding with the accident. But, if you catch the dog right after the accident occurred, the scolding is more likely to become associated with the accident.

There are several different ways conditioning can occur, depending on the order and timing in which the CS and US are presented:

1. delayed conditioning (forward) - the CS is presented before the US and it (CS) stays on until the US is presented. This is generally the best, especially when the delay is short.

example - a bell begins to ring and continues to ring until food is presented.

2. trace conditioning - a discrete event (the CS) is presented and ends, then the US occurs. The shorter the interval between them the better, but as you can tell, this approach is not very effective.

example - a bell begins ringing and ends just before the food is presented.

3. simultaneous conditioning - CS and US presented together. Not very good.

example - the bell begins to ring at the same time the food is presented. Both begin, continue, and end at the same time.

4. backward conditioning - US occurs before CS.

example - the food is presented, then the bell rings. This is not really effective.
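
The four arrangements above differ only in when the CS starts and stops relative to the US. As a rough sketch (the function name and the use of seconds as the time unit are my own assumptions, not part of the notes), a trial could be labeled like this:

```python
def classify_pairing(cs_on, cs_off, us_on):
    """Name the timing arrangement for one trial.

    cs_on / cs_off: seconds at which the CS (e.g. the bell) starts and stops.
    us_on: seconds at which the US (e.g. the food) is presented.
    """
    if cs_off <= cs_on:
        raise ValueError("the CS must stop after it starts")
    if us_on < cs_on:
        return "backward"      # US occurs before the CS
    if us_on == cs_on:
        return "simultaneous"  # CS and US begin together
    if cs_off >= us_on:
        return "delayed"       # CS starts first and is still on when the US arrives
    return "trace"             # CS starts and ends before the US arrives


print(classify_pairing(0, 5, 5))  # bell rings until the food is presented
print(classify_pairing(0, 3, 5))  # bell ends before the food is presented
print(classify_pairing(0, 5, 0))  # bell and food start together
print(classify_pairing(2, 5, 0))  # food first, then the bell
```

The sketch only names the arrangement; it says nothing about how effective each one is, which the notes describe in words.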

b) Extinction - this is a gradual weakening and eventual disappearance of the CR tendency. Extinction occurs from multiple presentations of CS without the US.

Essentially, the organism continues to be presented with the conditioned stimulus but without the unconditioned stimulus, so the CS loses its power to evoke the CR. For example, Pavlov's dogs stopped salivating when the dispenser sound kept occurring without the meat powder following.
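
These notes give no formula for acquisition or extinction. One way to picture the gradual build-up and the gradual weakening is a simple error-correction rule, loosely in the spirit of the Rescorla-Wagner model (which is not covered here). In this sketch the learning rate, the 0-to-1 strength scale, and the function name are all assumptions chosen for illustration:

```python
def run_trials(strength, us_present, n_trials, rate=0.3):
    """Return the CS-US associative strength after each trial.

    strength: starting strength (0 = no association, 1 = maximum).
    us_present: True for CS+US pairings (acquisition),
                False for CS-alone presentations (extinction).
    rate: hypothetical learning rate, picked only for illustration.
    """
    target = 1.0 if us_present else 0.0
    history = []
    for _ in range(n_trials):
        # Move part of the way from the current strength toward the target.
        strength += rate * (target - strength)
        history.append(round(strength, 3))
    return history


acquisition = run_trials(0.0, us_present=True, n_trials=10)
extinction = run_trials(acquisition[-1], us_present=False, n_trials=10)
print(acquisition)  # strength climbs toward 1 while the CS is paired with the US
print(extinction)   # strength falls toward 0 once the CS is presented alone
```

A rule this simple cannot produce spontaneous recovery, so it should be read as a picture of the two curves and nothing more.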

c) Spontaneous Recovery - sometimes there will be a reappearance of a response that had been extinguished. The recovery can occur after a period of non-exposure to the CS. It is called spontaneous because the response seems to reappear out of nowhere.

d) Stimulus Generalization - a response to a specific stimulus becomes associated with other, similar stimuli and now occurs to those similar stimuli as well.

For Example - a child who gets bitten by a black Lab later becomes afraid of all dogs. The original fear evoked by the black Lab has now generalized to ALL dogs.

Another Example - little Albert (I am assuming you are familiar with Little Albert, so I will give a very general example).

John Watson and Rosalie Rayner conditioned a baby (Albert) to be afraid of a white rat by showing Albert the rat and then striking a steel bar with a hammer behind Albert's head (nice!). The bar produced a very loud, sudden noise that frightened Albert and made him cry. They did this several times (multiple trials) until Albert was afraid of the rat. Previously he would reach for the rat and play with it. After conditioning, the sight of the rat made Albert cry -- then what Watson found was that Albert began to show similar fearful behaviors to other furry things, such as a rabbit, a dog, a fur coat, and a Santa Claus mask with a white beard. So, the fear evoked by the white, furry rat had generalized to other furry things.

e) Stimulus Discrimination - learning to respond to one stimulus and not another. Thus, an organism becomes conditioned to respond to a specific stimulus and not to other stimuli.

For Example - a puppy may initially respond to lots of different people, but over time it learns to respond to only one or a few people's commands.

f) Higher Order Conditioning - an established CS can be used in place of a US to condition another neutral stimulus, so that the new stimulus also comes to evoke the CR. There are a couple of different orders or levels. Let's take a "Pavlovian Dog-like" example to look at the different orders:

In this example, light is paired with food. The food is a US since it produces a response without any prior learning. When the neutral stimulus (light) is repeatedly paired with the food, the light becomes a Conditioned Stimulus (CS) - the dog begins to respond (salivate) to the light without the presentation of the food. In the second order, a new neutral stimulus (tone) is paired with the light, and the tone comes to evoke salivation as well.

first order:

1) light + US (food) --> UR (salivation)
2) light (CS) alone --> CR (salivation)

second order:

3) tone + light (CS) --> CR (salivation)
4) tone alone --> CR (salivation)

B. Classical Conditioning in Everyday Life

One of the great things about conditioning is that we can see it all around us. Here are some examples of classical conditioning that you may see:

1. Conditioned Fear & Anxiety - many phobias that people experience are the results of conditioning.

For Example - "fear of bridges" - fear of bridges can develop from many different sources. For example, while a child rides in a car over a dilapidated bridge, his father makes jokes about the bridge collapsing and all of them falling into the river below. The father finds this funny and so decides to do it whenever they cross the bridge. Years later, the child has grown up and now is afraid to drive over any bridge. In this case, the fear of one bridge generalized to all bridges which now evoke fear.


2. Advertising - modern advertising strategies evolved from John Watson's use of conditioning. The approach is to link an attractive US with a CS (the product being sold) so the consumer will feel positively toward the product just like they do with the US.

US --> CS --> CR/UR

attractive person --> car --> pleasant emotional response

3. A Clockwork Orange - No additional information necessary! If you haven't seen this movie or read the book, do it. You will find it very interesting, and a wonderful example of conditioning in action.


II. Operant Conditioning

Operant conditioning can be defined as a type of learning in which voluntary (controllable; non-reflexive) behavior is strengthened if it is reinforced and weakened if it is punished (or not reinforced).

Note: this type of learning is also referred to as Instrumental Conditioning/Learning

A. The most prominent figure in the development and study of Operant Conditioning was B. F. Skinner

1. History:

a) As an Undergraduate he was an English major, then decided to study Psychology in graduate school.

b) Early in his career he believed much of behavior could be studied in a single, controlled environment (he created the Skinner box, which we address later). Instead of observing behavior in the natural world, he attempted to study behavior in a closed, controlled unit. This prevents any factors not under study from interfering with the study - as a result, Skinner could truly study behavior and specific factors that influence behavior.

c) during the "cognitive revolution" that swept Psychology (discussed later), Skinner stuck to the position that behavior was not guided by inner force or cognition. This made him a "radical behaviorist".

d) as his theories of Operant Conditioning developed, Skinner became passionate about social issues, such as free will, how they developed, why they developed, how they were propagated, etc.

2. Skinner's views of Operant Conditioning

a) Operant Conditioning is different from Classical Conditioning in that the behaviors studied in Classical Conditioning are reflexive (for example, salivating). However, the behaviors studied and governed by the principles of Operant Conditioning are non-reflexive (for example, gambling). So, compared to Classical Conditioning, Operant Conditioning attempts to predict non-reflexive, more complex behaviors, and the conditions in which they will occur. In addition, Operant Conditioning deals with behaviors that are performed so that the organism can obtain reinforcement.

b) there are many factors involved in determining if an organism will engage in a behavior - just because there is food doesn't mean an organism will eat (time of day, last meal, etc.). SO, unlike classical conditioning...(go to "c", below)

c) in Op. Cond., the organism has a lot of control. Just because a stimulus is presented, does not necessarily mean that an organism is going to react in any specific way. Instead, reinforcement is dependent on the organism's behavior. In other words, in order for an organism to receive some type of reinforcement, the organism must behave in a specific manner. For example, you can't win at a slot machine unless several things happen, most importantly, you pull the lever. Pulling the lever is a voluntary, non-reflexive behavior that must be exhibited before reinforcement (hopefully a jackpot) can be delivered.

d) in classical conditioning, the controlling stimulus comes before the behavior. But in Operant Conditioning, the controlling stimulus comes after the behavior. If we look at Pavlov's meat powder example, you remember that the sound occurred (controlling stimulus), the dog salivated, and then the meat powder was delivered. With Operant conditioning, the sound would occur, then the dog would have to perform some behavior in order to get the meat powder as a reinforcement (like making a dog sit to receive a bone).

e) Skinner Box - This is a chamber in which Skinner placed animals such as rats and pigeons to study. The chamber contains either a lever or key that can be pressed in order to receive reinforcements such as food and water.

* the Skinner Box made possible the Free Operant Procedure - responses can be made and recorded continuously without the need to stop the experiment for the experimenter to record the responses made by the animal.

f) Shaping - operant conditioning method for creating an entirely new behavior by using rewards to guide an organism toward a desired behavior (called Successive Approximations). In doing so, the organism is rewarded with each small advancement in the right direction. Once one appropriate behavior is made and rewarded, the organism is not reinforced again until it makes a further advancement, then another and another until the organism is only rewarded once the entire behavior is performed.

For Example, to get a rat to learn how to press a lever, the experimenter will use small rewards after each behavior that brings the rat toward pressing the lever. So, the rat is placed in the box. When it takes a step toward the lever, the experimenter will reinforce the behavior by presenting food or water in the dish (located next to or under the lever). Then, when the rat makes any additional behavior toward the lever, like standing in front of the lever, it is given reinforcement (note that the rat will no longer get a reward for just taking a single step in the direction of the lever). This continues until the rat reliably goes to the lever and presses it to receive reward.
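
The rule in the rat example (reward an advance, then stop rewarding anything less) can be sketched in a few lines. The step labels and the function name below are illustrative assumptions, and a real session would of course depend on what the animal actually does:

```python
# Successive approximations for the lever example, from crude to complete.
# These labels are illustrative, not terms from the notes.
APPROXIMATIONS = ["step toward lever", "stand at lever", "press lever"]


def shape(observed_behaviors):
    """Return (behavior, reinforced) pairs for a sequence of observed behaviors."""
    criterion = 0  # index of the least advanced behavior that still earns a reward
    last = len(APPROXIMATIONS) - 1
    log = []
    for behavior in observed_behaviors:
        level = APPROXIMATIONS.index(behavior)
        reinforced = level >= criterion
        if reinforced:
            # The next reward requires a further advance (or the full behavior).
            criterion = min(level + 1, last)
        log.append((behavior, reinforced))
    return log


session = [
    "step toward lever", "step toward lever", "stand at lever",
    "step toward lever", "press lever", "stand at lever", "press lever",
]
for behavior, reinforced in shape(session):
    print(f"{behavior:20} {'reward' if reinforced else 'no reward'}")
```

Note how the second "step toward lever" earns nothing: once a step has been rewarded, only a further advance counts.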

3. Principles of Reinforcement

a) Skinner identified two types of reinforcing events - those in which a reward is given; and those in which something bad is removed. In either case, the point of reinforcement is to increase the frequency or probability of a response occurring again.

1) positive reinforcement - give an organism a pleasant stimulus when the operant response is made. For example, a rat presses the lever (operant response) and it receives a treat (positive reinforcement)

2) negative reinforcement - take away an unpleasant stimulus when the operant response is made. For example, stop shocking a rat when it presses the lever (yikes!)

** I can't tell you how often people use the term "negative reinforcement" incorrectly. It is NOT a method of increasing the chances an organism will behave in a bad way. It is a method of rewarding the behavior you want to increase. It is a good thing - not a bad thing!

b) Skinner also identified two types of reinforcers

1) primary reinforcer - stimulus that naturally strengthens any response that precedes it (e.g., food, water, sex) without the need for any learning on the part of the organism. These reinforcers are naturally reinforcing.

2) secondary/conditioned reinforcer - a previously neutral stimulus that acquires the ability to strengthen responses because the stimulus has been paired with a primary reinforcer. For example, an organism may become conditioned to the sound of the food dispenser, which occurs after the operant response is made. Thus, the sound of the food dispenser becomes reinforcing. Notice the similarity to Classical Conditioning, with the exception that the behavior is voluntary and occurs before the presentation of a reinforcer.

4. Schedules of Reinforcement

There are two types of reinforcement schedules - continuous (every response is reinforced), and partial/intermittent (only some responses are reinforced). There are four subtypes of partial schedules:

a) Fixed Ratio (FR) - reinforcement given after every Nth response, where N is the size of the ratio (i.e., a certain number of responses have to occur before getting reinforcement).

For example - many factory workers are paid according to the number of some product they produce. A worker may get paid $10.00 for every 100 widgets he makes. This would be an example of an FR100 schedule.

b) Variable Ratio (VR) - the variable ratio schedule is the same as the FR except that the ratio varies, and is not stable like the FR schedule. Reinforcement is given after every Nth response, but N is an average.

For example - slot machines in casinos function on VR schedules (despite what many people believe about their "systems"). The slot machine is programmed to provide a "winner" on average every Nth response, such as every 75th lever pull on average. So, the slot machine may give a winner after 1 pull, then on the 190th pull, then on the 33rd pull, etc...just so long as it averages out to a winner every 75th pull.

c) Fixed Interval (FI) - a designated amount of time must pass, and then a certain response must be made in order to get reinforcement.

For example - when you wait for a bus. The bus may run on a specific schedule, like it stops at the nearest location to you every 20 minutes. After one bus has stopped and left your bus stop, the timer resets so that the next one will arrive in 20 minutes. You must wait that amount of time for the bus to arrive and stop for you to get on it.

d) Variable Interval (VI) - same as FI but now the time interval varies.

For example - when you wait to get your mail. Your mail carrier may come to your house at approximately the same time each day. So, you go out and check at the approximate time the mail usually arrives, but there is no mail. You wait a little while and check, but no mail. This continues until some time has passed (a varied amount of time) and then you go out, check, and to your delight, there is mail.
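
The four partial schedules can be compared side by side as small decision rules that answer one question: does this response get reinforced? The sketch below is only an illustration. The function names are made up, the clock is passed in as `now` (in whatever time unit you like), and drawing the varying ratio or interval from a uniform range around the average is my own assumption:

```python
import random


def fixed_ratio(n):
    """FR n: reinforce every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """VR: reinforce after a varying number of responses that averages mean_n."""
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)  # uniform draw with mean mean_n
    def respond():
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def fixed_interval(wait):
    """FI: reinforce the first response made once `wait` has passed since the last reinforcement."""
    last = 0.0
    def respond(now):
        nonlocal last
        if now - last >= wait:
            last = now
            return True
        return False
    return respond


def variable_interval(mean_wait, rng):
    """VI: like FI, but the required wait varies around mean_wait."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_wait)
    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_wait)
            return True
        return False
    return respond


# FR100 from the widget example: 250 widgets earn 2 payouts.
fr100 = fixed_ratio(100)
payouts = sum(fr100() for _ in range(250))
print(payouts)

# FI 20: only the first response after each 20-unit wait is reinforced.
fi20 = fixed_interval(20)
fi_results = [fi20(t) for t in (5, 10, 20, 25, 39, 40, 61)]
print(fi_results)

# VR75 from the slot machine example: wins come at unpredictable pulls.
vr75 = variable_ratio(75, random.Random(0))
wins = sum(vr75() for _ in range(7500))
print(wins)
```

On the ratio schedules, responding faster earns reinforcement sooner; on the interval schedules, extra responses before the time is up earn nothing.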

5. Punishment - Whereas reinforcement increases the probability of a response occurring again, the premise of punishment is to decrease the frequency or probability of a response occurring again.

a) Skinner did not believe that punishment was as powerful a form of control as reinforcement, even though it is so commonly used. Thus, it is not truly the opposite of reinforcement like he originally thought, and the effects are normally short-lived.

b) there are two types of punishment:

1) Positive - presentation of an aversive stimulus to decrease the probability of an operant response occurring again. For example, a child reaches for a cookie before dinner, and you slap his hand.

2) Negative - the removal of a pleasant stimulus to decrease the probability of an operant response occurring again. For example, each time a child says a curse word, you remove one dollar from their piggy bank.
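
Since "positive" and "negative" get mixed up so often, it may help to see the two types of reinforcement and the two types of punishment as one lookup table: "positive/negative" says whether a stimulus is presented or removed, and "reinforcement/punishment" says whether the behavior becomes more or less likely. The function and argument names here are just illustrative:

```python
def classify_consequence(stimulus, change):
    """Name the operant consequence.

    stimulus: "pleasant" or "unpleasant"
    change:   "presented" or "removed"
    """
    table = {
        ("pleasant", "presented"): "positive reinforcement",   # rat gets a treat
        ("unpleasant", "removed"): "negative reinforcement",   # shock stops
        ("unpleasant", "presented"): "positive punishment",    # hand is slapped
        ("pleasant", "removed"): "negative punishment",        # dollar is taken away
    }
    return table[(stimulus, change)]


print(classify_consequence("unpleasant", "removed"))
print(classify_consequence("pleasant", "removed"))
```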

6. Applications of Operant Conditioning

a) In the Classroom

Skinner thought that our education system was ineffective. He suggested that one teacher in a classroom could not teach many students adequately when each child learns at a different rate. He proposed using teaching machines (what we now call computers) that would allow each student to move at their own pace. The teaching machine would provide self-paced learning that gave immediate feedback, immediate reinforcement, identification of problem areas, etc., that a teacher could not possibly provide.

b) In the Workplace

I already gave the example of piece work in factories.

Another example - study by Pedalino & Gamboa (1974) - To help reduce the frequency of employee tardiness, the researchers implemented a game-like system for all employees that arrived on time. When an employee arrived on time, they were allowed to draw a card. Over the course of a 5-day workweek, the employee would have a full hand for poker. At the end of the week, the best hand won $20. This simple method reduced employee tardiness significantly and demonstrated the effectiveness of operant conditioning on humans.

There are also many clinical uses, including Ivar Lovaas' method of teaching autistic children how to speak.
