Differential/variable rewarding of established behaviours

Discussion in 'Dog Training and Behaviour' started by RobD-BCactive, Mar 30, 2011.


  1. RobD-BCactive

    RobD-BCactive PetForums VIP

    Joined:
    Jul 1, 2010
    Messages:
    2,401
    Likes Received:
    29
    ???? Do tell more, please :)
    Links to study would be fine.
     
  2. tripod

    tripod PetForums VIP

    Joined:
    Feb 14, 2010
    Messages:
    1,618
    Likes Received:
    79
    Do you want an example of each or are you specifically asking about reinforcement schedules, as the title of the thread suggests?

    You really are becoming a learning theory junky aren't you? :D If so welcome to the club :tongue_smilie:
     
  3. RobD-BCactive

    RobD-BCactive PetForums VIP

    I meant the thread to discuss "differential/variable rewarding of established behaviours" using a clicker. I don't expect spoon-feeding, though; a URL to a good source would be great, rather than taking up your time.

    I guess you got me on the "junky" bit, though I think it's more practical interest, not just theory :)
     
  4. tripod

    tripod PetForums VIP

    The word theory in that context refers to information regarded as pretty reliable as far as science goes (you can't prove anything, only disprove it) - theory in science does not have the same meaning as it does in everyday language.
    Theory refers to a statement/info regarded as being as true as it gets with the info we have right now.

    "Theory" in Learning Theory is used in the same sense as in the Theory of Evolution.

    The study of learning theory is designed to refine your practical application of it - that's when it gets really tough and very very interesting :)

    We can discuss reinforcement schedules, would never spoon feed ;) but want more than just links in a discussion :)

    Reinforcement schedules refer to when and how you reward a behaviour.

    When first teaching a behaviour we tend to use a continuous schedule - we reward every correct or almost correct response (depending on our criteria).

    Once a behaviour is established we can move to a more variable schedule.

    We may choose to reward correct behaviours after a variable number of correct responses. This is known as the 'slot machine' effect and helps explain how these gambling machines are so addictive.

    A variable ratio schedule may mean that we ask for twenty responses but only have, let's say, 10 treats. That would mean that on average we would reward every second correct response.
    Rewarding exactly every second response (a fixed ratio) would be too predictable - if the animal knows to expect a reward on response numbers 1, 3, 5 and so on, they would know not to expect one on 2, 4, 6 etc., so may not perform as nicely.

    A variable schedule however will reward without following that pattern, so it might look like this:
    Reward for response numbers 1, 2, 5, 7, 9, 10, 11, 17, 18, 20

    The animal performs strongly because they don't know which one will earn the reward.
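For anyone who likes to see the two patterns side by side, here is a rough sketch in Python. It is only an illustration: the function names are made up for this example, and placing the rewards uniformly at random is an assumption about one simple way to keep the pattern unpredictable, not a prescribed method.

```python
import random


def fixed_schedule(num_responses=20, every=2):
    """Predictable pattern: reward response 1, 3, 5, ... (every second one)."""
    return list(range(1, num_responses + 1, every))


def variable_schedule(num_responses=20, num_rewards=10, seed=None):
    """Unpredictable pattern: the same number of rewards, placed at random.

    Assumption: rewarded responses are drawn uniformly without replacement,
    which is just one simple way of avoiding a fixed pattern.
    """
    rng = random.Random(seed)
    rewarded = rng.sample(range(1, num_responses + 1), num_rewards)
    return sorted(rewarded)


def average_ratio(num_responses, rewarded):
    """Average number of responses asked for per reward given."""
    return num_responses / len(rewarded)


# The example list from the post: 10 rewards across 20 responses.
thread_example = [1, 2, 5, 7, 9, 10, 11, 17, 18, 20]

print(fixed_schedule())                   # the predictable odd-numbered pattern
print(variable_schedule(seed=0))          # ten response numbers, no fixed pattern
print(average_ratio(20, thread_example))  # average responses per reward
```

Both schedules hand out 10 treats over 20 responses, so the average is still one reward per two responses; only the predictability differs.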

    This starts because the first time the animal isn't rewarded when expected, they will work harder (e.g. look mum I'm sitting really quickly, did you miss it?). This relies on using extinction bursts.

    A differential reinforcement schedule involves choosing one specific criterion to reward rather than another.
    For example we may use DRO - differential reinforcement of other behaviour. So if we wanted to reduce a particular behaviour we might reward any behaviour other than the inappropriate one.

    Another example is DRI - differential reinforcement of incompatible behaviour (DRA, differential reinforcement of alternative behaviour, is closely related).
    So again, if we want to reduce a behaviour, we might reward a behaviour that cannot be performed at the same time.

    We use differential reinforcement when upping criteria.
    We might teach sits on a continuous schedule. Once we have it we might choose a variable schedule to get it really solid.
    Then once the dog gets it we might choose to only reward sits that are straight, or sits that are quick, or sits on your left, or whatever criterion you are working on.

    We can't forget the power of random reinforcement either. As the name suggests, the reward is not contingent on any particular behaviour, and the response becomes very strong as a result.
    For example, your dog sees a cat in a garden, and every time you pass that garden afterwards your dog checks it to see if the cat is there. Your dog doesn't know what behaviour caused the appearance of the cat, so they had better check just in case. Random reinforcement is very very very powerful!
     
  5. RobD-BCactive

    RobD-BCactive PetForums VIP

    Hmmm, and what if we aren't quite in tune with what the animal perceives as a reward at that moment? Treats offered to dogs who aren't interested spring to mind.

    On the recall, I've tried to vary things. I kind of have a feeling I goof up often enough to probably make deliberate withholding unnecessary and perverse. Just trying not to be too predictable seems to work out, with occasional creative ideas.

    I really have the impression that Freddie sometimes doesn't give a hoot about praise, or physical touch, but at other times it's a big thing. In the puppy stage it was very simple and predictable, but not so with an older dog.
     
  6. tripod

    tripod PetForums VIP

    If it doesn't cause an overall increase in behavioural frequency then it isn't reinforcement by definition, so it's not relevant to a reinforcement schedule. It's circular reasoning and goes nowhere ;)

    As for whether you have the behaviour on a learned basis, my measure of that tends to be a count. Ask for a certain number of x behaviour a day, in specific situations if that's relevant, and see what proportion of successes there are. If you get 80%+ then you may be ready for a variable schedule in that context.

    If you don't have that, then the behaviour is not learned in that situation e.g. in the park within 20m of other dogs, and you should be working in a continuous schedule.
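The count described above is easy to tally on paper, but as a rough sketch it might look like this in Python. The function names, the context labels and the tallies are all hypothetical, and the 80% figure is just the rule of thumb from this post, used as a default.

```python
def success_rate(outcomes):
    """Fraction of successful responses; outcomes is a list of True/False."""
    if not outcomes:
        raise ValueError("need at least one recorded response")
    return sum(outcomes) / len(outcomes)


def suggested_schedule(outcomes, threshold=0.8):
    """Suggest 'variable' at or above the threshold, otherwise 'continuous'."""
    return "variable" if success_rate(outcomes) >= threshold else "continuous"


# Hypothetical tallies, kept separately per context because the post talks
# about being ready for a variable schedule "in that context".
log = {
    "garden": [True] * 9 + [False],                          # 9 of 10 succeeded
    "park within 20m of other dogs": [True] * 6 + [False] * 4,  # 6 of 10 succeeded
}

for context, outcomes in log.items():
    print(context, success_rate(outcomes), suggested_schedule(outcomes))
```

With these made-up numbers the garden would qualify for a variable schedule while the park would stay on a continuous one.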
     
  7. RobD-BCactive

    RobD-BCactive PetForums VIP

    OK, that makes sense, though the thought of recall succeeding in some regular enough context to measure, at a rate of only 80%, would worry me silly :)
     
  8. Pawsitive

    Pawsitive PetForums Junior

    Joined:
    Mar 24, 2011
    Messages:
    93
    Likes Received:
    3
    really interesting thread! :)
     
  9. tripod

    tripod PetForums VIP

    I don't mean that recalls should only be 80%, I mean that to progress to a variable schedule you need at least 80% success - it's just one criterion that can be used to estimate that a behaviour has been learned.

    Always remember that behaviour is contextual and if you don't think your dog will recall don't allow them off leash - past behaviour is the best predictor of future behaviour so if you have had trouble before don't risk it again, without lots more proofing :)
     
  10. Irish Setter Gal

    Irish Setter Gal PetForums Senior

    Joined:
    Mar 17, 2011
    Messages:
    697
    Likes Received:
    46
    When you change the context, i.e. recall is now being done in the park having progressed from the garden, do you revert to 100% reward before moving back to a variable schedule, or do you continue on a variable schedule since the behaviour has been 'learnt', albeit in a different context?
     
  11. RobD-BCactive

    RobD-BCactive PetForums VIP

    Yes, what you meant was clear.

    The thing is, I tried to teach solid recall by an "errorless learning" process, and it seemed very very effective. If a failure occurred in a situation (say 2 attempts were required), then I backed off for a while and picked my moments in the same situation, calling only when I was very sure of success, say because the dog glanced at me during play with a pal. I'm wondering if, just by varying the reward, I was "withholding" at times by accident; the occasions where urgent circumstances meant no reward was attempted were very limited.

    And if you have 100% success then yes, one technically cannot be "reinforcing"; but practically we want to maintain that success rate, not have it slip back due to a predictable perceived withholding of something nice, that may act as negative punishment. Or would you disagree?
     
    #11 RobD-BCactive, Apr 1, 2011
    Last edited: Apr 1, 2011
  12. tripod

    tripod PetForums VIP

    OK, when it comes to operant conditioning, once you have the response learned (and remember it's contextual), that learned response is retained most effectively if rewarded intermittently rather than every time.

    I feel that recall is probably one of the more difficult ones to teach. I tend to reward every time but my rewards are Premack-ing when out and about.
    I also practise recalls/emergency stop about every 2-3 minutes on a walk - recall and then release to continue having fun.
    If playing with another dog or really engaged in another activity I up my recalls to about one every half a minute to a minute.
    And I use lots of varied recall exercises when practising.
    I take no chances as my dog has a history of dog aggression and high prey drive.

    I think recalls should be proofed to death in 'easy situations', working to get a 100% success rate.
    Then when upping the distraction level, or some other criterion, work on a long line - keep the dog successful. Once they learn that freedom may lead to great rewards it's difficult to crawl back.
     
  13. tripod

    tripod PetForums VIP

    Yep, because to your dog you are now teaching a new behaviour! They are such amazing discriminators that teeny tiny little differences mean that everything is suddenly different.

    But in a new context you might move from continuous to variable quicker.
     
  14. Irish Setter Gal

    Irish Setter Gal PetForums Senior

    How's about this one: teaching the 'turn to whistle' command at distance, which is effectively a directional turn with recall?

    There comes a point when you want to reward for the distance travelled out rather than the turn, and since all rewards should be within two seconds of the event for it to be considered a reward - how does that work?

    I can see that the reward could be given to a dog that naturally will 'run away' and has learnt the recall, but how do you reward a dog that won't 'free run away' the distance you need, or am I now getting too technical with the basics?

    I have my own answers for this particular question, just interested in somebody else's take on it.
     
  15. RobD-BCactive

    RobD-BCactive PetForums VIP

    On distance stuff I've captured the behaviours in self-rewarding situations, with feedback being limited to verbal encouragement and a "no reward" signal. I'm trying to improve it, whilst adding things to games, so fun bits are given as rewards. Also using the premacking idea, so swimming after a stick is the reward for previous performance.

    Seen similar method in Working Sheepdog Training materials.
     
  16. tripod

    tripod PetForums VIP

    We are going OT a little bit - reinforcement schedules have nothing to do with prescriptions for training specific exercises nor the type or combinations of rewards used.

    Distance behaviours are tough and one of the reasons that markers were developed in marine animal training.

    Teaching behaviours at a distance makes no sense to me - the turn should be well established before criteria such as distance are increased.

    My dog, who is not now and never will be a gundog ;), has left and right on verbal cue as well as an away cue for distance. These were taught with the use of shaping and targets.

    Distance behaviours, especially the runout itself, can be self rewarding.

    Teaching with targets is super easy and quick but using retrieves to teach recall, self control and directionals is also very effective in relevant dogs.
     
  17. Twiggy

    Twiggy PetForums VIP

    Joined:
    Jun 24, 2010
    Messages:
    13,591
    Likes Received:
    5,750
    tripod said: "My dog, who is not now and never will be a gundog ;), has left and right on verbal cue as well as an away cue for distance. These were taught with the use of shaping and targets."

    Much as agility dogs are taught right and left and obedience dogs for sendaway?

    tripod said: "Distance behaviours, especially the runout itself, can be self rewarding."

    Most certainly can...!!
     
  18. Irish Setter Gal

    Irish Setter Gal PetForums Senior

    I'm not teaching behaviours at distance, merely asking for extended run out. Obviously I'm not going to 'teach' a behaviour at distance, but it could be argued that the run out is 'teaching' which is what I was throwing into the pot.

    Not necessarily when the dog genuinely won't run out. We have encountered this problem from day one. Most people would think it a blessing in disguise that their dog won't 'run away', but not in this case.

    One of the most distinguished and respected people in HPR training, and an A class judge at that, did help me through the problem, I just wondered how the differential style of rewarding would have fitted my particular problem at the time - and it wasn't the way we resolved it :blink:
     
  19. tripod

    tripod PetForums VIP

    Differential reinforcement is probably less useful on a behaviour that is not established, so I don't think it's really relevant here, unless I don't understand what you mean.

    Exactly why I said "can be self rewarding" ;) My dog would be an example of that - he didn't run out on his own and 'naturally' doesn't wander far off leash.

    Yes Twiggy I taught him distance and directions in much the same way as those for agility dogs - he's not one of those either ;)

    I have since seen another way of teaching directions for agility using spins to left or right on cue. But I did my fella's directions with targeting.
    I am working with a pibble at the mo and teaching the outrun with retrieves as part of impulse control work.
     
  20. Twiggy

    Twiggy PetForums VIP

    Irish Setter Gal said: "I'm not teaching behaviours at distance, merely asking for extended run out. Obviously I'm not going to 'teach' a behaviour at distance, but it could be argued that the run out is 'teaching' which is what I was throwing into the pot."

    If my memory serves me correctly (and we're talking at least 25 years ago) I think one of the methods of teaching working trials dogs an extended outrun was a pole with a flat platform on top, on which titbits were placed. The distance was gradually extended until the pole was more or less hidden in a hedge or fence. I'm pretty sure they can also be redirected left or right at a distance.
     