
Written reports – a lost art?

Despite the date, this is not a Xmas newsletter (that particular seasonal art) or even a critique of Xmas newsletters – but season’s greetings to all anyway. This post is a look at written reports of a different kind – but ones often also summarising events in an overly positive way, with the more average truths slipping through the cracks unremarked.  Similarly, Xmas letters can become a bit impersonal and lose their impact.  I guess it all depends on the purpose of the exercise.

A lot gets written (and presented) about how to give effective (verbal) feedback but much less space is given to considering how to put this feedback into an optimal written form that might serve a useful purpose. Only last week a teaching visitor suggested (after attending a workshop for Teaching Visitors) that she would appreciate some suggestions on writing a good report.

There’s not much point writing a literary masterpiece if nobody reads it and no point agonising over how to diplomatically state some unwelcome truths if it all disappears into the ether.  There are several purposes and destinations of such reports.  They may be primarily intended for the learner, or for the body that oversees training (as part of an in-training assessment framework or a way of satisfying accreditors).  We write our Clinical Teaching visit reports assuming that the learner reads them and perhaps their supervisor skims them.  The supervisor may note that we have observed the same things as they have during their observation sessions. We say such things as “would benefit from seeing more complex patients” and hope they do something about it.  In the past I have known conscientious registrars to re-visit their reports prior to exams – but the report needs, therefore, to be worth re-visiting and contain content that informs the reader.  A list of “strengths” and “weaknesses” may be vaguely helpful but it would be interesting to know what both learners and supervisors do with Likert-type scores on multiple consultation skills.

It is possible of course that some organisations view the written report as legal evidence to justify later decision-making regarding competency, rather than as a useful formative process. This 2011 report noted that supervisors tended to routinely grade residents at or above the expected level and that “As currently used by trainees and supervisors, the assessment forms may underreport trainee underperformance, do not discriminate strongly between different levels of performance of trainees … and do not provide trainees with enough specific feedback to guide their professional development.” It questioned: “What does this actually say about their developing competency? If a trainee does a core medical, surgical or emergency term in Term 1, performing ‘at expected level’ indicates a lower level of performance than if the term was completed in Term 5. The phrases ‘at expected level’ or ‘above expected level’ do not indicate a specific level of competence.”

Although, as noted, there is little in the literature (compared to giving verbal feedback) there is an article on “Twelve tips for completing quality in-training evaluation reports” – although this is directed more at the end-of-term evaluations done by the ongoing supervisor than at a one-off report done by a CT visitor.  This article notes that the more recent literature emphasises the importance of qualitative assessments (as opposed to concern about the reliability of the assigned ratings) and the focus now is on improving the quality of the written, narrative comments.  What you say is important. The article suggests completing the comments section prior to the ratings section in order to avoid the tendency to rate all components the same (eg all 4/5).  It is important that the feedback form you use enables you to provide such feedback and also has meaningful anchors on the rating scale.

A study looking at the quality of written feedback noted consistent differences between trainee-trainer pairs in the nature of comments which suggested that feedback quality was determined not so much by the instrument as by the users.

So how can our written feedback on observed consultations be most meaningful? I guess by this I mean that the learner hears what is said, is able to relate it to what they did and is able to recall it in order to improve future practice.  The supervisor or educator should experience a document that enlightens them about the learner’s performance and progress in a way that informs future supervision / teaching / clinical experiences / remediation.  In some instances it will contribute concisely to an overall program of evaluation.  Here’s how this might be achieved.

  1. All the usual principles of effective feedback apply (non-judgmental, based on observed behaviours etc) – see earlier posts.
  2. It is particularly important that it should be timely – if they don’t get the report for 6 weeks they will have forgotten the consultations. I know I have. The more complicated the process the more points there are at which this can fail. Unfortunately my Xmas newsletters are running late and are not very timely this year.  Fortunately, in this context, this probably has no significant impact.
  3. The written report should reinforce what was said in person. No surprises.
  4. The written report speaks to the individual – it is not generic. It asks to be taken personally.  (I am much more likely to read a Xmas newsletter that seems to know who I am).  Filling in forms can be seen as “just a formality” but there is considerable engagement between learner and visitor during the GP Clinical Teaching Visit plus a relative lack of constraint because they aren’t the supervisor (with the attendant conflict of roles). However the conflict between feedback and assessment remains.
  5. Good feedback tries to be specific and behavioural such as “I like the way you listen to the patients at the start of every consultation – keep doing this” or “remember to ensure that you advise the patient on what to expect and safety-net before the consult finishes” or “As we discussed, I noticed the patient had trouble understanding some of the technical terms you used eg hyponatraemia, globus and crepitations. I suggest practising the use of some lay terms when appropriate or providing explanation. Perhaps you could discuss this specifically with your supervisor in your next teaching session”.
  6. However, feedback can also include some global encouraging comments such as “Dr Smith demonstrated many attributes of a good family doctor today” or “Dr Singh has a great manner with older patients.” (hopefully the specifics were discussed at the time).
  7. There is no need to cover everything in the narrative feedback – in fact a small number of concise points works best so stick to the most important. This will add weight to them and make them more memorable. (Just like long newsletters are unlikely to be read to the end). The message might be “If you work on anything in the next couple of months, it should be this….”
  8. It has been noted that effective feedback requires accurate self-assessment, reflection and insight on the part of the learner so it is a plus if the report can encourage this.
  9. The written report should suggest ways of improving, developing and exploring – with references and links that can be utilised at a future point in time. For instance, you might comment on their previous lack of experience and current lack of confidence in women’s health and suggest that the FPA course might be useful (and provide a link) or sitting in with Dr X in the practice who does a lot of women’s health.
  10. If ratings have been made at “below the expected level” then it would be useful to make specific comments about these areas and the expected improvements to be made. This requires being aware of the gap between performance and the appropriate standard which is the essence of feedback.  (Perhaps this point can be left out of our Xmas newsletters!)

At the end of the process you will have conscientiously filled in the assessment form AND provided a couple of “take-home messages” that will be worth acting on and revisiting – for all concerned.

Diagnosing (and responding to) the struggling learner

I am posting this on the heels of the last post because that one was rather grand and strategic in its approach and I felt it needed to be followed by something more practical that related to day-to-day supervision of trainees. I almost called it “remedial diagnosis” but there is a continuum – from feedback aimed at progressive improvement to focussed interventions to help a learner get up to speed in a particular area and onward to identified areas of major deficit requiring official “remediation”.  We all need to remediate bits of our practice (depending on your definition).

Three weeks ago I went to the dentist and had root canal therapy. This felt like “deep excavation”, as in the (unreadable!) danger sign by the major beach renovation that I noted on my morning walk today.  These problems are often revealed after large storms just as a learner’s problems are often revealed after exams.  Of course some required renovations may be largely cosmetic.


If some in-training formative assessment has suggested a risk for problems with the final exams (or indeed for performance as a GP at the end of training) the specific needs can only be targeted if the specific problems are identified. This will then suggest a more focussed approach for both learners and supervisors / educators.

What do we know about remediation?

Remediation implies intervention in response to performance against a standard. A recent review concluded (pessimistically, as systematic reviews often do) that most studies on remediation are on undergraduate students, focussed on the next exam, with rare long-term follow-up and improvements that were not sustained. Active components of the process could not be identified (Cleland J et al 2013 The remediation challenge: theoretical and methodological insights from a systematic review. Medical Education 47(3) pp 242-51).  A paper appealingly entitled “Twelve tips for developing and maintaining a remediation program in medical education” (Kalet A et al 2016 Med Teacher 38(8) pp 787-92) has a few interesting observations but is directed at institutions. It noted the common observation that educators spend 80% of their time with 20% of trainees, that many trainees will struggle at some point and may need more or fewer resources, and yet there is limited recognition of this or investment in resources at any level.  The relevant chapter in “Understanding Medical Education” (Swanwick T) notes that performance is a function of ability plus other important factors.  The quality of the learning and working environment is also important – sometimes maybe the fault lies more with us.  It observes that successful models of remediation aren’t well established and, as with the Kalet article, it advises personalised support rather than a “standard prescription”.

So, we are left a bit to our own devices in diagnosing and managing. Nevertheless, I think we are fortunate in GP training, historically, in that, up to now, we have had a personalised training culture that emphasises, accepts (and, indeed, wants) feedback.  Problems cluster into several areas.

Four common problems and ways to address them

  • Communication skills can sometimes be the most obvious limiting factor in performance. These can be subdivided into language skills (a large and well-addressed topic on its own) or more subtle skills within the consultation – use of words or phrases, jargon, clarity or conciseness, tone of voice, body language etc. These are often picked up on observation (or less often, but notably, from patient feedback). The most useful way to draw these to the attention of the learner, and to begin addressing the issues, is to use video debrief.
  • The easiest diagnosis is lack of knowledge. This might be revealed in a workshop quiz or a teaching visit or in a supervisor’s review of cases. Sometimes GP registrars (particularly if they have done previous training in a sub-specialty) underestimate the breadth of knowledge required for general practice. Sometimes this awareness does not dawn until the exam is failed and they admit “I didn’t take it seriously”. In GP training, considerable knowledge is required for the AKT and it underpins both the AKT and the OSCE. Sometimes the issue is the type of knowledge required. They may have studied Harrison’s (say) and be able to argue the toss about various auto-immune diseases or the validity of vitamin D testing and yet have insufficient knowledge of the up-to-date, evidence-based guidelines for common chronic diseases. They may have very specific gaps, such as women’s health or musculoskeletal medicine, because of personal interests or their practice case-load. In real life the GP needs to know where to go to fill in the gaps that are revealed on a daily basis but, for the exam, the registrar needs to have explored and filled in these gaps more thoroughly. The supervisor can stretch their knowledge in case discussions, monitor their case-load, direct them to relevant resources and touch base re study. Registrars can present to practice meetings (teaching enhances learning). Prior to exams it is useful to practise knowledge tests and follow up on feedback from wrong answers.
  • Consultation skills deficiencies are often about structure. They may be picked up because of difficulty with time management but, equally, there may be problems within the consultation. The registrar may not elicit the presenting problem adequately, convey a diagnosis, negotiate appropriately with the patient regarding management, utilise relevant community resources, introduce opportunistic prevention or implement adequate safety netting. All these skills, and others, are necessary in managing patients safely and competently in general practice. There are many “models” of the GP consultation which can be helpful to learners if discussed explicitly. It can also be useful to have registrars sitting in for short periods with different GPs in the practice in order to observe different consulting methods. However, this is less useful if it is just a passive process and the registrar does not get the chance to discuss and reflect on different approaches. The most useful coaching is direct feedback as a result of observation by supervisors and teaching visitors. This may require extra funding.
  • Inadequate clinical reasoning is the more challenging diagnosis. Good clinical reasoning is something you hope a registrar would have acquired through medical school and hospital experience but this is not always the case. Even if knowledge content and procedural skills are adequate, poor clinical reasoning is an unsafe structure on which to build. This issue may come to light through failure in the KFP or through observation by the supervisor.  It may be necessary to go back to basics. A useful method is to utilise and explore Random Case Analysis (RCA) in teaching sessions.  A helpful article on RCA suggests particularly the use of “why?” and “what if?” questions when interrogating.  Sometimes clinical reasoning needs to be tweaked to be appropriate for general practice. A re-read of Murtagh on this topic is always useful and practice KFPs can reveal poor clinical reasoning.  Registrars can sometimes be observed to apparently leap towards the correct diagnosis or arrive circuitously at the correct and safe conclusion without the clinical reasoning being obvious.  In these circumstances it is useful to question the registrar about each stage of their thinking and decision making in order to practise articulating their clinical reasoning.

In summary

Remediation “diagnoses” can be made in the areas of communication, consultation skills, knowledge and clinical reasoning (and, no doubt, others). The “symptoms” often come to light during observation, workshop quizzes, in-training assessments, case discussions and practice or patient feedback.  Management strategies include direction to appropriate resources, direct observation, video debriefing, case discussion, practice exam questions (with feedback and action) and random case analysis. Most organisations have templates for relevant “plans” which are useful to keep all parties on track.

Funders and standard-setters are more likely to have “policies” on remediation than helpful resources on how to do it. There is not much in the literature and it is often difficult to develop expertise on a case-by-case basis (with much individual variation).  Prior to the reorganisation of GP training in Australia some Training Providers had developed educational remediation expertise which could be utilised in more extreme cases by other providers. As educators we need to develop our own skills, document our findings and share freely with others. Supervisors need to know what to do if they suspect a performance issue, ie communication channels with the training organisation should be open.

Early identification of the struggling learner

The holy grail and silver bullets

Early identification of learning needs is of course the holy grail of much education and vocational training. It has become even more pertinent in GP training since time lines for completion of training have been tightened and rigidly enforced.  Gone are the days of relatively leisurely acquisition and reinforcement of knowledge and skills, with multiple opportunities for nurturing the best possible GP skillset.

Consequently there is an even more urgent search for the silver bullet – that one test that will accurately predict potential exam failures whilst avoiding over-identifying those who will make it through regardless (effort and funds need to be targeted). If it all sounds a bit impersonal, well… there’s a challenge.


Often the term “at risk registrar” is used but I have limited this discussion to the academic and educational issues in training. The discussion on predictors also often strays into the area of selection in an effort to predict both who will succeed in training and who will be an appropriate practitioner beyond the end of training – but this is beyond the scope of this discussion, although it does suggest utilising existing selection measures.

The literature occasionally comes up with interesting predictors (most of it is in the undergraduate sphere; vocational training is less conducive to research).  There are suggestions, for instance, that students who fail to complete paperwork and whose immunisations are not up to date are likely to have worse outcomes.  This is not totally surprising and rings true in vocational training – perhaps as the Sandra/Michelle/fill-in-a-name test.  The admin staff who first encounter applicants for training are often noted to predict who will have issues in training. This no doubt is based on a composite of attributes of the trainees, and the experienced admin person’s assessment is akin to a doctor’s informed clinical judgment.  However it is not numerical and would not stand up on appeal; it is often an implicit flag.  Obviously undergraduate predictors may be different from postgraduate predictors but there is always a tendency to implement tools validated at another level of training. They should then be validated in context.

Note that, once in training, the reason for identifying these “at risk” learners is in order to implement some sort of effective intervention in order to improve the outcomes. This requires diagnosis of the specific problems.

Thus, there is interest in finding the one test that correlates with exam outcomes – and there may be mention of P values, ROC curves, etc. Given that different exams test different collections of skills, it is not surprising that one predictor never quite does the job.  As an educator I’m not happy that something just reaches statistical significance but is too ambiguous to apply on the ground.  I want to feel confident that a set of results can effectively detect at least the extremes of learner progression through training: those who will sail through regardless and those who are highly likely to fail something (if no extra intervention occurs).

The “triple test”

If an appropriate collection of warning flags is implemented, then the number of flags tends to correlate with exam outcomes (our only current gold standard). It is possible to identify a small number of measures that do this best and work has been done on this.  This measure + that measure + a third measure can predict exam outcomes with a higher degree of accuracy.  My colleague, Tony Saltis, interpreted this as “like a triple test”.  It appeared to me that this analogy might cut through to educators who are primarily doctors.  In the educational sphere this analogy can be extended (although one should not push analogies too far).  Combining separate tests can provide extra predictive accuracy.  In prenatal testing there has been a double test, in some places a quadruple test, and now there is the more expensive cell-free foetal DNA test which is not yet universally used. There are pros and cons of different approaches.  Extra sensitivity and specificity for one condition does not mean that a test detects all conditions and, of course, in Australia, the different modality of ultrasound added to that particular mix.

Any chosen collection of tests will not be the final answer. Each component of any “triple (or quadruple) test” should have the usual constraints of being performed in the equivalent of accredited and reliable labs, in a consistent fashion and results of screening tests should be interpreted in the context of the population on which they are performed.  They also need to be performed at the most appropriate time.
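To make the flag-counting idea concrete, here is a minimal sketch. Every measure name, cut-off and registrar record below is invented for illustration – the real measures and thresholds would need to be derived and validated in your own training cohort, as discussed above.

```python
# Hypothetical sketch of a "triple test": count how many screening measures
# fall below their cut-off, and flag registrars who trip several at once.
# All measure names, cut-offs and data are illustrative, not validated values.

def count_flags(registrar, cutoffs):
    """Count how many screening measures fall below their cut-off."""
    return sum(
        1 for measure, cutoff in cutoffs.items()
        if registrar.get(measure, 0) < cutoff
    )

# Invented cut-offs for three hypothetical screening measures
cutoffs = {"mcq_rank_percentile": 30, "interview_score": 55, "patient_feedback": 60}

cohort = [
    {"name": "A", "mcq_rank_percentile": 20, "interview_score": 50, "patient_feedback": 70},
    {"name": "B", "mcq_rank_percentile": 80, "interview_score": 75, "patient_feedback": 85},
]

# Two or more flags -> prioritise for early educational review
at_risk = [r["name"] for r in cohort if count_flags(r, cutoffs) >= 2]
print(at_risk)  # registrar A trips the MCQ and interview flags -> ['A']
```

The point of the sketch is that the composite (number of flags) carries more information than any single measure, which is the “triple test” analogy in code form.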

Hints in the evidence

I have previously found that rankings in a pre-entry exam-standard MCQ are highly predictive of final exam results. However, to apply this in different contexts there is a proviso that it be administered in exam conditions and the significance of specific rankings can only confidently be applied to the particular cohort. The addition of data from interview scores, possibly selection bands, from types of early in-training assessments and patient feedback scores appear to add to this accuracy, in the data examined – particularly for the OSCE exam.  (Regan C Identifying at risk registrars: how useful are components of a Commencement Assessment? GPTEC Hobart August 2015).  Research is also ongoing in Australian GP training in other regions by Neil Spike and Rebecca Stewart et al (see GPTEC 2016).  I would suggest that the pattern of results is important.
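Generating the evidence by follow-up means checking any composite screen against actual exam outcomes. A minimal sketch (all names and outcomes invented) of calculating the sensitivity and specificity of an “at risk” flag against exam failure:

```python
# Hypothetical worked example: evaluating a composite "at risk" screen
# against actual exam outcomes. All registrars and results are invented.

def screen_performance(predicted_at_risk, failed_exam, cohort):
    """Return (sensitivity, specificity) of the screen for exam failure."""
    tp = sum(1 for r in cohort if r in predicted_at_risk and r in failed_exam)
    fn = sum(1 for r in cohort if r not in predicted_at_risk and r in failed_exam)
    fp = sum(1 for r in cohort if r in predicted_at_risk and r not in failed_exam)
    tn = sum(1 for r in cohort if r not in predicted_at_risk and r not in failed_exam)
    sensitivity = tp / (tp + fn)   # proportion of actual failures the screen caught
    specificity = tn / (tn + fp)   # proportion of passes correctly left un-flagged
    return sensitivity, specificity

cohort = {"A", "B", "C", "D", "E", "F"}
predicted_at_risk = {"A", "B"}   # flagged by the composite screen
failed_exam = {"A", "C"}         # actual outcomes

sens, spec = screen_performance(predicted_at_risk, failed_exam, cohort)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented example the screen caught one of the two failures (registrar C slipped through), which is exactly the sort of follow-up finding that should prompt re-tuning of the component measures for the local context.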


The way forward

Now that GP training in Australia has been reorganised geographically it is up to the new organisations (and perhaps the colleges) to start collecting all the relevant data anew and to ensure it is accessible for relevant analysis. There is much data that can potentially be used but there needs to be a commitment to this sort of evaluation over the long term. It should not be siloed off from the day to day work of educators who understand the implementation and significance of these data.

Utilising data already collected would obviously be cost-effective and time-efficient – in addition to any additional tools devised for the purpose. I suspect there is a useful “triple test” in your particular training context but you need to generate the evidence by follow-up. Validity does not reside in a specific tool but includes the context and how it is administered.  There needs to be an openness to future change depending on the findings.  The pace of this change (or innovation) can, ironically, be slowed by the need to work through IT systems which develop their own rigidity.

This is an exciting area for evidence-based education and the additional challenge is for collegiality, learning from each other and sharing between training organisations. Only then can we claim to be aiming for best practice.

Of course the big question is, having identified those at risk – what and how much extra effort can you put in to modify the outcomes and what interventions have proven efficacy?

“You’ll make a great family doctor!” How do we know?

The above is a phrase which I have often seen written on Teaching Visit Reports.  The enthusiasm almost jumps off the page (/screen) and it comes across as a comment (feedback) by a colleague, recognising certain (not always clearly articulated) perceived attributes of the trainee.  Similarly, medical educators frequently make a global judgment about whether a registrar is likely to proceed through training successfully. What are these experienced assessors seeing?

Assessors are always searching for valid methods to assure/measure/predict training outcomes and much of this is currently focussed on competencies. This is a topic for a later blog but this post is about a slightly more elusive topic.  It is about “pre-competencies” or perhaps “beyond competencies”.

Three essential ingredients 

Over the last twenty years or so I have come to the conclusion that there are at least three crucial attributes.  If a registrar is obviously 1. Curious 2. Caring and 3. Conscientious then I think “this is the sort of family doctor I would be happy for my family to see” and I relax, to some extent, as an educator.  These qualities assure me that the registrar will achieve that desired outcome of a safe and competent (and, dare I say, good) GP.  Let me explain why I feel reassured that these global judgments predict and reflect some of the more atomized competencies that programs try to measure along the way.

  • Curiosity as an intellectual/cognitive attribute is obviously important to a scientific approach but it can remain cold and objective without also the interest in the person that is central to general practice.  In practice, curiosity means that the GP will not have the intellectual laziness that is happy with easy answers.  It drives self-directed learning and ongoing professional development. The curious GP can’t help asking questions and searching out answers.  Curiosity ensures you search out what you need to know (ie the curriculum, in more formal terms).
  • In terms of “caring” I am thinking more about a passion for the job, not just the soft and fluffy emotion that is sometimes claimed for GPs.  The registrar who cares is motivated about being a good GP, is concerned about the person as a whole (not just their disease) and cares about what happens to the patient in the health system. It is articulated somewhat in domain 1 of RACGP and in the widely applied CanMEDS framework.
  • Conscientiousness is representative of the ethical and professional framework that is so difficult to measure during training.  It rates of course in various training frameworks (domains 4 & 5 in RACGP, ACRRM domain 6 and the Professional Role in CanMEDS ).  It contributes to patient safety and ensures the registrar commits to ongoing learning.  These are the learners who get their paperwork done, meet deadlines and make study plans and who ensure that patients are safety-netted and followed up.


I am often tempted to add Communication Skills to the above three attributes but, although some people appear to be inherently better communicators, there is a sense in which communication is a skill which can be taught and learnt.  Certainly training programs assume this although the debate continues.  The attributes identified above precede the competency tick-boxes.  It is hard to imagine a trainee with the above attributes who does not go on to acquire the required competencies unless some impossible obstacles hinder their progress.  There are people who seem to have the above attributes in spades.  As educators we just need to point them in the direction of what they need to learn.

I guess some would want to add other attributes such as insight or resilience and I wouldn’t argue with that. They are perhaps even more basic to life generally.

There have been attempts to describe and operationalise some of the occupational attributes of the GP (eg Situational Judgment Tests – a topic for another post) in the hope that these will have predictive validity.

The recipe – the benefits of a training program

Of course the main question for educators is whether such attributes can be taught or their lack compensated for.  Some trainees commence training with these attributes evident and seem to grow organically.  They make it through regardless.  Other learners need a top-up of one or more of the ingredients and some need a bit more stirring or leavening before being put in the oven.  These ingredients need to be mixed together in appropriate quantities and cooked for the required length of time.  This is the benefit of an appropriately-resourced training program that articulates the best recipe for success.  Hopefully an effective program optimizes this process with less of the wasted trial and error of the ad-hoc approaches of a generation ago.  To push the metaphor, however, shortcuts may also decrease the quality of the product, so we should beware in the future.

A bit more necessary spice

These basic ingredients have always been relevant in the making of a good GP but perhaps historically the preferred flavour has changed over the decades – a basic sponge is not good enough for current needs and perhaps different spices are required in certain practice contexts.  This means that specific curricular content and required skills change over time – and the curious, caring, conscientious doctor takes this on board. Over the years we have recognised and encouraged the crucial ingredients and added in some of the specifics needed to produce the outcome appropriate to the needs of our community at this point in time.

The garnish – exam technique 

Although being a good GP is the ultimate goal, passing the relevant exams is obviously crucial so, as educators, we are required to add a collection of more or less exotic “garnishes” which represent the methods of assessment appropriate to each sub-culture.  Thus the registrar who we believe will be a good GP also needs to ensure they are acquainted with the relevant exam strategies before their task is finished – be they OSCEs, KFPs, or MiniCEXes etc.  The appropriate garnish may just make the difference in the reality cooking show metaphor of training!

I will look at other potential, more measurable, predictors of training outcomes later.