Cognitive Performance Assessment Using Simulation Among Anesthesiology Residents – A Review

Review Article

Austin J Anesthesia and Analgesia. 2020; 8(1): 1085.

Sidi A*

Department of Anesthesiology, University of Florida College of Medicine, USA

*Corresponding author: Avner Sidi, Department of Anesthesiology, University of Florida College of Medicine, USA

Received: January 14, 2020; Accepted: February 18, 2020; Published: February 25, 2020

Abstract

Our goal was to develop and validate a realistic assessment technique (using simulation) that would allow us to assess the Cognitive and Technical skills and performance of Anesthesiology residents. To achieve these integrated goals, we built a three-phase plan.

The first phase of our investigation was to assess the construct validity (i.e., progression of scores across progressing levels of training) of the simulation-based OSCE (Objective Structured Clinical Examination) summative assessment tools developed and established for the Israeli Board Examination, and their potential generalizability to an American training program for formative or summative assessment. The examination related to the first phase was administered 66 times to 50 different residents. In the second phase, we evaluated deficiencies in cognitive performance according to error rates and performance grades within and across different clinical domains, and between PGY (post-graduate year of training) levels, in 47 residents tested 80 times. In the last phase, where our primary aim was to detect changes in “higher-order” deficiencies by comparing 2 successive academic years, 35 PGY-3 and -4 residents were tested 50 times.

The pass rate in the first phase was significantly higher for PGY3 and PGY4 residents compared to PGY2 residents in the OR; this rate was also significantly higher for PGY4 residents compared to PGY2 residents when all three clinical domains were combined (11/22 = 0.50 vs. 2/23 = 0.09). The cognitive success rate of PGY4 residents in the second phase was 0.50–0.68, and was significantly lower than the non-cognitive success rate for Resuscitation and Trauma. In the third phase, we found a change in mean error rates across years. In all three clinical domains, the cognitive success rate was higher (range, 0.74–1.00) than the previous year’s value (range, 0.39–0.87). The reduction in error rates was primarily due to decreases in non-technical errors, predominantly in Resuscitation and Trauma.

Conclusions: In its first phase, our study demonstrated “generalizability”, i.e., that scenarios developed for a non-American examination could be shared with an American training program. In the second phase, our main finding was that PGY-3 and -4 residents’ error rates were higher for the cognitive items than for the non-cognitive items in each domain tested. In the final phase, we demonstrated not only that simulation is effective at identifying these errors, but also that it may be a valuable way to teach residents and combat these errors.

Keywords: Assessment; Cognitive; Skills; Learning; Simulation; Anesthesiology; Residency

Introduction

The definition of performance in anesthesia varies dramatically, from vague (vigilance, data interpretation, plan formulation, and implementation) [1] to very technical, organized, and detailed (gathering information for preoperative evaluation, equipment pre-use preparation, intra-operative checks, postoperative management, airway assessment) [2,3]. Some investigators evaluate performance in anesthesia by separating basic knowledge (gathering information) or the technical (initiating and working with protocols, reviewing checklists) from the cognitive and behavioral or affective (decision-making and team interaction) aspects [4,5]. This separation is based on strong analogies to performance during management of critical events in aviation, another complex and dynamic domain [5]. Most educators in anesthesia today believe it is important to measure two separate aspects of skilled performance in managing crisis situations: implementing appropriate technical actions (technical performance) and manifesting appropriate crisis-solving and non-technical anesthesia management behaviors (non-technical performance).

The definition of Anesthesia Non-Technical Skills (ANTS) [6-10] includes: (a) task management (planning, prioritizing, keeping standards, using resources); (b) teamwork (coordinating, exchanging information, using authority, assessing capabilities, supporting); (c) situation awareness (interpreting information, recognizing, anticipating); and (d) decision making (identifying and selecting options, re-evaluating). Conversely, technical skills refer to everything that is not ANTS: basic and technical knowledge (gathering information, preparation of drugs and equipment, initiating and working with protocols and checklists) [3,11-14] and psychomotor skills (perception, guided response) [15]. The ANTS concept was developed and evaluated in a project between the University of Aberdeen Industrial Psychology Research Center and the Scottish Clinical Simulation Center. A team of anesthetists and psychologists was assembled to design the anesthetists’ non-technical skills system, using methods of task analysis similar to those used for pilots [7,16]. The ANTS include the main non-technical skills (cognitive and affective) associated with good anesthetic practice [3,11,17].

Models that integrate lower-level knowledge and lower-level skills-based learning with higher-level skills (attitude, skills, behavior, and a culture of patient safety) were developed for the simulated [13,18] and non-simulated [19] environment. One of the early models integrates four progressive capabilities: understanding (knows), application (knows how), integration (shows how), and practice (does) [19]. Knowledge is at the base of this framework and action/doing is at the top. Basic anesthesia knowledge is also a predictive academic variable for anesthesia residents’ higher-level clinical performance and is measured using different tests during the first year of training [11].

The anatomical locations in the human brain for upper-level and lower-level knowledge/learning are different and use different neurotransmitters: cognitive learning and memory (motivation, decision-making) are based in the basal ganglia, in contrast with the known role of the medial temporal lobe in declarative memory [20]. Non-technical skills can be divided into two subgroups: (1) cognitive or mental skills (decision-making, planning, strategy, risk assessment, situation awareness); and (2) social or interpersonal affective skills (teamwork, communication, leadership). Both are necessary for safe and effective performance in the operating room [21], and they represent two of the three legs of the skills triangle (psychomotor skills being the third leg), as presented in previous publications in this journal and elsewhere [15,18,22].

Competency assessment of non-technical (cognitive and affective) and technical (psychomotor) skills [15,22] is extremely hard to accomplish using only traditional examinations [11,23-25]. Most clinical competence assessments use either performance-based methods (e.g., objective structured clinical examinations, OSCEs) or tests that assess the “technical rationality” part of clinical reasoning (e.g., multiple-choice questions). These fail to capture the uncertainty of some clinical scenarios that will be encountered. Problem-solving in the operating room requires a mixture of knowledge and experience [24].

Current evaluation methods (including simulation-based) typically measure basic knowledge and performance, rather than competency, in the complex tasks of acute care [2]. This is why it is important to develop more efficacious methods to measure acute care clinical performance. Simulation could be used to measure advanced cognitive diagnostic and therapeutic management skills and the ability to integrate knowledge, clinical judgment, communication, and teamwork into the simulated practice setting.

Our goal was to develop and validate a realistic assessment technique (using a simulation environment and methodology) that would allow us to assess the skills and performance of Anesthesiology residents, differentiate between Cognitive and Technical performance, detect deficiencies, and identify longitudinal changes in cognitive skills, meaning that cognitive performance deficiencies can improve over time. To achieve all these integrated goals, we built a three-step (three-phase) plan.

Our first phase was to assess the “construct validity” (i.e., progression of scores across progressing levels of training) of the simulation-based OSCE summative assessment tools developed originally for a non-American examination [26,27], and their potential generalizability to an American training program for formative or summative assessment. American Anesthesiology residents across all post-graduate years (PGY 2-4) in one institution (of 80 in the residency program) were examined. This validation could not be performed in the Israeli Board setup, which tested only graduating residents, equivalent to American PGY4 residents. The other aim was to demonstrate “generalizability”, sharing the scenarios developed for the non-American examination with an American academic environment for formative (teaching) and summative (testing) assessment [28].

The second phase of our investigation was to evaluate deficiencies in cognitive performance according to error rates and performance grades within and across different clinical domains, and between PGY levels. Based on our previous preliminary work, we hypothesized that we would uncover some deficiencies in knowledge and skills, and that there would be fewer higher-order cognitive deficiencies in graduating residents compared to residents starting their training [18].

In the last phase, our primary aim was to evaluate cognitive performance and detect “higher-order” deficiencies according to error rates and performance grades within three different clinical domains (OR, trauma, and cardiac resuscitation) and between PGY levels, comparing 2 successive academic years. Our main objective was to demonstrate that simulation can effectively serve as an assessment of cognitive skills and can help detect “higher-order” deficiencies, which are not as well identified through more traditional assessment tools. We hypothesized that simulation can identify longitudinal changes in cognitive skills, meaning that cognitive performance deficiencies should improve over time. We expected to see improvement in some deficiencies in knowledge and skills and hypothesized that there would also be fewer higher-order cognitive deficiencies for residents in the subsequent academic year, owing to a learning effect. This learning effect is known as “construct validity”, or progression of scores over time within progressing levels of training [1,28]. We expected that progression in scores would also be evident for the whole group of graduating residents evaluated in other fields and different scenarios.

Methods

To achieve the three goals described above, we built a three-phase plan (see also the detailed description in the Introduction). In the first phase of our investigation, we used summative assessment tools developed by the Israeli Board Examination [26,27,29]. In the second and third phases, we evaluated deficiencies in cognitive performance according to error rates and performance grades within and across different clinical domains, and between PGY levels. Following Institutional Review Board (IRB) approval, all study phases were conducted at the University of Florida anesthesiology residency program.

Scenarios

In Phase 1: Two similar but not identical scenarios (to counter scenario content leakage and enhance content security) were used in each of three clinical domains (resuscitation, trauma, and operating room crisis management) in a simulated environment [26,27,29]. These scenarios were originally developed by the Israeli Board of Anesthesiology Examination Committee [26,27,30,31]. Faculty members from the Department of Anesthesiology at the University of Florida, assisted by educational and simulation experts, translated the scenarios with maximal adherence to the original script [26,27], scenario protocol, language, and assessment tools. No change was made in scoring, assessment, pass/fail determinations, orientation of residents, or the examination process itself.

In Phases 2 and 3: We used a previously described scenario approach (first stage: basic knowledge; second stage: exploring advanced cognition through discussion/debriefing) [18,26-29,32,33] (see Figure [67] and our previous publications [28,33]).

Two similar but not identical scenarios were used in each of three clinical domains (cardiac resuscitation, trauma management, intraoperative crisis management) in a simulated environment. These six scenarios were originally developed and used by the Israeli Board of Anesthesiology Examination Committee [26-29,32,33]. Faculty members in the Department of Anesthesiology at the University of Florida, assisted by educational experts, translated and adapted the material and methods.

Participants

In Phase 1: Fifty Anesthesiology residents in post-graduate years (PGYs) 2-4 were evaluated. The examination was administered 66 times to 50 different residents. All residents were recruited by the chief residents and had previously participated in an orientation and sessions with the Human Patient Simulator (CAE Healthcare, Sarasota, FL). We evaluated all PGY groups within a 3-month window in each phase of the study. Each consenting resident received oral instruction and printed materials explaining the study objectives (evaluating teaching or learning errors), and assurance that results were confidential and had no impact on their residency program evaluations. All residents had prior orientation to the high-fidelity Human Patient Simulator as part of their curriculum. Practice and assessment of clinical skills in a simulator environment was not novel to the participants.

In Phase 2: 47 PGY2-4 residents participated 80 times.

In Phase 3: 35 PGY-3 and -4 residents (of 50 in the residency program) were tested 50 times during two successive academic years (2011–2012 and 2012–2013). Eighteen of these 35 residents were evaluated in both years, in the same domain and in identical or similar scenarios, first as PGY3 and then as PGY4 as they graduated to the next level 1 year later; these 18 residents thus also participated in both Phases 2 and 3.

Assessment tools

In all phases: A full description of the assessment model and tools in all three phases of our investigation, our study protocol, assessors, and the scoring system appears in Appendix 1 (see also Figure [67] and our previous and other publications [18,26-29,32-37]). This model integrates four progressive capabilities: understanding (knows), application (knows how), integration (shows how), and practice (does) [19]. The checklist scores performance using the item-based Angoff method [35,36] (see Appendix 1).

Feedback

In all phases, the residents completed questionnaires on the realism of each scenario, including the perceived relevance of the scenario(s) and the residents’ satisfaction with their performance in the simulation.

Calculations (Appendix 2)

For every item in each scenario, the following parameters were calculated as previously described [26-28] and compared between PGY groups: group (PGY) error rate, item performance grade, and individual (resident) success rate.
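
The exact definitions and formulas appear in Appendix 2 and in [26-28]. As an illustration only, the following is a minimal Python sketch of how such parameters could be computed from item-level checklist data; the column names and formulas are assumptions, not the study’s actual definitions.

```python
import pandas as pd

# Illustrative item-level checklist data: one row per resident x scenario item.
# Column names and the formulas below are assumptions for illustration only;
# the study's actual definitions appear in Appendix 2 and refs [26-28].
df = pd.DataFrame({
    "resident": ["r1", "r1", "r2", "r2", "r3", "r3"],
    "pgy":      [2, 2, 3, 3, 4, 4],
    "item":     ["i1", "i2", "i1", "i2", "i1", "i2"],
    "correct":  [0, 1, 1, 1, 1, 0],   # 1 = item performed correctly
})

# Group (PGY) error rate: share of item attempts failed within each PGY group.
group_error_rate = 1 - df.groupby("pgy")["correct"].mean()

# Item performance grade: share of all examinees who performed the item correctly.
item_grade = df.groupby("item")["correct"].mean()

# Individual (resident) success rate: share of items each resident performed correctly.
resident_success = df.groupby("resident")["correct"].mean()

print(group_error_rate, item_grade, resident_success, sep="\n")
```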

Statistics

Each checklist and script included identification of the resident’s PGY level. Results were analyzed using the SAS 9.2 statistical software package. Checklist results were manually entered into an Excel (Microsoft, Redmond, WA) spreadsheet. A non-inferiority test (for proportion-correct scores) was conducted between each pair of scenarios in each field to test equivalence between scenarios, assuming an allowable difference of ≤30% in performance or difficulty grades, while checking power for the range of difference [38]. The non-inferiority test was performed to determine that the two scenarios in each of the clinical areas (within the same type or field) were not inferior to each other. A subsequent equivalence test (for proportion-correct scores) was conducted between each pair of scenarios to evaluate similarity between them [31]. Equivalence was accepted with 80% certainty if the ratio (log difference) of the grades was within 20% for the pair. The log difference was used because the grades distribute log-normally. The 80% and 20% thresholds were used because these are accepted rates for equivalency tests [39]. Variables are presented as mean ± SD. Differences were considered significant when p < 0.05.
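
As a hedged illustration of the two scenario-comparison procedures described above, the sketch below shows (1) a Wald-type non-inferiority z-test on proportion-correct scores with a 30% margin and (2) a TOST-style equivalence test on log-transformed grades with a 20% bound accepted at 80% certainty. All counts and grade samples are invented, and the exact formulations used in the study follow [38,39], not this sketch.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.weightstats import ttost_ind

# --- (1) Non-inferiority z-test on proportion-correct scores (30% margin) ---
# Invented counts: items performed correctly / items attempted, scenario 1 vs scenario 2.
x1, n1 = 160, 220          # scenario 1 (assumed numbers)
x2, n2 = 150, 230          # scenario 2 (assumed numbers)
margin = 0.30              # allowable difference in performance grades

p1, p2 = x1 / n1, x2 / n2
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
# H0: scenario 2 is worse than scenario 1 by more than the margin (p2 - p1 <= -margin).
z = (p2 - p1 + margin) / se
p_noninf = 1 - norm.cdf(z)             # small p -> reject H0 -> scenario 2 non-inferior
print(f"non-inferiority p = {p_noninf:.4f}")

# --- (2) TOST-style equivalence test on log grades (within 20%, 80% certainty) ---
rng = np.random.default_rng(0)
log_grades_1 = np.log(rng.uniform(0.6, 1.0, 25))   # invented grade samples
log_grades_2 = np.log(rng.uniform(0.6, 1.0, 25))
bound = np.log(1.2)                                # about +/-20% on the ratio scale
p_tost, _, _ = ttost_ind(log_grades_1, log_grades_2, -bound, bound)
print("equivalent at 80% certainty" if p_tost < 0.20 else "not shown equivalent")
```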

The feedback questionnaires, in which the examinees rated the realism of the scenario, its perceived relevance, and their satisfaction with their own performance in the simulation (on a scale of 1-5, with 5 being the highest), were completed by the examinees. Correlations between resident satisfaction with their own performance in the simulation and both the total proportion-correct scores and the general scores were calculated.

Variables were compared between groups using a random mixed-effect ANOVA model. We calculated means with PGY and field as random variables and the scenario as the fixed variable.

The error and pass rates for scenarios were compared using a two-proportion z-test.

In Phases 2 and 3: Individual success rates are presented as mean ± SD, and grouped error rates are presented as ratios of errors for each scenario within a clinical domain for each PGY level. t-tests and Kruskal-Wallis tests were used to determine whether individual success rates differed significantly between the two scenarios within each field. An equivalence test was conducted to test for equivalence between the two scenarios in each domain. Group error rates for non-technical and technical items were compared for each scenario within each PGY level using a two-proportion z-test. Scenarios within each domain and PGY level were similarly compared for error rates.
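
As an illustration of the grouped error-rate comparison, a minimal sketch of a two-proportion z-test using statsmodels is shown below; the error counts are invented, and the function choice is an assumption about how such a comparison could be coded (the study used SAS).

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented grouped error counts for one scenario within a single PGY level:
# non-technical items vs technical items.
errors = [34, 21]        # erroneous item attempts in each category
attempts = [120, 115]    # total item attempts in each category

# Two-proportion z-test comparing the two grouped error rates.
z_stat, p_value = proportions_ztest(count=errors, nobs=attempts)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")   # p < 0.05 -> error rates differ
```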

Linear mixed models were used to compare individual success and error rates between PGY groups. PGY level, domain, and scenario were considered fixed effects and identification of the resident was considered a random effect in order to adjust for correlations among observations from the same subject. The Kenward-Roger method was used to calculate the denominator degrees of freedom due to the unbalanced study design. The Tukey-Kramer method was used to adjust for multiple comparisons. For all analyses, alpha was designated as 0.05. Data were analyzed using SAS 9.3 (SAS, Cary, NC).
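
The study fitted these models in SAS, presumably via PROC MIXED or a similar procedure, which supports Kenward-Roger degrees of freedom and Tukey-Kramer adjustment. As a rough analogue only, the sketch below fits a comparable linear mixed model in Python with statsmodels; note that statsmodels’ MixedLM does not offer Kenward-Roger degrees of freedom or Tukey-Kramer adjusted comparisons, and the file and column names are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per resident x scenario administration,
# with the individual success rate as the outcome.
data = pd.read_csv("success_rates.csv")   # assumed columns: success_rate, pgy, domain, scenario, resident_id

# Fixed effects: PGY level, clinical domain, and scenario.
# Random effect: resident, to adjust for repeated observations from the same subject.
model = smf.mixedlm(
    "success_rate ~ C(pgy) + C(domain) + C(scenario)",
    data=data,
    groups=data["resident_id"],
)
result = model.fit()
print(result.summary())
```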

Cognitive Errors Analysis – All items tested in each scenario script were evaluated, concentrating on grouped error rates of >0.7 by the graduating PGY4 group during the first (non-cognitive) stage and the second (cognitive) stage. We then related the deficiencies we observed to a list from a recent publication that identified important cognitive errors in anesthesiology practice [37].
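
As a hedged sketch of that screening step, the snippet below flags checklist items whose grouped error rate exceeds 0.7 for the PGY4 group, separately for the non-cognitive and cognitive stages; the file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical item-level results for the PGY4 group: one row per resident x item,
# with the scenario stage labeled "non-cognitive" (stage 1) or "cognitive" (stage 2).
items = pd.read_csv("pgy4_item_results.csv")   # assumed columns: item, stage, correct

# Grouped error rate per stage and item, then flag items with an error rate > 0.7.
error_rates = 1 - items.groupby(["stage", "item"])["correct"].mean()
flagged = error_rates[error_rates > 0.7]
print(flagged)   # deficiencies to map against the published cognitive-error list [37]
```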

Results

Phase 1

The ANOVA analysis of the different PGY levels was significant, and further analysis revealed what drove those differences. All scenarios were compared in difficulty level (performance grade) and did not differ among the different PGY levels and clinical domains tested. A non-inferiority test [38] and a subsequent equivalence test [39] (for proportion-correct scores) demonstrated the similarity between the two scenarios for the OR and Resuscitation. The corresponding P values for the equivalence tests were: Resuscitation, 0.0976 (equivalent at the 10% level); Trauma, 0.2712 (not equivalent at the 10% level); OR, 0.0619 (equivalent at the 10% level); Overall, 0.005 (equivalent at the 10% level). Thus, in the case of Trauma, equivalence cannot be claimed with the same 80% certainty, and scenario 1 had higher grades than scenario 2.

There were no significant differences in the performance grades (calculation of scenario difficulty) within any scenario pair in a domain. The error rate was lower for PGY4 residents compared to PGY2 residents in each domain and scenario, except in OR scenario #1 and Trauma scenario #2, where the error rate was relatively high for all PGYs. When scenarios #1 and #2 in each clinical domain were considered as one unit, the error rate was significantly lower in each domain for PGY4 residents.

The critical items error rate was significantly lower for PGY4 residents compared to PGY3 residents in the OR domain; this rate was also significantly lower for PGY4 residents compared to PGY2 residents in the resuscitation domain.

The final pass rate was significantly higher for PGY3 and PGY4 residents compared to PGY2 residents in the OR (Figure 1) [28]; this rate was also significantly higher for PGY4 residents compared to PGY2 residents when all three clinical domains were combined (11/22=0.50 vs. 2/23=0.09).