PHARM368: Advanced Drug Information and Evidence-Based Medicine

Estimated study time: 1 hr 28 min

Table of contents

Sources and References

Primary textbook — Patrick M. Malone, Karen L. Kier, Joseph E. Stanovich & Megan J. Malone, Drug Information: A Guide for Pharmacists, 6th ed. (McGraw-Hill, 2018).

Supplementary texts — Gordon Guyatt, Drummond Rennie, Maureen Meade & Deborah Cook (eds.), Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 3rd ed. (JAMAevidence/McGraw-Hill, 2015). Trisha Greenhalgh, How to Read a Paper: The Basics of Evidence-Based Medicine and Healthcare, 6th ed. (Wiley-Blackwell, 2019). David L. Sackett, Sharon E. Straus, W. Scott Richardson, William Rosenberg & R. Brian Haynes, Evidence-Based Medicine: How to Practice and Teach EBM, 2nd ed. (Churchill Livingstone/Elsevier, 2000).

Online resources — Cochrane Handbook for Systematic Reviews of Interventions, version 6.4 (Higgins et al., 2023; training.cochrane.org/handbook). GRADE Working Group, Handbook for Grading the Quality of Evidence and the Strength of Recommendations (gradeworkinggroup.org). CEBM Oxford, Levels of Evidence and Critical Appraisal worksheets (cebm.ox.ac.uk). EQUATOR Network reporting guidelines — CONSORT 2010, PRISMA 2020, STROBE 2007, AGREE II (equator-network.org). CADTH Health Technology Assessment methods and guidelines (cadth.ca). University of Toronto Leslie Dan Faculty of Pharmacy Drug Information course public outline.


Chapter 1: The Drug Information Landscape

A pharmacist in a busy hospital ward receives a call from a nurse: “The patient in room 412 is on warfarin for atrial fibrillation. The physician just ordered ibuprofen for post-surgical pain. Is that safe?” The pharmacist’s answer to that question depends on far more than a single fact — it requires integrating knowledge of drug interactions, patient-specific risk factors, and the quality of the evidence that supports or refutes the combination. This chapter establishes the conceptual foundations that make such integration possible: what drug information actually is, how it is organised, where it lives, and how a pharmacist approaches it systematically.

What Is Drug Information?

Drug information is not simply a fact retrieved from a database. Scholars and practitioners draw a careful distinction between data, information, and knowledge, and understanding that hierarchy is essential for professional practice. Data refers to raw, uninterpreted facts — for example, the in-vitro observation that ibuprofen inhibits cyclooxygenase enzymes. Information is data that has been organised and contextualised: ibuprofen inhibits the prostaglandin-dependent protective mechanisms in gastric mucosa and alters platelet aggregation, thereby potentially interacting with anticoagulants. Knowledge is information that has been interpreted in the light of clinical experience and applied to a specific decision: a patient on warfarin who takes ibuprofen regularly faces a meaningfully elevated risk of gastrointestinal bleeding, sufficient to prompt either avoidance or close monitoring. The pharmacist’s role is to function as a knowledge broker — someone who traverses this hierarchy rapidly, translating raw data into actionable recommendations for a specific patient in a specific clinical context.

This role has grown more demanding as the biomedical literature has expanded exponentially. By 2020, MEDLINE indexed more than 30 million records, with roughly one million new citations added each year. No individual clinician can read, let alone appraise, this volume of literature. The drug information pharmacist therefore serves not merely as a librarian but as an expert interpreter, filtering and synthesising evidence to the point of actionability.

Background and Foreground Questions

A critical first step in any drug information encounter is recognising what kind of question is being asked, because the kind of question determines where to look for the answer. Background questions ask for general knowledge about a drug, disease, or physiological mechanism. They typically have the form “What is the mechanism of action of warfarin?” or “How do NSAIDs affect renal function?” Background questions are best answered by tertiary resources — textbooks, compendia, and narrative reviews — because these sources provide organised, synthesised overviews without requiring the reader to appraise individual trials.

Foreground questions, in contrast, ask about optimal clinical decisions for a specific patient. They typically have the form “In a patient with atrial fibrillation on warfarin, does co-administration of ibuprofen increase the risk of major bleeding compared with acetaminophen?” Foreground questions require primary or secondary literature, because the answer depends on specific evidence about outcomes in specific populations. The PICO framework — Population, Intervention, Comparator, Outcome — was developed precisely to give structure to foreground questions, and it will be revisited throughout this textbook. Recognising whether a question is background or foreground prevents the common error of looking for a nuanced clinical answer in a drug monograph, or conversely, wasting time performing a database search for a question that a standard reference answers in thirty seconds.

Classification of Resources

Drug information resources are conventionally classified into three tiers, each with distinct characteristics, strengths, and limitations.

Tertiary resources are synthesised, secondary-source compilations: textbooks, compendia, and point-of-care tools. Micromedex (Merative, formerly IBM Watson Health), Lexicomp (Wolters Kluwer), Clinical Pharmacology (Elsevier), UpToDate (Wolters Kluwer), and Martindale: The Complete Drug Reference are the canonical examples in pharmacy practice. Tertiary resources are highly efficient — a pharmacist can retrieve a drug interaction summary in under a minute — but they carry a significant limitation: they are always behind the evidence. By the time a new finding is incorporated into a compendium, reviewed by an editorial board, and published in an updated edition, the evidence may be a year or more old. For a new drug or a rapidly evolving evidence base, tertiary sources may not reflect current best practice. Additionally, tertiary resources vary in their citation transparency; some provide detailed references for every interaction rating, while others offer summary judgements without traceable evidence.

Secondary resources are bibliographic databases that provide access to the primary literature. MEDLINE (accessible via PubMed), Embase (Elsevier), and International Pharmaceutical Abstracts (IPA) are the major pharmacy-relevant databases. Secondary resources do not contain the full text of articles in most cases; rather, they index citations and abstracts and provide links to publishers. Their strength lies in currency and comprehensiveness — a well-constructed MEDLINE search can retrieve trials published within the past week. Their limitation is that they require significant appraisal skills: retrieving a study is not the same as understanding whether its results are valid and applicable.

Primary resources are the original research articles themselves — randomised controlled trials, observational studies, systematic reviews, pharmacokinetic studies, case reports. Primary resources represent the most current and most granular evidence, but they also require the greatest expertise to appraise. A single trial can be misleading; biased methods, underpowered samples, surrogate outcomes, and selective reporting can all distort findings in ways that are invisible to an uncritical reader. The systematic approach to drug information exists precisely to ensure that primary resources are not merely retrieved but properly evaluated.

The Systematic Approach

The systematic approach to answering drug information requests, developed in the foundational work of Malone and colleagues, provides a structured framework that prevents common errors of omission and premature closure. The approach involves six sequential steps, each of which builds on the previous one.

The first step is to classify the request: what category of drug information is needed? Is this a drug interaction question, a dosing question in organ impairment, an adverse effect question, or a question about therapeutic equivalence? Correct classification immediately suggests which resources and which type of evidence are most relevant.

The second step is to gather background information about the patient and the situation. This includes identifying the patient’s age, sex, weight, renal and hepatic function, diagnosis, current medications, allergies, and any other factors that might modify the answer. A drug interaction question about warfarin and ibuprofen has a different answer for a 35-year-old with normal renal function than for an 80-year-old with a history of peptic ulcer disease.

The third step is to determine the ultimate question — which is frequently different from the question that was initially asked. A nurse asking “can the patient take ibuprofen?” is really asking “what is the safest analgesic option for this warfarin-treated patient?” Reformulating the question in PICO terms makes the search strategy more precise and ensures the answer addresses the actual clinical need.

The fourth step is to search efficiently, beginning with tertiary sources to obtain background context and then proceeding to secondary databases to retrieve primary literature relevant to the foreground question. The search strategy should be documented, reproducible, and comprehensive enough to avoid missing key evidence.

The fifth step is to evaluate the evidence retrieved, using critical appraisal skills to assess validity, results, and applicability. Later chapters of this textbook provide detailed frameworks for appraising different study designs. The sixth and final step is to formulate and communicate a response that is appropriate to the audience — a different level of detail and different framing for a nurse at the bedside compared with a physician requesting a formal consult or a patient asking about their own medication.
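The six-step approach is, at bottom, a sequential checklist, and some drug information centres track requests through it explicitly. The following sketch shows one way to represent that tracking in code. It is illustrative only: the class and step names are hypothetical conveniences, not part of any standard drug-information software.

```python
from dataclasses import dataclass, field

# Step names are illustrative labels for Malone's six-step approach.
STEPS = [
    "classify_request",
    "gather_background",
    "determine_ultimate_question",
    "search",
    "evaluate_evidence",
    "formulate_response",
]

@dataclass
class DrugInfoRequest:
    """Tracks one request's progress through the systematic approach."""
    requester: str
    initial_question: str
    completed: dict = field(default_factory=dict)  # step name -> notes

    def record(self, step: str, notes: str) -> None:
        """Document completion of a step; unknown steps are rejected."""
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.completed[step] = notes

    def next_step(self):
        """Return the earliest undocumented step, or None when finished."""
        for step in STEPS:
            if step not in self.completed:
                return step
        return None
```

A request cannot silently skip ahead: until the classification and background steps are documented, `next_step()` keeps pointing at the earliest gap, which mirrors the textbook's warning against premature closure.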

Worked Example 1.1 — The Warfarin and Ibuprofen Call

A nurse telephones the pharmacy and asks whether it is safe to give ibuprofen 400 mg orally to a patient on warfarin 5 mg daily for atrial fibrillation who has developed post-surgical musculoskeletal pain. The systematic approach proceeds as follows.

Step 1 — Classify: This is a drug interaction question with a secondary component around analgesic selection in a high-risk patient.

Step 2 — Background information: The nurse provides that the patient is 72 years old, weighs 68 kg, has a serum creatinine of 95 µmol/L (corresponding to mildly reduced renal function for age), and has no documented history of peptic ulcer disease. The current INR is 2.4 (within therapeutic range for AF).

Step 3 — Ultimate question: What is the comparative risk of major bleeding with ibuprofen versus acetaminophen as an analgesic in this elderly warfarin-treated patient, and what monitoring or alternatives are recommended?

Step 4 — Search: Check Lexicomp or Micromedex for the warfarin–ibuprofen interaction rating and mechanism. Then search PubMed using PICO terms (warfarin AND NSAIDs AND bleeding) to locate observational studies or systematic reviews.

Step 5 — Evaluate: Tertiary resources uniformly flag this as a major interaction. NSAIDs inhibit COX-1-dependent thromboxane A2 in platelets, impairing platelet aggregation, and simultaneously irritate gastric mucosa, increasing GI bleeding risk. Some NSAIDs (particularly ibuprofen) may also transiently displace warfarin from plasma proteins and inhibit CYP2C9-mediated warfarin metabolism, though these pharmacokinetic effects are modest compared to the pharmacodynamic interaction. Primary literature, including the work of Shorr et al. (1993) and subsequent registry studies, demonstrates a two- to three-fold increase in hospitalisation for GI bleeding in elderly patients using warfarin plus an NSAID.

Step 6 — Response: Recommend against ibuprofen and suggest acetaminophen at doses not exceeding 2 g/day as the preferred analgesic. If an NSAID is considered essential, recommend concurrent proton pump inhibitor therapy, an INR check within 3–5 days, and patient counselling on bleeding warning signs. Document the consultation with sources cited.

Worked Example 1.2 — A Background vs. Foreground Distinction

A student pharmacist asks: “How does metformin work?” This is a background question. The appropriate first-line resource is a tertiary reference — a pharmacology textbook or UpToDate — which can provide a concise mechanistic overview (activation of AMP-activated protein kinase, reduction of hepatic gluconeogenesis, improvement of peripheral insulin sensitivity) in the time it takes to read a paragraph. No PubMed search is warranted.

By contrast, the question “In elderly patients with type 2 diabetes and CKD stage 3b, does continuation of metformin at reduced dose compared with discontinuation affect all-cause mortality?” is a foreground question that demands a systematic literature search across MEDLINE and Embase, retrieval of cohort studies and meta-analyses, and careful critical appraisal — because the answer is not summarised in any single textbook and the evidence has evolved significantly in recent years.

Understanding the distinction between these resource tiers and question types is not merely academic. Pharmacists who apply tertiary resources to foreground clinical questions may give outdated answers; those who attempt primary literature searches for every background question will be paralysed by inefficiency. The systematic approach coordinates these resources into a coherent practice discipline.


Chapter 2: Systematic Literature Searching

In 2020, investigators published a systematic review of antifungal prophylaxis in haematological malignancy and identified 93 eligible randomised trials — none of which had been located by the treating team during their informal literature check, which had relied exclusively on a single PubMed keyword search. The review ultimately changed practice recommendations. That story illustrates the central thesis of this chapter: the difference between a good drug information response and a misleading one often lies not in appraisal skill but in whether the right studies were found in the first place. Search strategy is not a procedural formality; it is a clinical skill.

The PICO Framework and Searchable Terms

Before a single term is typed into a database, the clinical question must be translated into searchable vocabulary. The PICO framework introduced in Chapter 1 is the engine of that translation. Consider the foreground question: “In patients with non-valvular atrial fibrillation on warfarin, does concurrent use of an NSAID increase the risk of major bleeding compared with no NSAID?” The population element (atrial fibrillation, warfarin) maps to disease terms and drug names; the intervention (NSAID) maps to drug class terms; the comparator (no NSAID) typically does not require a separate search term but informs the study design filter; and the outcome (major bleeding) maps to clinical outcome vocabulary. Each element should be brainstormed for synonyms and related terms before the search begins, because databases index articles using standardised controlled vocabulary that may differ substantially from the clinical language used in the question.

MEDLINE via PubMed: Architecture and Search Syntax

MEDLINE is the National Library of Medicine’s flagship bibliographic database, covering more than 5,200 journals in biomedicine and pharmacy. It is searched through the free PubMed interface, which provides access to more than 37 million citations by adding PubMed-specific records (preprints, author manuscripts) to the MEDLINE core. Understanding PubMed’s architecture is essential for building efficient, reproducible searches.

Medical Subject Headings (MeSH) are the controlled vocabulary used by trained human indexers to tag every article at the time of indexing. A single concept — say, “non-steroidal anti-inflammatory drugs” — may appear in the literature under dozens of synonyms: ibuprofen, naproxen, celecoxib, NSAIDs, COX-2 inhibitors, and so on. The MeSH term “Anti-Inflammatory Agents, Non-Steroidal” captures all of these synonymous terms under one heading, regardless of how the authors phrased it. When a MeSH term is applied with the [MeSH Terms] field tag, PubMed searches the indexed MeSH field and automatically explodes the term to include all more-specific terms nested beneath it in the MeSH hierarchy — so searching for “Anti-Inflammatory Agents, Non-Steroidal”[MeSH Terms] also retrieves articles indexed with “Ibuprofen” and “Naproxen” without requiring the searcher to list every drug individually. Entry terms are the synonyms that MeSH maps to its preferred headings; the MeSH database can be browsed at mesh.nlm.nih.gov to find the preferred heading for any concept.

Boolean operators connect search terms in logical relationships. AND narrows a search by requiring both terms to be present; OR broadens a search by accepting either term; NOT excludes articles containing a term. The standard search strategy for a PICO question builds separate concept blocks for each PICO element, using OR to combine synonyms within each block, then uses AND to combine the concept blocks. Field tags restrict where PubMed looks for the term: [tiab] searches title and abstract fields and is used for free-text terms not captured by MeSH; [mh] searches the MeSH field. Truncation using an asterisk (*) retrieves all words sharing a root — anticoagul* retrieves anticoagulant, anticoagulation, anticoagulants, and anticoagulated. Phrase searching using quotation marks (“atrial fibrillation”) forces PubMed to search for the exact phrase rather than the two words appearing anywhere in the record independently.

Limits refine the final search set by characteristics of the publication rather than its content: date ranges, language, species (human), article type (clinical trial, randomised controlled trial, systematic review), and age group. Limits should be applied after the concept blocks have been combined, not before, to avoid inadvertently excluding relevant articles.

Worked Example 2.1 — Building a PubMed Search for the Warfarin-NSAID Question

The PICO question is: In patients with atrial fibrillation on warfarin, does concurrent NSAID use increase the risk of major bleeding versus no NSAID?

Block 1 — Population (atrial fibrillation + warfarin): (“Atrial Fibrillation”[MeSH Terms] OR “atrial fibrillation”[tiab]) AND (“Warfarin”[MeSH Terms] OR warfarin[tiab] OR coumadin[tiab])

Block 2 — Intervention (NSAIDs): “Anti-Inflammatory Agents, Non-Steroidal”[MeSH Terms] OR NSAID*[tiab] OR ibuprofen[tiab] OR naproxen[tiab] OR “COX-2 inhibitor*”[tiab]

Block 3 — Outcome (bleeding): “Hemorrhage”[MeSH Terms] OR bleed*[tiab] OR “gastrointestinal bleeding”[tiab] OR “major bleeding”[tiab]

Combined search: Block 1 AND Block 2 AND Block 3, then limit to Humans, English, published 2000–present.

This strategy is more comprehensive than a simple keyword search for “warfarin ibuprofen bleeding” because it captures all NSAID subclasses through the exploded MeSH term, captures variant spellings through truncation, and uses both controlled vocabulary and free text to maximise recall.
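The block-building logic of Worked Example 2.1 — OR within a concept, AND across concepts, parentheses around every OR group — can be expressed mechanically. The sketch below assembles the query string with two small helper functions; the helper names are illustrative, and any generated string should still be sanity-checked in PubMed’s Advanced Search builder before use.

```python
def or_block(terms):
    """Combine synonyms for one concept with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"

def and_blocks(*blocks):
    """Combine concept blocks with AND (limits are applied afterwards)."""
    return " AND ".join(blocks)

# Population: two concepts (condition AND drug), each with synonyms
population = and_blocks(
    or_block(['"Atrial Fibrillation"[mh]', '"atrial fibrillation"[tiab]']),
    or_block(['"Warfarin"[mh]', 'warfarin[tiab]', 'coumadin[tiab]']),
)
# Intervention: exploded MeSH term plus free-text synonyms and truncation
intervention = or_block([
    '"Anti-Inflammatory Agents, Non-Steroidal"[mh]',
    'NSAID*[tiab]', 'ibuprofen[tiab]', 'naproxen[tiab]',
])
# Outcome: controlled vocabulary plus truncated free text
outcome = or_block([
    '"Hemorrhage"[mh]', 'bleed*[tiab]', '"major bleeding"[tiab]',
])

query = and_blocks(population, intervention, outcome)
print(query)
```

Because synonyms live in plain Python lists, the strategy is documented and reproducible by construction: rerunning the script regenerates exactly the query that was reported in the search log.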

Embase and Its Added Value

Embase (Excerpta Medica database), produced by Elsevier, indexes over 34 million records from more than 8,500 journals worldwide, with particularly strong coverage of European literature, conference abstracts, and drug-related journals that are not indexed in MEDLINE. Its controlled vocabulary, EMTREE, is larger and more granular than MeSH, with over 85,000 preferred terms and particular depth in pharmaceutical and pharmacological terminology. For questions involving specific drugs, drug classes, or drug safety signals, EMTREE often provides more precise retrieval than MeSH. Embase indexes a substantial proportion of conference abstracts — proceedings from major medical and pharmaceutical meetings — which represent data that may never appear in a peer-reviewed journal, making Embase particularly valuable for detecting publication bias.

For systematic reviews, the Cochrane Handbook requires that both MEDLINE and Embase be searched at a minimum, precisely because neither database alone provides complete coverage. Studies have consistently shown that limiting a systematic review search to PubMed alone misses approximately 30–40% of eligible trials. The decision to search both databases adds time but substantially improves the comprehensiveness of the evidence base.

The Cochrane Library

The Cochrane Library is a collection of high-quality databases maintained by Cochrane, an international not-for-profit organisation dedicated to synthesising health evidence. CENTRAL (Cochrane Central Register of Controlled Trials) is its most important component for primary searching: it contains records from MEDLINE, Embase, and additional hand-searching of journals and trial registers, curated specifically to identify randomised and quasi-randomised controlled trials. The Cochrane Database of Systematic Reviews (CDSR) contains Cochrane-produced systematic reviews, which are notable for methodological rigor, transparent protocols, and structured risk of bias assessment. The Database of Abstracts of Reviews of Effects (DARE) contains structured abstracts of non-Cochrane systematic reviews; although DARE has not been updated since 2015, its archived records remain useful for identifying older syntheses on a topic. Unlike PubMed, Cochrane Library searches do not use MeSH in the same way; its simple search interface and topic-based browsing are often used to locate an existing Cochrane review before constructing a de novo search.

Grey Literature and Clinical Trial Registries

Grey literature refers to documents produced outside conventional peer-reviewed publishing channels: government reports, regulatory submissions, conference proceedings, dissertations, and institutional reports. It is particularly important for drug information because pharmaceutical companies are not required to publish negative trials in journals, but they are required to submit complete data packages to regulatory agencies as a condition of market authorisation. The U.S. Food and Drug Administration (FDA) posts drug approval packages — including full study reports — through its Drugs@FDA database. Health Canada makes its Summary Basis of Decision (SBD) documents publicly available. These regulatory documents often contain data on subgroup analyses, adverse events, and pharmacokinetic studies that are either not published or published only in truncated form in journals.

ClinicalTrials.gov (U.S. National Library of Medicine) and the WHO International Clinical Trials Registry Platform (ICTRP) provide mandatory registration records for most clinical trials conducted after 2005. These registries allow searchers to identify trials whose results have never been published — a central source of publication bias — and to compare pre-registered primary outcomes against those reported in publications, enabling detection of outcome switching. Canadian Product Monographs, accessible through Health Canada’s Drug Product Database, are the approved labelling documents for drugs marketed in Canada and represent the regulatory synthesis of all available evidence at the time of approval, including safety data from post-marketing commitments.
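Detecting outcome switching is, at its core, a set comparison between the registry record and the publication. The sketch below makes that comparison explicit; the outcome strings are invented for illustration, and in practice a reviewer must still judge whether differently worded outcomes describe the same endpoint.

```python
def outcome_switching(registered, published):
    """Compare pre-registered primary outcomes against published ones.

    Outcomes are normalised to lowercase, stripped strings before
    comparison; returns dropped, added, and an overall consistency flag.
    """
    reg = {o.strip().lower() for o in registered}
    pub = {o.strip().lower() for o in published}
    return {
        "dropped": sorted(reg - pub),   # registered but never reported
        "added": sorted(pub - reg),     # reported but never registered
        "consistent": reg == pub,
    }

# Hypothetical trial: the primary outcome changed between registration
# and publication, a classic signal of outcome switching
report = outcome_switching(
    registered=["Major bleeding at 90 days"],
    published=["Composite of bleeding and thromboembolism"],
)
```

A non-empty `dropped` list is exactly the pattern that prompts a searcher to pull the FDA approval package or the registry results section for the missing endpoint.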

Worked Example 2.2 — Finding Unpublished Data

A pharmacist preparing a drug information response on the safety of a recently approved direct oral anticoagulant in patients with severe renal impairment notes that the phase III trials explicitly excluded patients with eGFR below 30 mL/min/1.73m². The published trial reports therefore provide no direct efficacy or safety data for this population. The pharmacist searches ClinicalTrials.gov using the drug name and filters for completed trials, locating two phase I pharmacokinetic studies in renally impaired volunteers whose results were posted as required under U.S. law but never published in a journal. The Canadian Product Monograph, retrieved from Health Canada’s website, synthesises these PK data and recommends dose reduction thresholds. The FDA approval package on Drugs@FDA contains the full clinical pharmacology review, which details the renal impairment subgroup analyses from the pivotal trials. None of this information would have been found through a PubMed or Embase search alone.

Google Scholar: Strengths and Responsible Use

Google Scholar is a freely accessible search engine that indexes scholarly literature across disciplines, including conference papers, preprints, theses, and content from publishers who do not provide MEDLINE records. Its two distinctive features are breadth — it indexes more document types than any structured database — and citation tracking, which allows a searcher to identify all articles that have cited a landmark paper (“cited by” function), an approach known as forward citation searching. Google Scholar can be useful for locating the grey literature that bibliographic databases miss, for verifying whether a preprint has subsequently been peer-reviewed and published, and for finding the full text of articles behind paywalls through author self-archiving.

However, Google Scholar carries significant risks for clinical practice. It does not distinguish between peer-reviewed and non-peer-reviewed content; a preprint, a blog post, and a Cochrane review appear in the same results list. Retracted articles continue to appear in Google Scholar results even after retraction, sometimes without a clear retraction notice. The algorithm that ranks results is not transparent and is not optimised for clinical relevance or methodological quality. For these reasons, Google Scholar should never be used as the sole or primary database for a clinical drug information search. When used responsibly — as a supplement to MEDLINE and Embase, for citation tracking, for locating grey literature — it is a useful adjunct. When used as a shortcut to avoid structured searching, it introduces unmeasurable bias into the evidence base.


Chapter 3: Foundations of Evidence-Based Medicine

In 1996, a 60-year-old man presented to a teaching hospital in Hamilton, Ontario, with a second myocardial infarction. His cardiologist noted that despite clear evidence from the ISIS-2 trial (1988) that aspirin and streptokinase each independently reduced 5-week vascular mortality in acute MI by roughly a quarter, and in combination nearly halved it, the patient had received neither during his first infarction five years earlier. This gap — between what the evidence said and what clinicians did — was precisely the gap that evidence-based medicine (EBM) had been developed to close. Understanding the foundations of EBM, including its definition, its conceptual pillars, and its formal tools for grading evidence, is a prerequisite for the critical appraisal skills developed in subsequent chapters.

Sackett’s Definition Unpacked

David Sackett and colleagues at McMaster University defined evidence-based medicine as “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.” Each word in this definition carries deliberate meaning. “Conscientious” implies that EBM requires active effort — the clinician must seek out and engage with evidence rather than relying passively on habit or hearsay. “Explicit” means that the reasoning behind a clinical decision should be transparent and articulable; if asked why a particular treatment was chosen, the clinician should be able to name the evidence and explain how it was appraised. “Judicious” — perhaps the most important qualifier — means that evidence is applied with sound clinical judgement, not mechanically. The same evidence may lead to different decisions in different patients depending on their circumstances, values, and comorbidities. “Current” acknowledges that medicine changes and that the best evidence of five years ago may have been superseded. “Best evidence” does not always mean randomised controlled trial evidence; it means the highest quality evidence available for the specific question being asked.

The Three Pillars

EBM rests on three equally essential pillars: best research evidence, clinical expertise, and patient values. The original framing by Sackett explicitly rejected the notion that evidence alone determines clinical decisions; rather, evidence informs and constrains the decision space, while expertise and patient values determine the final choice within that space.

Best research evidence refers to clinically relevant research, particularly from basic sciences and from patient-centred clinical research, into the accuracy of diagnostic tests, the power of prognostic markers, and the efficacy and safety of therapeutic, rehabilitative, and preventive regimens. Clinical expertise is the ability to use clinical skills and past experience to rapidly identify each patient’s unique health state and diagnosis, their individual risks and benefits from potential interventions, and their personal circumstances and expectations. Without expertise, even valid evidence cannot be applied appropriately; a clinician who cannot assess bleeding risk cannot determine whether the NNT of 20 for a preventive drug justifies its use in a given patient. Patient values are the unique preferences, concerns, and expectations that each patient brings to a clinical encounter. A patient who is a professional musician may weight hand tremor from a beta-blocker as an unacceptable adverse effect even if the cardiovascular evidence strongly favours the drug; a patient with severe needle phobia may prefer an oral drug even if its efficacy is slightly inferior to an injectable option.
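The NNT of 20 mentioned above follows from simple risk arithmetic: the number needed to treat is the reciprocal of the absolute risk reduction, NNT = 1/ARR. A minimal sketch, using invented event rates for a hypothetical preventive drug:

```python
def risk_measures(control_risk, treatment_risk):
    """Absolute risk reduction, relative risk, and number needed to treat.

    Risks are proportions between 0 and 1. If the treatment does not
    reduce risk, NNT is reported as infinity.
    """
    arr = control_risk - treatment_risk          # absolute risk reduction
    rr = treatment_risk / control_risk           # relative risk
    nnt = 1 / arr if arr > 0 else float("inf")   # number needed to treat
    return {"ARR": arr, "RR": rr, "NNT": nnt}

# Hypothetical: 10% event rate untreated, 5% treated
m = risk_measures(control_risk=0.10, treatment_risk=0.05)
# ARR = 0.05, RR = 0.5, NNT = 20: treat 20 patients to prevent one event
```

The same relative risk of 0.5 yields an NNT of 200 when the baseline risk is only 1%, which is why the clinical expertise pillar — judging the individual patient's baseline risk — cannot be separated from the evidence itself.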

The failure mode associated with over-reliance on evidence at the expense of the other pillars is sometimes called evidence fundamentalism — the practice of applying population-level trial results mechanically to individual patients, ignoring their unique biology, comorbidities, and values. The opposite failure — ignoring evidence in favour of personal clinical experience — is clinical anecdotalism. Both extremes produce suboptimal care. EBM positions itself as the synthesis that avoids both errors.

Hierarchies of Evidence

Traditional presentations of EBM use a pyramid diagram to illustrate the relative strength of different study designs. At the apex sit systematic reviews and meta-analyses of randomised controlled trials; below them, individual RCTs; then cohort studies, case-control studies, cross-sectional studies, case series, and expert opinion at the base. The pyramid captures an important truth: well-conducted RCTs are more protected from confounding than observational studies, because random allocation distributes both known and unknown confounders equally between treatment groups. However, the pyramid also encodes a significant oversimplification that has led to misuse.

Not all clinical questions can or should be answered by RCTs. Questions about rare adverse effects (e.g., drug-induced agranulocytosis) require large observational databases because the event is too uncommon to study efficiently in an RCT. Questions about long-term outcomes over decades cannot be answered by RCTs of practical duration. Questions about comparative effectiveness in real-world populations cannot be answered by the highly selected populations of most efficacy trials. Diagnostic accuracy questions require cross-sectional designs with a reference standard, not RCTs. Prognostic questions require prospective cohort studies. Applying the EBM hierarchy rigidly would produce the absurd conclusion that we should doubt the evidence that smoking causes lung cancer (based on cohort studies) because no RCT has randomised participants to smoke. The GRADE framework was developed to replace the simplistic pyramid with a more nuanced system that assesses certainty of evidence based on multiple factors, not study design alone.

The GRADE Framework

The Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework, developed by an international working group beginning in 2000, provides a systematic method for rating the certainty of evidence and the strength of recommendations. It is now used by the World Health Organization, Cochrane, and most major clinical guideline organisations worldwide.

In GRADE, certainty of evidence begins at high for RCT evidence and at low for observational evidence, and is then adjusted upward or downward based on specific factors. Five factors can lower certainty. Risk of bias refers to methodological limitations in the studies — inadequate randomisation, lack of blinding, substantial loss to follow-up — that may make results systematically misleading. Inconsistency refers to unexplained heterogeneity in results across studies: if some trials show a large benefit and others show harm, the pooled estimate may be precise but the true effect is uncertain. Indirectness refers to evidence that does not directly answer the question of interest — for example, evidence from a different population, a surrogate outcome instead of a patient-important outcome, or an indirect comparison between two drugs rather than a head-to-head trial. Imprecision refers to wide confidence intervals that include both clinically important benefit and clinically important harm; when the evidence is compatible with both a large treatment effect and a negligible one, certainty is low regardless of the point estimate. Publication bias refers to the selective non-publication of studies with unfavourable results; a body of evidence assembled from a selectively published literature will systematically overstate benefit.

Three factors can raise certainty above its starting level. A large effect — specifically an RR of less than 0.5 or greater than 2.0 — is less likely to be explained by residual confounding than a modest effect, and therefore increases certainty in observational evidence. A dose-response relationship, in which greater exposure to the intervention produces proportionally greater benefit (or harm), provides biological plausibility that strengthens causal inference. Finally, when all plausible residual confounders would tend to reduce the observed treatment effect rather than exaggerate it, the true effect is at least as large as the observed effect, which increases confidence in the direction if not the magnitude of the finding.
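The bookkeeping described above — a starting level set by study design, minus downgrades, plus upgrades, clamped to the four GRADE levels — can be made concrete in a few lines. This is a toy sketch for teaching purposes, not an official GRADE tool; real assessments involve judgement about how serious each concern is, not simple counting.

```python
# Toy sketch of GRADE certainty arithmetic (illustrative only, not an
# official GRADE instrument): start at a level determined by study design,
# subtract one level per downgrading factor, add one per upgrading factor,
# and clamp the result to the four GRADE levels.

LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(design, downgrades=(), upgrades=()):
    """Return a GRADE certainty level as a string.

    design     -- "rct" (starts high) or "observational" (starts low)
    downgrades -- e.g. "risk of bias", "inconsistency", "indirectness",
                  "imprecision", "publication bias"
    upgrades   -- e.g. "large effect", "dose-response",
                  "plausible confounding favours effect"
    """
    start = 3 if design == "rct" else 1  # index into LEVELS
    score = start - len(downgrades) + len(upgrades)
    return LEVELS[max(0, min(3, score))]

# RCT evidence downgraded one level (cf. Worked Example 3.1 below)
print(grade_certainty("rct", downgrades=["indirectness"]))        # moderate
# Observational evidence upgraded for a large effect
print(grade_certainty("observational", upgrades=["large effect"]))  # moderate
```

The clamp matters: evidence cannot fall below very low or rise above high no matter how many factors apply.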

The four levels of GRADE certainty carry specific interpretations. High certainty means the true effect is close to the estimate and future research is very unlikely to change confidence in the estimate — a high-certainty finding can be used as the basis for a strong recommendation. Moderate certainty means the estimate is probably close to the true effect, but there is a meaningful possibility that it could be substantially different. Low certainty means there is limited confidence that the estimate is close to the true effect; the true effect may be substantially different. Very low certainty means there is very little confidence in the estimate; the true effect is likely to be substantially different. These distinctions have direct implications for clinical decision-making: a low-certainty estimate should generate a conditional recommendation, not a strong one.

Worked Example 3.1 — Applying GRADE to a Drug Question

Consider the question of whether aspirin reduces the risk of colorectal cancer in adults. The available evidence includes several large RCTs (e.g., WHS, PHS, ASPREE) and numerous prospective cohort studies. A GRADE assessment would proceed as follows: RCT evidence starts at high certainty. Assessment of risk of bias finds that the RCTs generally have adequate randomisation and blinding, so no downgrading for risk of bias. However, the RCTs were not designed with colorectal cancer as a primary outcome, meaning cancer was a secondary or tertiary endpoint — a form of indirectness (the outcomes were not measured with optimal sensitivity). Additionally, the confidence intervals from individual trials are wide, introducing imprecision. Judging the indirectness serious but the imprecision a borderline rather than independent concern, the assessors downgrade the evidence by one level, arriving at moderate certainty. The GRADE summary would state: “Moderate certainty evidence suggests that aspirin use reduces the incidence of colorectal cancer, but uncertainty remains about the optimal dose, duration, and the balance with bleeding risk.”

Worked Example 3.2 — Background and Foreground PICO Formulation

A clinical pharmacist is asked by a resident: “Should we start a statin in this 75-year-old patient with no prior cardiovascular events but a 10-year ASCVD risk of 8%?” This is a foreground question. The PICO formulation is: In adults aged 75 and older without prior cardiovascular disease (P), does statin therapy (I) compared with no statin (C) reduce all-cause mortality and cardiovascular events (O)? Adding T (time) — over 5–10 years of follow-up — produces a PICOT question. Adding S (study design) — specifically from RCTs or systematic reviews — produces PICOS. The GRADE-rated evidence for primary prevention with statins in adults over 75 is rated as low to moderate certainty because major RCTs either excluded this age group or had insufficient power in the elderly subgroup. This uncertainty directly informs a conditional rather than a strong recommendation in guidelines, and the clinical pharmacist should communicate this nuance to the resident.
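Because PICO, PICOT, and PICOS differ only in which optional components are filled in, the mnemonic lends itself to a small structured representation. The sketch below is purely illustrative — the field names follow the mnemonics, and the example question is the statin scenario from Worked Example 3.2.

```python
# A minimal structured representation of foreground questions
# (illustrative; field names follow the PICO/PICOT/PICOS mnemonics).
from dataclasses import dataclass
from typing import Optional

@dataclass
class PICO:
    population: str
    intervention: str
    comparator: str
    outcome: str
    time: Optional[str] = None          # adding T gives a PICOT question
    study_design: Optional[str] = None  # adding S gives a PICOS question

    def label(self) -> str:
        """Name the mnemonic this question instantiates."""
        if self.study_design:
            return "PICOS"
        if self.time:
            return "PICOT"
        return "PICO"

q = PICO(
    population="adults aged 75+ without prior cardiovascular disease",
    intervention="statin therapy",
    comparator="no statin",
    outcome="all-cause mortality and cardiovascular events",
    time="5-10 years of follow-up",
)
print(q.label())  # PICOT
```

Forcing each component to be stated explicitly is the point of the mnemonic: a question that cannot fill the P, I, C, and O slots is still a background question.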


Chapter 4: Critical Appraisal of Randomised Controlled Trials

In 1989, the CAST (Cardiac Arrhythmia Suppression Trial), launched in 1987, was halted early when the drugs encainide and flecainide, prescribed because they reliably suppressed ventricular ectopy — a surrogate outcome believed to predict mortality — were found to more than double the risk of sudden cardiac death compared with placebo. The drugs worked on the surrogate outcome; they killed patients on the clinical outcome. The CAST trial became a landmark in the history of clinical research methodology because it demonstrated that well-designed RCTs are not just academic exercises — they are the infrastructure that prevents clinicians from causing systematic harm with plausible-sounding treatments. This chapter equips you to read RCTs with the same critical rigour that the CAST investigators brought to their design.

Why the RCT is the Workhorse of Therapeutic Evidence

The central methodological virtue of the randomised controlled trial is that it controls for confounding — the systematic distortion of the exposure-outcome relationship by a third variable that is associated with both. In observational studies of drug therapy, the patients who receive a drug are almost never exchangeable with those who do not. Sicker patients are more likely to receive more intensive treatment (confounding by indication), and healthier patients are more likely to be prescribed a drug with a favourable risk-benefit profile (channelling bias). These systematic differences make it difficult or impossible to attribute differences in outcomes to the treatment itself. Random allocation removes this problem: when participants are assigned to treatment or control groups by a random process, all characteristics — both measured and unmeasured — are distributed approximately equally between groups, meaning that any difference in outcomes can be causally attributed to the intervention with a quantifiable degree of statistical confidence.

Blinding — concealing treatment assignment from participants, clinicians, and outcome assessors — addresses ascertainment bias, the systematic difference in how outcomes are measured or reported depending on knowledge of treatment assignment. A patient who knows they received the active drug may report symptom improvements more readily than one who received placebo; a clinician who knows a patient received an experimental drug may assess ambiguous clinical signs more optimistically; an adjudication committee that knows treatment assignment may resolve borderline events differently. Double-blind trials conceal assignment from both participants and clinicians; triple-blind trials additionally conceal it from the outcome assessment committee. The feasibility of blinding depends on the nature of the intervention — comparing two pills is easily blinded; comparing surgery with medical management is not.

Intention-to-treat (ITT) analysis preserves the integrity of the randomisation by analysing participants in the groups to which they were originally allocated, regardless of whether they completed treatment or crossed over to the other group. If patients who drop out or switch treatments are excluded from the analysis, the resulting per-protocol population is no longer truly randomised — the very patients most likely to have adverse effects or treatment failures are also the most likely to stop treatment, and removing them selectively from the analysis inflates apparent efficacy and reduces apparent harm. ITT analysis is conservative; it may underestimate the efficacy of a treatment under ideal adherence conditions, but it provides an unbiased estimate of the real-world effect of offering the treatment.

CONSORT 2010

The Consolidated Standards of Reporting Trials (CONSORT) 2010 checklist provides 25 items across six sections — title/abstract, introduction, methods, results, discussion, and other information — that describe the minimum information that must be reported for an RCT to be fully interpretable. CONSORT was developed in response to evidence that poorly reported trials systematically overestimate treatment effects, because trials that lack detailed descriptions of randomisation, blinding, and analysis methods tend to conceal rather than disclose methodological limitations. Most major biomedical journals now require CONSORT-compliant reporting as a condition of publication.

The CONSORT checklist requires, among other things: a clear description of the sequence generation method used for randomisation (e.g., computer-generated random numbers); the allocation concealment mechanism (e.g., sequentially numbered opaque sealed envelopes or centralised telephone randomisation); who was blinded and how blinding was maintained; whether the primary outcome was pre-specified and how it was defined; a CONSORT flow diagram showing how many participants were assessed for eligibility, randomly assigned, treated, and included in the final analysis; baseline demographic and clinical characteristics stratified by allocation; all outcomes pre-specified in the protocol, not just those that were statistically significant; and a structured discussion that includes a statement about the generalisability of the findings and the sources of potential bias.

When appraising a published RCT, the CONSORT checklist functions as a road map for identifying what the investigators did and did not disclose. A trial that does not describe its allocation concealment method or that shows unexplained missing data in the flow diagram warrants downgraded certainty under GRADE.

RoB 2: Risk of Bias Assessment

The Cochrane Risk of Bias tool version 2 (RoB 2) provides a structured framework for assessing the risk of bias in parallel-group randomised trials across five domains. Each domain is evaluated using a series of signalling questions, and the domain is then assigned a judgement of low risk of bias, some concerns, or high risk of bias. The five domains are as follows.

The randomisation process domain assesses whether the sequence was truly random, whether allocation was adequately concealed before assignment (so that investigators and participants could not predict or influence the next assignment), and whether imbalances at baseline suggest potential problems with the randomisation. The deviations from intended interventions domain assesses whether participants and personnel were aware of treatment assignments during the trial and whether any such awareness led to differential application of co-interventions or to differential dropout. For ITT analysis to be appropriate, deviations should be unrelated to the assigned intervention. The missing outcome data domain assesses whether data are available for all or most randomised participants, whether missingness is differential between groups, and whether the statistical analysis used appropriate methods (such as multiple imputation) to handle missing data. The measurement of outcomes domain assesses whether the outcome was measured in a way that could be influenced by knowledge of treatment assignment — most relevant for subjective outcomes like pain scores or quality of life, less relevant for hard outcomes like death. The selection of reported results domain assesses whether the analysis and reporting were consistent with a pre-specified plan: is the primary outcome as analysed the same as the primary outcome pre-registered in the trial protocol? Selective outcome reporting — publishing only statistically significant results — is one of the most pervasive and damaging sources of bias in the clinical literature.

Effect Measures and Their Interpretation

The quantification of treatment effects in RCTs uses a family of measures that are not interchangeable and that carry different implications for clinical decision-making. Understanding these measures is not merely statistical literacy; it is a prerequisite for translating evidence into practice.

The relative risk (RR), also called the risk ratio, is the ratio of the probability of the outcome in the treatment group to the probability of the outcome in the control group. An RR of 0.5 means that participants receiving the treatment have half the risk of the outcome compared with controls. The relative risk reduction (RRR) is 1 − RR, expressed as a percentage; an RR of 0.5 corresponds to an RRR of 50%. The odds ratio (OR) is the ratio of the odds of the outcome in the treatment group to the odds in the control group; it approximates the RR when the outcome is rare (below 10%) but overestimates it when the outcome is common. The hazard ratio (HR) is used in time-to-event (survival) analyses and represents the instantaneous rate of the outcome in the treatment group relative to the control group at any given point in time; like the RR, an HR below 1.0 favours the treatment.

Relative measures are important for understanding the biological plausibility and strength of an effect, but they are misleading for clinical decision-making when presented in isolation, because they do not convey the absolute magnitude of benefit. The absolute risk reduction (ARR), also called the absolute risk difference, is the difference in event rates between groups: ARR = risk in control − risk in treatment. The number needed to treat (NNT) is the reciprocal of the ARR (NNT = 1/ARR) and represents the number of patients who must receive the treatment instead of the comparator for one additional patient to benefit.
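The definitions above reduce to one-line formulas, which the following Python sketch collects in one place (risks are expressed as proportions, e.g. 0.05 for 5%; the numbers in the usage lines anticipate Worked Example 4.1):

```python
# Effect measures as small functions; risks are proportions (0.05 = 5%).

def rr(risk_treat, risk_ctrl):
    """Relative risk: ratio of event probabilities."""
    return risk_treat / risk_ctrl

def rrr(risk_treat, risk_ctrl):
    """Relative risk reduction: 1 - RR."""
    return 1 - rr(risk_treat, risk_ctrl)

def odds_ratio(risk_treat, risk_ctrl):
    """Odds ratio, where odds = p / (1 - p) in each group."""
    return (risk_treat / (1 - risk_treat)) / (risk_ctrl / (1 - risk_ctrl))

def arr(risk_treat, risk_ctrl):
    """Absolute risk reduction: control risk minus treatment risk."""
    return risk_ctrl - risk_treat

def nnt(risk_treat, risk_ctrl):
    """Number needed to treat: reciprocal of the ARR."""
    return 1 / arr(risk_treat, risk_ctrl)

# 5% event rate on treatment vs 10% on control:
print(rr(0.05, 0.10))   # 0.5
print(rrr(0.05, 0.10))  # 0.5, i.e. a 50% relative reduction
print(arr(0.05, 0.10))  # 0.05, i.e. 5 percentage points
print(nnt(0.05, 0.10))  # 20.0
```

Note that the same RR of 0.5 yields very different ARRs and NNTs depending on the baseline risk — halving a 10% risk gives NNT 20, but halving a 0.1% risk gives NNT 2,000 — which is precisely why relative measures mislead in isolation.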

Worked Example 4.1 — Calculating Effect Measures

In a hypothetical RCT of treatment X versus placebo for prevention of stroke in patients with atrial fibrillation, the stroke rate in the treatment arm is 5% over 2 years and in the placebo arm is 10% over 2 years.

RR = 5% / 10% = 0.50 (treatment group has half the risk of stroke).

RRR = 1 − 0.50 = 0.50, or 50% (treatment reduces relative stroke risk by 50%).

ARR = 10% − 5% = 5% (absolute reduction of 5 stroke events per 100 patients treated for 2 years).

NNT = 1 / 0.05 = 20 (20 patients must be treated for 2 years to prevent one stroke).

Note that a 50% relative risk reduction sounds dramatic, but the NNT of 20 provides the clinically actionable figure: the pharmacist discussing this drug with a patient can say “if 20 people like you take this drug for 2 years, one stroke will be prevented.” The patient can then weigh this benefit against the drug’s cost, adverse effects, and their own preferences.

Worked Example 4.2 — Reading a CONSORT Flow Diagram

A published RCT of rivaroxaban versus warfarin in AF reports its CONSORT flow diagram as follows: 14,624 patients were assessed for eligibility; 360 were excluded (310 did not meet inclusion criteria, 50 declined participation); 14,264 were randomised (7,131 to rivaroxaban, 7,133 to warfarin); 6,958 rivaroxaban patients and 6,979 warfarin patients completed the study. The appraisal question is: are the 173 and 154 participants who did not complete the study in each arm comparable? Were their baseline characteristics similar to completers? Were their reasons for withdrawal related to treatment (adverse effects) or to factors unrelated to treatment? If a disproportionate number of participants in the rivaroxaban arm withdrew due to bleeding events compared with warfarin, the per-protocol analysis would systematically underestimate rivaroxaban’s bleeding risk, and an ITT analysis would be needed to provide an unbiased estimate.
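A quick appraisal habit is to verify that a flow diagram's numbers are internally consistent: assessed minus excluded should equal randomised, and arm totals minus completers give the withdrawals. A minimal check using the figures from Worked Example 4.2 (a hypothetical trial):

```python
# Arithmetic check of the flow-diagram figures from Worked Example 4.2
# (hypothetical trial; all counts come from that example).
assessed = 14_624
excluded = 360
randomised = {"rivaroxaban": 7_131, "warfarin": 7_133}
completed = {"rivaroxaban": 6_958, "warfarin": 6_979}

# Assessed minus excluded must equal the total randomised
assert assessed - excluded == sum(randomised.values())  # 14,264

# Withdrawals per arm: randomised minus completers
withdrew = {arm: randomised[arm] - completed[arm] for arm in randomised}
print(withdrew)  # {'rivaroxaban': 173, 'warfarin': 154}
```

Numbers that do not reconcile across the diagram's boxes are a red flag for unreported exclusions or post-randomisation attrition.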

Confidence Intervals and P-values

A 95% confidence interval (CI) represents a range of values consistent with the data under the assumptions of the statistical model; if the experiment were repeated many times and a 95% CI calculated each time, approximately 95% of those intervals would contain the true population parameter. A common misinterpretation is that the 95% CI contains the true value with 95% probability — this is a Bayesian interpretation not justified by frequentist statistics. The CI is more informative than a p-value because it conveys both statistical significance (a CI that excludes the null value of 1.0 for a ratio or 0 for a difference) and the magnitude and precision of the effect (narrow CI = more precise; wide CI = more uncertainty).

A p-value quantifies the probability of observing a test statistic as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true. A p-value of 0.03 does not mean there is a 3% chance the null hypothesis is true; it means that if the null hypothesis were true, a result this extreme or more extreme would occur 3% of the time by chance alone. The conventional threshold of p < 0.05 as the criterion for statistical significance is arbitrary, has no special scientific meaning, and conflates statistical significance with clinical importance. A very large trial may achieve p < 0.001 for an effect too small to matter clinically; an underpowered trial may fail to achieve p < 0.05 despite a clinically meaningful treatment effect. Statistical significance and clinical significance must always be evaluated independently.
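The frequentist meaning of "95%" — a property of the procedure over repeated experiments, not of any single interval — can be demonstrated by simulation. The sketch below (illustrative; it uses a normal-approximation interval for a proportion) repeats a hypothetical trial many times and counts how often the computed interval covers the true event rate:

```python
# Simulating CI coverage: repeat an experiment many times and check how
# often the 95% interval contains the true parameter. Uses the Wald
# (normal-approximation) CI for a proportion; illustrative only.
import random

random.seed(42)
true_risk = 0.10   # true event probability
n = 1_000          # patients per simulated trial
trials = 2_000     # number of repeated experiments

covered = 0
for _ in range(trials):
    events = sum(random.random() < true_risk for _ in range(n))
    p_hat = events / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
    if lo <= true_risk <= hi:
        covered += 1

print(covered / trials)  # close to 0.95
```

No individual interval "contains the true value with 95% probability"; rather, roughly 95% of intervals constructed this way succeed, which is exactly the frequentist claim stated above.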


Chapter 5: Observational Study Designs

In 1961, Frances Kelsey of the U.S. Food and Drug Administration withheld approval of thalidomide in the United States, citing insufficient data on its safety. In Europe and Canada, thalidomide had already been approved, and by early 1962 it was apparent — from case reports by William McBride and the epidemiological investigation of Widukind Lenz — that the drug caused phocomelia, severe limb reduction defects, in the offspring of women who took it during the first trimester of pregnancy. No randomised trial had been conducted, and no randomised trial could ethically have been conducted once the signal emerged. This case illustrates why observational research is not a scientific compromise but an epistemological necessity: for many of the most important questions in pharmacology, observational designs are the only ethical and practical option.

Why Observational Data Are Necessary

Randomised controlled trials are conducted under conditions of equipoise — genuine uncertainty about which treatment is better — and ethical constraints that systematically exclude the patients about whom clinical questions are most urgent. Pregnant women are excluded from virtually all phase III trials for ethical reasons. The elderly with multiple comorbidities are excluded because their complexity would confound interpretation of results, even though they represent the majority of patients who will ultimately receive the drug. Patients with severe renal or hepatic impairment are excluded because pharmacokinetic variability makes dosing unpredictable. Rare adverse effects — occurring in 1 in 10,000 patients or less — cannot be detected in trials of even 20,000 participants but become apparent in pharmacovigilance databases monitoring millions of drug exposures. Long-term outcomes over 10–20 years of follow-up are inaccessible to most RCTs but are routinely estimated from cohort studies. Observational research fills these gaps not by providing weaker evidence but by providing the appropriate evidence for the questions that cannot be answered otherwise.

Cohort Studies

A cohort study identifies a group of individuals at a defined point in time, classifies them by their exposure to a factor of interest (e.g., a medication, a lifestyle behaviour, an environmental exposure), and follows them over time to compare the incidence of outcomes between the exposed and unexposed. Prospective cohort studies define and follow participants forward in time from exposure, while retrospective cohort studies use historical records to reconstruct a cohort from the past and follow them to an outcome that has already occurred.

The Nurses’ Health Study, begun in 1976 and still ongoing, is one of the most productive cohort studies in history, having generated evidence about the long-term effects of oral contraceptives, hormone therapy, diet, and lifestyle on cancer, cardiovascular disease, and mortality in women. The Framingham Heart Study, begun in 1948 in Framingham, Massachusetts, identified the major cardiovascular risk factors — hypertension, hypercholesterolaemia, cigarette smoking, diabetes, obesity — through decades of prospective observation in a community cohort. Both studies illustrate the strengths of cohort designs: they can estimate incidence rates (not just odds), they can examine multiple outcomes from the same exposure, and they can capture temporal relationships between exposure and outcome with high fidelity.

The principal limitation of cohort studies is confounding. Participants who choose or receive a particular drug exposure are almost always systematically different from those who do not. Confounding by indication — in which sicker patients are more likely to receive treatment, making the treatment appear harmful — is the canonical threat in pharmacoepidemiology. A cohort study showing that patients who received prophylactic anticoagulation in the ICU had higher mortality than those who did not is almost certainly confounded by indication: the patients who received anticoagulation were likely at higher thrombotic risk to begin with. Loss to follow-up is a secondary threat; if participants who are lost to follow-up are systematically different from those who remain in the study, the observed outcomes may not reflect those of the full cohort.

Case-Control Studies

A case-control study begins by identifying individuals who have already developed the outcome of interest (cases) and a separate group of individuals who have not (controls), then looks backward in time to compare the frequency of prior exposure between the two groups. Case-control designs are uniquely efficient for studying rare outcomes, because the investigator does not need to wait for rare events to occur in a large cohort — the cases already have the outcome and can be identified through disease registries, hospital records, or clinical databases.

The canonical example is the epidemiological investigation of diethylstilboestrol (DES) and clear-cell adenocarcinoma of the vagina. Herbst and colleagues (1971) identified eight cases of this rare cancer in young women — a cancer previously almost unheard of in women under 40 — and matched them with 32 controls. They found that seven of the eight cases had been exposed to DES in utero, compared with none of the 32 controls. This finding, based on a tiny sample in a well-conducted case-control study, established a causal link between intrauterine DES exposure and vaginal clear-cell carcinoma that would have been impossible to detect in any other way.

The effect measure in case-control studies is the odds ratio — the ratio of the odds of exposure among cases to the odds of exposure among controls. The OR approximates the RR when the outcome is rare, but overestimates it for common outcomes. Case-control studies are vulnerable to recall bias (cases may remember and report past exposures more completely than controls, especially when exposure is suspected to be related to the outcome) and to selection bias in the choice of controls (controls should be representative of the population from which the cases arise; if they are not, the OR is distorted).
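The rare-outcome approximation stated above is easy to verify numerically. The following sketch (arbitrary illustrative risks) computes RR and OR side by side for a rare and a common outcome:

```python
# OR approximates RR when the outcome is rare, and exaggerates the
# association when the outcome is common. Illustrative risks only.

def rr_and_or(risk_exposed, risk_unexposed):
    """Return (relative risk, odds ratio) for two group risks."""
    rr = risk_exposed / risk_unexposed
    odds = lambda p: p / (1 - p)
    return rr, odds(risk_exposed) / odds(risk_unexposed)

print(rr_and_or(0.02, 0.01))  # rare outcome: RR 2.0, OR ~2.02
print(rr_and_or(0.60, 0.30))  # common outcome: RR 2.0, OR 3.5
```

Both scenarios have an identical RR of 2.0, yet the OR climbs from about 2.02 to 3.5 as the outcome becomes common — which is why ORs from case-control studies of common outcomes must not be read as if they were risk ratios.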

Cross-Sectional Studies and the Ecological Fallacy

Cross-sectional studies measure both exposure and outcome at a single point in time in a defined population, providing estimates of prevalence (rather than incidence) and associations between variables. They are most useful for describing the distribution of diseases and risk factors in populations and for generating hypotheses. Ecological studies use group-level rather than individual-level data — for example, correlating national average sugar consumption with national rates of type 2 diabetes. Ecological studies are particularly susceptible to the ecological fallacy: the erroneous inference that an association observed at the group level reflects an association at the individual level. A country with high sugar consumption and high diabetes rates does not imply that the same individuals who consume the most sugar have the highest diabetes risk — the association could be driven entirely by confounders that differ systematically between countries.

Confounding and Its Control

Confounding is the distortion of a true exposure-outcome association by a variable that is associated with both the exposure and the outcome and is not an intermediate step in the causal pathway between them. In pharmacoepidemiology, confounding by indication is ubiquitous: prescribers select drugs based on patient characteristics, and those very characteristics are also determinants of outcomes. A study finding that prophylactic low-molecular-weight heparin is associated with higher mortality in hospitalised patients is almost certainly confounded by indication — patients at high thrombotic risk (who have the worst prognosis) are precisely those selected to receive prophylaxis.

Methods to control confounding at the design stage include restriction (limiting the study to patients within a narrow range of the confounding variable, e.g., studying only patients with the same CHADS₂ score) and matching (selecting controls who share important confounding characteristics with the cases). At the analysis stage, multivariable regression — logistic regression for binary outcomes, Cox proportional hazards regression for time-to-event outcomes — adjusts for measured confounders by including them as covariates in the model. Propensity score methods construct a single score representing the probability of exposure given all measured covariates, then use this score for matching, stratification, or weighting to create a pseudo-randomised comparison. However, all these methods control only for measured confounders; unmeasured confounding remains an irreducible limitation of all observational research.

Worked Example 5.1 — STROBE Appraisal of a Cohort Study

The STROBE (Strengthening the Reporting of Observational studies in Epidemiology) checklist provides 22 items applicable to cohort, case-control, and cross-sectional studies, with modifications for each design. When appraising a cohort study of NSAID use and acute kidney injury in elderly patients, a pharmacist would check: whether the cohort was defined clearly (enrolled from a specific database with a specific entry criterion); whether the exposure (NSAID use) was measured objectively (dispensing records) or by self-report (susceptible to recall bias); whether potential confounders including baseline renal function, diuretic use, and ACE inhibitor use were measured and adjusted; whether loss to follow-up was reported and whether its magnitude and pattern could bias results; and whether the statistical analysis was appropriate for the outcome type (time-to-event analysis with Cox regression for the first episode of AKI).

Worked Example 5.2 — ROBINS-I Assessment

ROBINS-I (Risk of Bias in Non-randomised Studies of Interventions) provides seven domains for assessing bias in observational intervention studies: confounding, selection of participants, classification of interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of reported results. In a retrospective cohort study comparing new versus established anticoagulant users for stroke prevention in AF, ROBINS-I would flag the confounding domain as the most critical: new users are likely to have different baseline clinical profiles than prevalent users who have tolerated treatment for years (the healthy-user and depletion-of-susceptibles biases). The investigator’s choice to use new-user design and active comparator reduces but does not eliminate confounding, and the assessment would likely result in a “moderate” risk of bias in the confounding domain.


Chapter 6: Systematic Reviews and Meta-Analysis

In 1992, a meta-analysis by Lau and colleagues examined the cumulative data from trials of intravenous streptokinase for acute MI as the trials were published over time. They found that by 1973 — some fifteen years before the ISIS-2 trial was published in 1988 — the cumulative evidence already showed a statistically significant reduction in mortality that met the conventional threshold. Had a systematic review and meta-analysis been performed in 1973, approximately 40,000 deaths per year in the U.S. and Europe might have been prevented during the intervening years. This retrospective analysis became a founding argument for both the prospective registration of systematic reviews and the routine commissioning of comprehensive evidence syntheses before new large trials are initiated.

What Is a Systematic Review?

A systematic review is a synthesis of evidence that uses explicit, reproducible methods to identify, select, critically appraise, and summarise all research relevant to a pre-specified question. The defining feature that distinguishes a systematic review from a narrative or expert review is the existence of a protocol — a pre-specified document that describes the question, the eligibility criteria, the search strategy, the data extraction procedures, the risk of bias assessment tool, and the planned synthesis approach — before any data are collected. The protocol is typically registered in PROSPERO (the international prospective register of systematic reviews) to prevent post-hoc changes to methods that could introduce bias.

Narrative reviews, by contrast, are written by experts who select and synthesise literature based on personal knowledge and judgment. Narrative reviews are valuable for providing accessible overviews of a field, but they are systematically biased toward the reviewer’s prior beliefs, toward prominent and easily accessible studies, and toward positive findings. A narrative review and a systematic review on the same topic can reach opposite conclusions because the systematic review identified studies the narrative reviewer had never encountered.

PRISMA 2020

The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 checklist contains 27 items covering every aspect of systematic review conduct and reporting, from title and registration through search strategy, eligibility criteria, data extraction, risk of bias assessment, and synthesis. The PRISMA flow diagram — which traces, phase by phase, the number of records identified, the number screened, the number assessed for eligibility, and the number included — is the most immediately recognisable element of systematic review reporting. Reading the PRISMA flow diagram critically reveals whether the review’s coverage was comprehensive (a very small number of records identified relative to the question’s scope is suspicious) and whether exclusion reasons were appropriate and consistently applied.

When Is Meta-Analysis Appropriate?

Meta-analysis is the quantitative pooling of effect estimates from multiple studies into a single summary estimate. It is not always appropriate and should not be performed simply because multiple studies exist on a topic. Before pooling, the reviewer must assess clinical homogeneity (are the populations, interventions, comparators, outcomes, and follow-up periods similar enough that pooling makes sense?) and methodological homogeneity (are the study designs and risk of bias profiles similar?). Pooling five RCTs of different doses of a drug in different populations against different comparators, measured at different time points, produces a summary estimate that is precisely calculated but clinically meaningless.

The fixed-effects model, sometimes called the common-effect model, assumes that all studies in the meta-analysis are estimating the same true underlying effect, and that variation between study results is due entirely to sampling error (chance). Under this assumption, the summary estimate is a weighted average of study-specific estimates, with larger studies given more weight because they have smaller sampling error. The random-effects model (for which the DerSimonian and Laird method is the classic estimator) assumes that the true effect size varies across studies — that different populations, doses, and follow-up periods genuinely produce different true effects, and that the studies in the meta-analysis are a sample from a distribution of possible true effects. The random-effects model estimates not only the average effect but also the between-study variance (τ²). In practice, random-effects models are preferred when there is meaningful between-study heterogeneity, because fixed-effects models produce overly precise estimates that do not reflect genuine uncertainty about the treatment effect in the population of interest.

Reading a Forest Plot

The forest plot is the visual centrepiece of a meta-analysis and conveys more information per unit area than almost any other scientific figure. Each row of the plot represents one included study. The square on each row marks that study’s point estimate of the effect (e.g., an odds ratio); the area of the square is typically proportional to the weight the study received in the pooled analysis, so larger squares indicate studies that contributed more to the summary estimate. The horizontal line through each square represents the 95% confidence interval of that study’s estimate; a longer line indicates greater uncertainty. The vertical line at the centre of the plot represents the null effect (OR = 1.0, RR = 1.0, or MD = 0); estimates on one side of the null favour the treatment and estimates on the other favour the control, depending on how the outcome is defined. The diamond at the bottom of the plot represents the pooled summary estimate; its horizontal width spans the 95% CI of the pooled estimate.

Worked Example 6.1 — Interpreting a Forest Plot

A meta-analysis includes five RCTs examining whether proton pump inhibitors (PPIs) reduce the risk of upper GI bleeding in patients on dual antiplatelet therapy. The forest plot shows: Study 1 (n=200): OR 0.45, 95% CI 0.20–1.02 (confidence interval just crosses 1.0, not significant individually); Study 2 (n=850): OR 0.38, 95% CI 0.25–0.58 (clearly significant, large square due to large sample); Study 3 (n=320): OR 0.55, 95% CI 0.31–0.97; Study 4 (n=110): OR 0.70, 95% CI 0.22–2.20 (very wide CI, small study); Study 5 (n=640): OR 0.42, 95% CI 0.28–0.63. The pooled diamond shows OR 0.43, 95% CI 0.33–0.56. All five estimates lie on the same side of the null and their CIs largely overlap, suggesting relatively low heterogeneity. The pooled estimate favours PPI use (OR < 1.0, CI excludes 1.0). The I² statistic reported below the figure is 18%, indicating low heterogeneity.
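The pooled diamond in Worked Example 6.1 can be reproduced approximately with a fixed-effect, inverse-variance calculation. The sketch below back-calculates each study’s log-OR standard error from its published 95% CI (a 95% CI spans ±1.96 SE on the log scale), so small discrepancies against the reported values reflect rounding in those CIs.

```python
import math

# Worked Example 6.1: (OR, lower 95% CI, upper 95% CI) for each study.
studies = [
    (0.45, 0.20, 1.02),
    (0.38, 0.25, 0.58),
    (0.55, 0.31, 0.97),
    (0.70, 0.22, 2.20),
    (0.42, 0.28, 0.63),
]

# Back-calculate each log-OR standard error: SE = (ln U - ln L) / 3.92.
log_ors = [math.log(or_) for or_, lo, hi in studies]
ses = [(math.log(hi) - math.log(lo)) / 3.92 for _, lo, hi in studies]

# Fixed-effect (inverse-variance) pooling: weight each study by 1/SE².
weights = [1 / se**2 for se in ses]
pooled_log = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
pooled_se = 1 / math.sqrt(sum(weights))

pooled_or = math.exp(pooled_log)
ci_low = math.exp(pooled_log - 1.96 * pooled_se)
ci_high = math.exp(pooled_log + 1.96 * pooled_se)
print(f"Pooled OR {pooled_or:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

Run on these inputs, the pooled estimate lands very close to the published diamond (OR 0.43, 95% CI 0.33–0.56), with the small differences attributable to rounding in the reported study CIs.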

Heterogeneity: I² and τ²

The I² statistic quantifies the proportion of total variation across studies attributable to between-study heterogeneity rather than chance. An I² of 0% means all variation is consistent with sampling error; an I² of 75% means 75% of the observed variation reflects genuine differences between studies rather than chance. The conventional benchmarks — roughly 25% for low, 50% for moderate, and 75% for high heterogeneity — are guidelines rather than absolute rules; the clinical importance of heterogeneity depends on the direction and magnitude of variation, not just its statistical expression. The τ² (tau-squared) statistic estimates the variance of the true effect distribution in a random-effects model; a large τ² means the true effect varies widely across the types of studies and populations in the meta-analysis, even if the average effect is precisely estimated.

When heterogeneity is high, subgroup analysis and meta-regression can be used to explore potential sources of heterogeneity — for example, whether effect size differs by dose, duration, population age, or risk of bias. These analyses are hypothesis-generating rather than confirmatory, because they are typically not pre-specified and involve multiple comparisons.
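The heterogeneity statistics themselves can be sketched in a few lines, using hypothetical log effect estimates and standard errors: Cochran’s Q is the weighted sum of squared deviations from the fixed-effect pooled value, I² re-expresses the excess of Q over its degrees of freedom as a proportion, and the DerSimonian and Laird estimator converts that excess into a between-study variance τ².

```python
# Hypothetical per-study log effect estimates and standard errors.
log_effects = [-0.20, -0.90, -0.50, 0.10, -0.70]
ses = [0.15, 0.20, 0.18, 0.25, 0.22]

w = [1 / se**2 for se in ses]                       # fixed-effect weights
pooled = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)

# Cochran's Q: weighted squared deviations from the fixed-effect pooled value.
q = sum(wi * (y - pooled)**2 for wi, y in zip(w, log_effects))
df = len(log_effects) - 1

# I²: share of observed variation beyond what chance alone would produce.
i_squared = max(0.0, (q - df) / q) * 100

# DerSimonian-Laird estimate of the between-study variance τ².
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau_squared = max(0.0, (q - df) / c)
```

With these invented inputs the studies are genuinely discordant, so Q well exceeds its degrees of freedom and I² comes out in the “high” range; with concordant inputs both I² and τ² are floored at zero.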

Funnel Plots and Publication Bias

The funnel plot graphs each study’s effect estimate on the x-axis against a measure of its precision (typically its standard error or sample size) on the y-axis. In the absence of bias, studies should scatter symmetrically around the pooled estimate in a funnel shape, with small studies (wide spread) at the bottom and large studies (narrow spread) at the top. Asymmetry — typically a gap in one of the lower corners of the plot, where small studies with null or unfavourable results would be expected — suggests either publication bias (small negative studies go unpublished) or small-study effects (small studies conducted in higher-risk populations or with higher doses produce systematically larger effects). Egger’s test provides a statistical test for funnel plot asymmetry. The trim-and-fill method imputes the presumed missing studies and adjusts the pooled estimate to account for potential publication bias. However, the Cochrane Handbook cautions against interpreting funnel plots when fewer than ten studies are included, because asymmetry tests have very low power with so few studies.
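At its core, Egger’s test is a regression of each study’s standardized effect (estimate divided by its standard error) on its precision (one over the standard error); an intercept far from zero signals funnel-plot asymmetry. The sketch below fits that regression by ordinary least squares on hypothetical data; a full implementation would also compute a standard error and p-value for the intercept.

```python
# Hypothetical study log effects and standard errors (all favour treatment).
effects = [-0.80, -0.55, -0.60, -0.36, -0.90, -0.42, -0.70]
ses = [0.40, 0.15, 0.28, 0.55, 0.48, 0.12, 0.33]

# Egger's regression: standardized effect (y/SE) against precision (1/SE).
x = [1 / se for se in ses]
y = [e / se for e, se in zip(effects, ses)]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x   # Egger's bias coefficient
```

In this made-up dataset the intercept is clearly negative, hinting (in the direction where more-negative log ORs mean larger effects) that the smaller studies report systematically larger effects than the larger ones.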

Network Meta-Analysis

Network meta-analysis (NMA) extends pairwise meta-analysis to settings where multiple treatments have been compared with each other in a web of direct (head-to-head) and indirect (through a common comparator) comparisons. For example, if Drug A has been compared with placebo in ten trials and Drug B has been compared with placebo in eight trials, but Drug A and Drug B have never been directly compared, NMA can estimate the relative effect of Drug A versus Drug B by using placebo as the linking node in the network. NMA simultaneously incorporates all available evidence and can rank treatments using metrics such as the Surface Under the Cumulative Ranking curve (SUCRA), which assigns each treatment a score between 0% and 100% summarising its position in the ranking (100% if it is certain to be the best treatment, 0% if it is certain to be the worst). NMA requires the assumption of transitivity — that the populations and effect modifiers are similar across all trials in the network, so that indirect comparisons are valid — and consistency, which is empirically testable: the direct and indirect estimates for any given comparison should not differ beyond what chance would predict.
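The simplest building block of indirect comparison (the Bucher method) can be written in a few lines: on the log scale, the indirect A-versus-B estimate is the difference of the two direct estimates against the shared comparator, and the variances of the two direct estimates add. All numbers below are hypothetical.

```python
import math

# Hypothetical direct estimates against a shared placebo comparator:
#   Drug A vs placebo: OR 0.60, SE of the log OR 0.10
#   Drug B vs placebo: OR 0.80, SE of the log OR 0.12
log_or_ap, se_ap = math.log(0.60), 0.10
log_or_bp, se_bp = math.log(0.80), 0.12

# Bucher indirect comparison: subtract on the log scale through the
# common node; uncertainties add in quadrature.
log_or_ab = log_or_ap - log_or_bp
se_ab = math.sqrt(se_ap**2 + se_bp**2)

or_ab = math.exp(log_or_ab)                 # 0.60 / 0.80 = 0.75
ci_low = math.exp(log_or_ab - 1.96 * se_ab)
ci_high = math.exp(log_or_ab + 1.96 * se_ab)
```

Note that the indirect standard error is larger than either direct one: indirect evidence is always less precise than the direct comparisons it is built from.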

Worked Example 6.2 — Interpreting a PRISMA Flow Diagram

A systematic review of antithrombotic therapy in AF begins with 4,382 records identified from searching MEDLINE, Embase, CENTRAL, and ClinicalTrials.gov. After removing 1,243 duplicates, 3,139 records are screened by title and abstract; 2,890 are excluded as clearly irrelevant (wrong population, wrong intervention, wrong outcome). The remaining 249 full-text articles are assessed for eligibility; 193 are excluded (112 wrong study design, 41 wrong comparator, 40 wrong outcome definition). Fifty-six studies are included in the qualitative synthesis, of which 32 with comparable interventions are included in the meta-analysis. The appraisal question is whether the exclusions are justified and whether the final 56 studies constitute a representative sample of available evidence. A pharmacist reading this diagram would note that the large number of “wrong comparator” exclusions (41) suggests the review applied a narrow comparator definition; this could be appropriate if specificity was the goal or could represent indirectness if the excluded comparators are clinically relevant.


Chapter 7: Pharmacoeconomics and Pharmacoepidemiology

In 2014, the hepatitis C treatment sofosbuvir was introduced at a price of approximately $84,000 USD for a 12-week course. It was also substantially more effective than existing therapies, achieving sustained virological response rates above 90% compared with approximately 40–50% for interferon-based regimens. Public and private payers faced a decision that could not be resolved by clinical evidence alone: the drug worked, but was its benefit worth its cost given finite healthcare budgets? Answering that question required a formal pharmacoeconomic analysis, and understanding how such analyses are constructed and interpreted is an essential competency for pharmacists serving on formulary committees or advising healthcare systems.

Why Economic Evidence Matters

Healthcare resources are finite in all systems, regardless of funding model. A dollar spent on one intervention is unavailable for another. Economic analyses help decision-makers allocate resources to interventions that produce the greatest health benefit per unit of spending — a goal known as allocative efficiency. Pharmacists serving on Pharmacy and Therapeutics (P&T) committees are routinely asked to evaluate not only whether a drug works but whether it provides value for money relative to alternatives. The tools of pharmacoeconomics provide a formal vocabulary for answering that question transparently and consistently.

Types of Economic Analysis

Cost-minimisation analysis (CMA) assumes that two or more alternatives produce equivalent clinical outcomes and asks only which is cheaper. CMA is appropriate when there is strong evidence of equivalence — for example, comparing two bioequivalent generic formulations of the same drug. If there is any meaningful clinical difference between alternatives, CMA is inappropriate, and the clinical difference must be quantified and incorporated into the analysis.

Cost-effectiveness analysis (CEA) measures outcomes in natural clinical units — events prevented, life-years gained, infections cured, fractures avoided — and computes a cost-effectiveness ratio expressing the cost per unit of clinical outcome. A CEA comparing Drug A and Drug B might conclude that Drug A costs $15,000 more per year but prevents 0.3 additional strokes per patient per year, producing an incremental cost-effectiveness ratio (ICER) of $50,000 per stroke prevented. CEA is widely used for clinical outcomes but cannot be used to compare across disease areas, because “cost per fracture prevented” is not directly comparable to “cost per infection cured.”

Cost-utility analysis (CUA) addresses this limitation by measuring outcomes in quality-adjusted life-years (QALYs), a measure that combines quantity and quality of life. One QALY represents one year of life in perfect health. A year of life with moderate disability may be assigned a utility weight of 0.7, contributing 0.7 QALYs. The ICER in a CUA is expressed as cost per QALY gained, enabling comparison across all disease areas. In Canada, CADTH (the Canadian Agency for Drugs and Technologies in Health) uses a willingness-to-pay threshold of approximately $50,000 CAD per QALY as a general benchmark, though this threshold is not absolute. Many European health technology assessment agencies use similar thresholds.

Cost-benefit analysis (CBA) converts all outcomes into monetary values — assigning a dollar value to a life-year or a prevented adverse event — enabling direct comparison of costs and benefits in a common unit. CBA allows calculation of a benefit-to-cost ratio or a net monetary benefit. However, the monetisation of health outcomes raises ethical objections and is rarely used in clinical pharmacoeconomics.

The ICER is the central metric of CUA and CEA, computed as the difference in cost between two interventions divided by the difference in their effectiveness:

\[ \text{ICER} = \frac{\Delta \text{Cost}}{\Delta \text{Effect}} = \frac{\text{Cost}_\text{new} - \text{Cost}_\text{comparator}}{\text{Effect}_\text{new} - \text{Effect}_\text{comparator}} \]

An intervention is considered cost-effective if its ICER falls below the willingness-to-pay threshold; cost-ineffective if it exceeds it; and dominant (preferred regardless of threshold) if it is both cheaper and more effective than the comparator.
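A small helper capturing this definition, including the dominance cases in which the ratio itself is not meaningful, might look as follows (a sketch, not a standard library function):

```python
def icer(cost_new, cost_cmp, effect_new, effect_cmp):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    d_cost = cost_new - cost_cmp
    d_effect = effect_new - effect_cmp
    if d_cost <= 0 and d_effect >= 0:
        return "dominant"    # cheaper (or no costlier) and at least as effective
    if d_cost >= 0 and d_effect <= 0:
        return "dominated"   # costlier (or no cheaper) and no more effective
    return d_cost / d_effect

# Drug A from the CEA example: $15,000 more per year, 0.3 strokes prevented.
print(icer(15_000, 0, 0.3, 0.0))   # about 50,000 per stroke prevented
```

Returning a label rather than a number for the dominance cases reflects how P&T committees actually treat them: a dominant drug is adopted regardless of the threshold, and a dominated one is rejected.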

Decision Models: Trees and Markov Models

Because direct evidence on long-term outcomes and costs is rarely available from clinical trials of practical duration, economic analyses typically use mathematical decision models that synthesise evidence from multiple sources into a single coherent framework. Decision trees are best suited for short time horizons with a small number of mutually exclusive branches: each decision node represents a choice, each chance node represents a probabilistic event, and the expected value of each strategy is computed by multiplying probabilities along each branch by their associated outcomes. For longer time horizons — chronic diseases in which patients cycle repeatedly through different health states — Markov models are more appropriate. In a Markov model, patients can occupy one of a finite number of mutually exclusive health states (e.g., “stable AF,” “stroke,” “major bleeding,” “dead”), and transition probabilities govern the likelihood of moving between states during each model cycle. Costs and utilities are assigned to each state and accumulated over time.
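A minimal cohort Markov model can be sketched in a few lines. All transition probabilities, utilities, and costs below are hypothetical, and the sketch omits refinements a real analysis would include, such as discounting and half-cycle correction.

```python
# States: 0 = stable AF, 1 = post-stroke, 2 = dead (absorbing).
# Hypothetical annual transition probabilities; each row sums to 1.
P = [
    [0.90, 0.06, 0.04],   # from stable AF
    [0.00, 0.85, 0.15],   # from post-stroke
    [0.00, 0.00, 1.00],   # from dead
]
utilities = [0.80, 0.55, 0.00]   # QALYs accrued per year in each state
costs = [1_500, 12_000, 0]       # hypothetical annual cost per state ($)

cohort = [1.0, 0.0, 0.0]         # everyone starts in stable AF
total_qalys = total_cost = 0.0
for _ in range(10):              # ten one-year cycles
    total_qalys += sum(p * u for p, u in zip(cohort, utilities))
    total_cost += sum(p * c for p, c in zip(cohort, costs))
    # Redistribute the cohort according to the transition matrix.
    cohort = [sum(cohort[i] * P[i][j] for i in range(3)) for j in range(3)]
```

Each cycle, the cohort fractions shift toward the absorbing “dead” state while QALYs and costs accumulate; running the same model under two treatment-specific transition matrices yields the incremental costs and effects that feed the ICER.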

Both model types are subject to the principle of “garbage in, garbage out”: the outputs are only as reliable as the input parameters. Sensitivity analysis addresses this limitation by systematically varying parameters over plausible ranges to test whether the conclusion (cost-effective vs. not) is robust. One-way sensitivity analyses vary one parameter at a time; tornado diagrams display which parameters have the greatest impact on the ICER. Probabilistic sensitivity analysis (PSA) assigns probability distributions to all parameters and runs Monte Carlo simulations, generating a cloud of ICER estimates that reflects overall parameter uncertainty.

Worked Example 7.1 — P&T Committee ICER Calculation

A P&T committee is evaluating a new oral anticoagulant (NOAC) versus warfarin for non-valvular AF. Based on published trial data and costing information, the NOAC group has an annual stroke rate of 1.7% and costs $3,200/year in drug costs plus monitoring. Warfarin has an annual stroke rate of 2.5% and costs $800/year in drug costs plus an estimated $600/year in INR monitoring, for a total of $1,400/year. Each prevented stroke is estimated to result in an average QALY gain of 0.30 over the patient’s remaining lifetime.

Incremental cost = $3,200 − $1,400 = $1,800/year per patient.

Incremental effect = (2.5% − 1.7%) strokes prevented per year = 0.8 strokes per 100 patients per year = 0.008 strokes per patient per year.

QALY gain = 0.008 × 0.30 = 0.0024 QALYs per patient per year.

ICER = $1,800 / 0.0024 = $750,000 per QALY. This far exceeds the $50,000 threshold, suggesting the NOAC does not represent good value at this price point — a finding that should prompt negotiations with the manufacturer or restriction of use to high-risk patients for whom the absolute benefit is larger.
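The arithmetic of Worked Example 7.1, written out step by step:

```python
# Annual per-patient costs
cost_noac = 3200              # NOAC drug cost plus monitoring
cost_warfarin = 800 + 600     # warfarin drug cost plus INR monitoring
d_cost = cost_noac - cost_warfarin            # $1,800

# Annual per-patient effects
strokes_averted = 0.025 - 0.017               # 0.008 strokes per patient-year
qaly_gain = strokes_averted * 0.30            # 0.0024 QALYs per patient-year

icer = d_cost / qaly_gain
print(f"ICER = ${icer:,.0f} per QALY")        # → ICER = $750,000 per QALY
```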

Pharmacovigilance and Post-Marketing Surveillance

Pharmacovigilance refers to the science and activities relating to the detection, assessment, understanding, and prevention of adverse effects or any other drug-related problem. Because clinical trials are typically powered to detect primary efficacy outcomes and are rarely large enough or long enough to detect uncommon adverse effects, the safety profile of a drug at the time of marketing approval is always incomplete. Post-marketing surveillance fills this gap through spontaneous reporting systems — Canada Vigilance in Canada, MedWatch in the United States, and the Yellow Card scheme in the United Kingdom — to which healthcare professionals and consumers submit reports of suspected adverse drug reactions.

Disproportionality analysis applies mathematical signal detection to spontaneous reporting databases. The proportional reporting ratio (PRR) compares the proportion of all adverse event reports for a given drug that involve a particular adverse event with the proportion of all reports for all other drugs that involve the same event:

\[ \text{PRR} = \frac{a/(a+b)}{c/(c+d)} \]

where \(a\) is the number of reports of the adverse event of interest for the drug of interest, \(b\) is the number of all other reports for the drug of interest, \(c\) is the number of reports of the adverse event for all other drugs, and \(d\) is the number of all other reports for all other drugs. A PRR significantly above 1.0 (typically 2.0 or higher with at least three reports) constitutes a signal requiring further investigation. The Weber effect describes the tendency for adverse event reporting rates to peak in the years immediately after a drug’s launch — when the drug is new and clinicians are most vigilant — and to decline thereafter as the drug becomes familiar, independent of any change in the true incidence of adverse events.
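The PRR calculation written as a small helper, with hypothetical report counts:

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from spontaneous-report counts.

    a: reports of the event of interest for the drug of interest
    b: all other reports for the drug of interest
    c: reports of the event of interest for all other drugs
    d: all other reports for all other drugs
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 30 of 1,000 reports for the drug mention the event,
# versus 500 of 100,000 reports for all other drugs.
signal = prr(30, 970, 500, 99_500)    # 0.03 / 0.005 = 6.0
```

A PRR of 6 with at least three reports would comfortably exceed the conventional signal threshold (PRR ≥ 2.0), flagging the association for formal causality assessment.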

Worked Example 7.2 — Real-World Evidence and RECORD

Electronic health records (EHR) and administrative claims databases are major sources of real-world data (RWD) — the raw material for real-world evidence (RWE). A pharmacist reviewing a RWE study on SGLT-2 inhibitor use and risk of amputation would apply the RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) checklist, which extends STROBE with specific items relevant to database studies: whether the database was described adequately, how the drug exposure was defined (dispensing records vs. prescribing records), how the outcome (amputation) was identified (ICD codes, and whether they were validated against clinical records), and what steps were taken to address the time-varying nature of drug exposure. RWE from EHR databases can generate regulatory-grade safety evidence when its methods are sufficiently rigorous, as demonstrated by the FDA’s Sentinel System.


Chapter 8: Clinical Practice Guidelines and Communicating Evidence

In 2008, a 72-year-old woman with type 2 diabetes, hypertension, and chronic kidney disease stage 3 was discharged from a tertiary care hospital with prescriptions for fourteen medications, each supported by a guideline recommendation drawn from a disease-specific randomised trial. No individual guideline accounted for the interaction between her multiple conditions, the cumulative pill burden, or her stated preference to take no more than four medications per day. This scenario — the multimorbid patient caught between single-disease guidelines — illustrates why clinical practice guidelines must be understood not merely as authoritative pronouncements but as tools that require skilled interpretation and communication.

What Is a Clinical Practice Guideline?

A clinical practice guideline (CPG) is a systematically developed statement intended to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances. This definition from the U.S. Institute of Medicine (now the National Academy of Medicine) distinguishes CPGs from several related document types that are sometimes confused with them. Consensus statements are expert opinion documents that represent the agreement of a panel rather than a systematic review of evidence; they may be appropriate when evidence is sparse but carry less epistemic weight than evidence-based CPGs. Position statements express the official stance of a professional organisation on a policy question. Treatment protocols are facility-specific operational documents that implement guideline recommendations in a particular clinical context; they are more prescriptive and less transferable than guidelines.

The AGREE II Instrument

The Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument is the international standard for evaluating the quality of CPG development methodology. It contains 23 items organised into six domains, each rated on a 7-point scale from “strongly disagree” (1) to “strongly agree” (7), and it includes two global rating items asking whether the guideline should be recommended for use. The six domains are: scope and purpose (items 1–3: whether the overall objective, specific health questions, and target population are clearly described); stakeholder involvement (items 4–6: whether the guideline development group includes all relevant professional groups and whether patient views and preferences were incorporated); rigour of development (items 7–14: whether systematic methods were used to search and select evidence, whether methods for formulating recommendations are clear, whether health benefits, side effects, and risks were considered in formulating recommendations, and whether the guideline has been externally reviewed prior to publication); clarity of presentation (items 15–17: whether recommendations are specific and unambiguous and whether different options for management are clearly presented); applicability (items 18–21: whether the guideline describes facilitators and barriers to implementation and whether monitoring criteria are provided); and editorial independence (items 22–23: whether the funding body influenced the content and whether conflicts of interest among guideline development group members are declared).

When appraising a guideline using AGREE II, the rigour of development domain is typically the most informative for pharmacists, because it reveals whether the recommendations are grounded in a systematic evidence review or in expert opinion. A guideline with a high rigour score that cites GRADE certainty ratings for each recommendation can be used with more confidence than one with a low rigour score that provides no citations.

GRADE Recommendations: Strong vs. Conditional

GRADE distinguishes two strengths of recommendations in addition to four levels of evidence certainty. A strong recommendation means that the guideline panel is confident that the desirable effects of the recommended action outweigh the undesirable effects for most or all patients — and that the same action can be expected to be appropriate for essentially all patients. The clinical implication of a strong recommendation is that it can be implemented without offering patients a choice between alternatives; it represents a standard of care from which departure requires justification.

A conditional recommendation — sometimes called a weak recommendation — means that the panel believes the desirable effects probably outweigh the undesirable effects but acknowledges substantial uncertainty, either because of low-certainty evidence or because the balance depends heavily on individual patient values and circumstances. A conditional recommendation means that different choices will be appropriate for different patients, and that a shared decision-making conversation is expected. The counterintuitive insight of the GRADE framework is that these dimensions are independent: strong evidence can support a conditional recommendation (when the evidence shows that the treatment has a small net benefit that is genuinely preference-sensitive), and weak evidence can support a strong recommendation (when the potential harm of withholding treatment is so catastrophic that even uncertain evidence justifies action).

Guideline Limitations

Even methodologically rigorous guidelines have important limitations that pharmacists must understand. Currency is the first: guideline development typically takes 2–3 years from inception to publication, and during that time new evidence may have emerged that would have changed one or more recommendations. Conflict of interest among guideline authors is a well-documented problem; a 2011 analysis of ACC/AHA guidelines found that 56% of authors had relationships with the pharmaceutical industry. Multimorbidity is a systemic limitation of disease-specific guidelines: each guideline optimises care for one condition in isolation, and no guideline accounts for the cumulative treatment burden or the competing risks that arise when multiple conditions coexist. Guideline-concordant care — meaning care that follows all applicable guidelines — may paradoxically be suboptimal for the multimorbid patient, and clinical judgement is needed to deprioritise or modify recommendations accordingly.

Communicating Evidence to Patients

The way evidence is framed profoundly influences patients’ decisions. Relative risk framing — “this drug reduces your risk of heart attack by 50%” — consistently produces more favourable responses than absolute risk framing — “this drug reduces your risk from 2% to 1% over 5 years” — even when the two statements describe the same evidence. Absolute risk framing is more honest and more useful for patients making genuine value-based decisions, because it conveys the magnitude of benefit in terms that are directly meaningful. Supplementing numerical information with visual aids improves comprehension, particularly for patients with limited health literacy. Icon arrays (Paling palettes) display absolute event frequencies as grids of pictograms; frequency trees display conditional probabilities in branching diagrams that are more intuitive than Bayesian probability notation. The SHARE model (Seek patient participation, Help patients explore and compare options, Assess patient values and preferences, Reach a decision together, Evaluate the decision) provides a structured process for shared decision-making that incorporates evidence communication as an integral component.
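The framing contrast above can be made concrete: the same trial result yields a 50% relative risk reduction, a 1% absolute risk reduction, and a number needed to treat of 100.

```python
# Same trial evidence, two framings: 5-year risk falls from 2% to 1%.
risk_control, risk_treated = 0.02, 0.01

rrr = (risk_control - risk_treated) / risk_control   # relative risk reduction
arr = risk_control - risk_treated                    # absolute risk reduction
nnt = 1 / arr                                        # number needed to treat

print(f"RRR {rrr:.0%}, ARR {arr:.1%}, NNT {nnt:.0f} over 5 years")
# → RRR 50%, ARR 1.0%, NNT 100 over 5 years
```

“Your risk falls by half” and “one of every hundred patients like you avoids a heart attack over five years” describe the identical evidence; only the second conveys the magnitude a patient needs for a value-based decision.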

Worked Example 8.1 — Applying AGREE II to a Published Guideline

A pharmacy student appraises the CCS (Canadian Cardiovascular Society) 2020 atrial fibrillation guidelines using AGREE II. In the scope and purpose domain, the student assigns high scores because the guideline explicitly defines its target population (adults with non-valvular AF), the clinical questions addressed (stroke prevention, rate control, rhythm control), and the clinical benefits sought (reduction in stroke, systemic embolism, and cardiovascular death). In the rigor of development domain, the guideline describes its use of systematic searches in MEDLINE and Embase with documented strategies, its use of GRADE to rate evidence certainty, its structured process for formulating recommendations by a multidisciplinary panel, and its external review by cardiologists, patient advocates, and pharmacists. Scores in this domain are therefore high. In the applicability domain, the guideline scores lower because it does not provide detailed implementation tools or cost-effectiveness analyses specific to the Canadian healthcare system, and it does not specify monitoring criteria for patients in whom stroke risk assessments are borderline.

Worked Example 8.2 — SBAR Communication to a Prescriber

A hospital pharmacist notices that a patient admitted for a hip fracture has been prescribed diclofenac 75 mg twice daily for pain. The patient is 78 years old, on warfarin for AF (INR therapeutic), and has a creatinine of 140 µmol/L. Using SBAR: Situation — “Dr. Chen, I’m calling about Mrs. Park in room 511 who has been prescribed diclofenac for post-fracture pain.” Background — “She is 78 years old, anticoagulated with warfarin for atrial fibrillation with a current INR of 2.3, and has mildly impaired renal function with a creatinine of 140.” Assessment — “Diclofenac combined with warfarin significantly increases her risk of major GI and intracranial bleeding — observational data suggest a two- to threefold increase in bleeding hospitalisation. Additionally, NSAIDs risk precipitating acute kidney injury in an elderly patient with reduced renal function.” Recommendation — “I recommend substituting acetaminophen 500 mg every 6 hours as the first-line analgesic. If stronger analgesia is needed, I can suggest options that minimise these risks. May I put this in the orders?”


Chapter 9: Applying Evidence at the Bedside — Integration and Practice

A landmark 2011 study by Morris and colleagues systematically examined the time from publication of a major clinical research finding to its incorporation into routine clinical practice across multiple therapeutic areas. They found an average of 17 years — nearly two decades — separating discovery from widespread use. This gap is not primarily a problem of knowledge access; in the internet era, a practitioner can retrieve any published finding within seconds. The gap is a problem of knowledge translation: the complex, multi-level process by which research evidence is adapted, adopted, and implemented in the specific contexts of clinical practice. For pharmacists, understanding knowledge translation is not optional — it is the discipline that connects everything covered in the preceding eight chapters to the daily work of patient care.

Knowledge Translation Frameworks

Knowledge translation (KT) is defined by the Canadian Institutes of Health Research as “a dynamic and iterative process that includes synthesis, dissemination, exchange, and ethically sound application of knowledge to improve the health of Canadians.” The Knowledge-to-Action (KTA) cycle, developed by Graham and colleagues at the University of Ottawa, provides a widely used framework that distinguishes knowledge creation (the process of producing and synthesising research) from action (the process of applying it in practice). The KTA cycle acknowledges that knowledge must first be synthesised and distilled into usable tools — guidelines, algorithms, decision aids — before it can be implemented; and that implementation must be adapted to local context, monitored for adherence, sustained over time, and evaluated for outcomes.

The PARIHS (Promoting Action on Research Implementation in Health Services) framework, developed by Rycroft-Malone and colleagues, proposes that the success of research implementation is a function of three factors: the nature of the evidence (how strong, how credible, how consistent with clinical experience and patient preferences), the context (organisational culture, leadership support, evaluation infrastructure), and facilitation (the presence of individuals who enable and support the change process). PARIHS is particularly relevant in institutional pharmacy settings because it acknowledges that even strong evidence will not be implemented if the organisational context — staffing, formulary processes, prescribing culture — is not supportive.

Individualising Evidence

The tension between population-level evidence and individual-level decisions is perhaps the central challenge of EBM in daily practice. An RCT demonstrates that Drug X reduces the risk of stroke by 25% relative to placebo in patients aged 60–80 with non-valvular AF and a CHADS₂ score of 2. The patient in front of you is 83 years old, with a CHADS₂ score of 3, severe frailty, a history of falls, and preserved cognitive function. The RCT was designed to address the average patient; you must determine how that average applies to this specific patient.

The N-of-1 trial is the methodological ideal of individualised evidence: a series of crossover periods in which the same patient alternates between treatment and control conditions under blinded and randomised conditions. For outcomes that are reversible, measurable, and meaningfully variable within an individual (chronic pain, fatigue, hypertension), N-of-1 trials can determine whether a given patient benefits from a treatment that is effective on average. However, N-of-1 trials are logistically demanding and are rarely feasible in routine practice.

A more practical approach to individualisation involves Bayesian updating: starting with a prior probability of benefit derived from the population evidence, and updating it with patient-specific characteristics that serve as effect modifiers. A patient who shares the exact baseline characteristics of the trial population (same age, same CHADS₂ score, same renal function) has an estimated benefit close to the trial average. A patient who differs systematically from the trial population requires clinical judgement to estimate how the difference modifies the expected benefit.
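The arithmetic behind this kind of individualisation can be made concrete. The sketch below applies a trial-level relative risk reduction to patient-specific baseline risks to estimate absolute benefit; the risk figures are hypothetical (assumed for illustration), and the calculation assumes the relative effect is constant across baseline risks, an assumption that itself requires clinical judgement:

```python
def individualized_benefit(baseline_risk: float, relative_risk_reduction: float):
    """Estimate patient-specific absolute benefit from a trial's relative effect.

    Assumes the relative risk reduction is constant across baseline risks,
    a common but checkable assumption when individualising trial evidence.
    """
    arr = baseline_risk * relative_risk_reduction  # absolute risk reduction
    nnt = 1 / arr                                  # number needed to treat
    return arr, nnt

# Hypothetical figures: a 25% relative risk reduction (as in the Drug X
# example), applied to an assumed 4% annual stroke risk for a trial-average
# patient versus an assumed 8% risk for a higher-risk patient.
arr_avg, nnt_avg = individualized_benefit(0.04, 0.25)    # ARR 1%, NNT 100
arr_high, nnt_high = individualized_benefit(0.08, 0.25)  # ARR 2%, NNT 50
```

Note how the same 25% relative reduction yields twice the absolute benefit, and half the NNT, for the higher-risk patient; this is why baseline risk, not just the relative effect, drives individual treatment decisions.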

Formulary Decision-Making

The P&T committee is the institutional body responsible for managing the formulary — the list of drugs approved for use within a hospital or healthcare system. P&T committees typically include physicians, pharmacists, nurses, administrators, and increasingly patient representatives. Their decisions about which drugs to add, remove, or restrict are among the most consequential evidence-based decisions in healthcare, because they affect every patient in the institution rather than a single individual.

The drug monograph prepared for P&T review is the pharmacist’s principal contribution to this process. A well-structured drug monograph addresses pharmacology and mechanism of action; pharmacokinetic parameters relevant to dosing; clinical trial evidence with GRADE certainty ratings; comparison with existing formulary alternatives (including head-to-head trials and indirect comparisons from network meta-analysis if available); safety profile from both trials and post-marketing data; pharmacoeconomic data; practical considerations such as storage, preparation, and administration; and a recommendation with a proposed formulary status. Therapeutic interchange policies specify that when a non-formulary drug is ordered, a formulary equivalent may be automatically substituted; these policies require evidence of therapeutic equivalence and clear communication to prescribers.

Biosimilar adoption is an increasingly prominent formulary issue. A biosimilar is a biologic medicine that has been approved on the basis of demonstrated similarity to an already-approved reference biologic, rather than independent pivotal trials. CADTH and Health Canada have established frameworks for biosimilar approval and interchangeability that pharmacists must understand to advise on substitution policies.

Dealing with Uncertainty

Clinical decision-making frequently requires action in the face of incomplete, conflicting, or inapplicable evidence. Recognising and communicating uncertainty honestly is as important a professional skill as finding and appraising evidence. When evidence is absent — as for many drug use questions in pregnancy, severe organ impairment, or rare diseases — the pharmacist should say so explicitly, describing what evidence does exist (e.g., animal toxicology data, pharmacokinetic studies in similar populations) and what its limitations are, rather than defaulting to a blanket recommendation derived from the drug label without further thought.

When evidence is conflicting — when well-designed trials reach opposite conclusions — the pharmacist should attempt to understand why: do the trials differ in population, dose, follow-up duration, or outcome definition? Is there evidence of publication bias? Has a subsequent meta-analysis resolved the conflict? If the conflict is genuine and unresolved, the response should say so and discuss the clinical implications of uncertainty, including whether watchful waiting or a time-limited therapeutic trial is appropriate.

When evidence is inapplicable — when it comes from a population, setting, or intervention that does not map well onto the clinical question — GRADE’s indirectness criterion applies. The pharmacist should describe the nature and degree of inapplicability and judge whether the indirect evidence provides sufficient reassurance to act or whether the degree of uncertainty warrants caution, a conservative default dose, or referral to a specialist.

Drug Information Response Documentation

Documenting drug information responses is not merely an administrative formality; it is a professional and medicolegal responsibility. A documented response demonstrates that the question was received, properly understood, and answered in a traceable and reproducible way. The documentation should include: the date and time of the request; the identity of the requestor and their professional role; the patient’s relevant clinical information (de-identified where appropriate); the question as restated in PICO format; the search strategy, including databases searched, date range, and number of records retrieved; the key sources consulted and their levels of evidence; the evidence summary; the recommendation; and any relevant caveats or follow-up requirements.

From a quality improvement perspective, documenting responses enables tracking of response accuracy over time. If a documented recommendation is later found to have been based on retracted evidence or superseded by new findings, the documentation enables identification of patients who may have been affected and supports corrective action. From a medicolegal perspective, documented drug information responses demonstrate professional diligence and can substantiate that the pharmacist exercised the standard of care applicable to drug information practice.
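The elements above lend themselves to a structured record rather than free text, which makes later auditing and quality review easier. The following is a minimal sketch; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DrugInfoResponse:
    """Structured log entry for a drug information response.

    Field names are illustrative; institutions define their own templates.
    """
    received: datetime          # date and time of the request
    requestor: str              # identity and professional role
    patient_context: str        # de-identified relevant clinical information
    pico_question: str          # question restated in PICO format
    search_strategy: str        # databases, date range, records retrieved
    key_sources: list = field(default_factory=list)  # sources + evidence levels
    evidence_summary: str = ""
    recommendation: str = ""
    caveats_followup: str = ""
```

A structured record of this kind supports the quality-improvement and medicolegal uses described below: entries can be queried by drug, requestor, or date when evidence is retracted or superseded.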

Worked Example 9.1 — Rivaroxaban in Child-Pugh B Cirrhosis

A hospitalist telephones the pharmacy asking whether rivaroxaban is safe to use for atrial fibrillation stroke prevention in a patient with Child-Pugh B cirrhosis (moderate hepatic impairment). The systematic approach proceeds through each step.

Classify: Drug use in hepatic impairment — pharmacokinetics, safety, and guideline recommendation.

Background: The patient is 64 years old with compensated cirrhosis secondary to NASH, CHA₂DS₂-VASc score of 4, current INR 1.4 (liver disease-related, not anticoagulant effect), albumin 28 g/L, no prior variceal bleeding.

Ultimate question: In patients with non-valvular AF and Child-Pugh B cirrhosis, is rivaroxaban safe and effective compared with warfarin or no treatment for stroke prevention?

Search: PubMed (rivaroxaban AND “hepatic impairment” AND pharmacokinetics; “atrial fibrillation” AND cirrhosis AND anticoagulation); Canadian Product Monograph via Health Canada; FDA approval package pharmacology review; CCS AF guidelines GRADE recommendations.

Evaluate: The Canadian Product Monograph for rivaroxaban states it is contraindicated in patients with hepatic disease associated with coagulopathy and clinically relevant bleeding risk, including Child-Pugh B and C. This classification is based on a Phase I pharmacokinetic study in hepatically impaired volunteers (reported in the FDA drug approval package), which found that Child-Pugh B patients had AUC values approximately 2.3-fold higher than healthy controls due to reduced CYP3A4 activity and reduced plasma protein binding — substantially increasing the risk of bleeding at standard doses. The ROCKET AF trial, which established rivaroxaban’s efficacy in AF, excluded patients with significant hepatic disease (ALT or AST >3× ULN). No clinical trial evidence supports rivaroxaban use in Child-Pugh B cirrhosis. The CCS 2020 AF guidelines conditionally recommend against DOAC use in cirrhosis with coagulopathy, noting very low certainty evidence.

Response to the hospitalist: “Rivaroxaban is contraindicated in Child-Pugh B cirrhosis per the Canadian Product Monograph, based on substantially elevated drug exposure in PK studies and absence of clinical trial evidence in this population. Warfarin can theoretically be used but requires very careful monitoring and interpretation of the INR, which is elevated at baseline due to impaired hepatic synthesis rather than anticoagulant effect — an elevated INR in cirrhosis does not reliably reflect anticoagulant drug effect. For patients with cirrhosis and AF, guideline consensus favours a case-by-case risk-benefit assessment, typically with multidisciplinary input from hepatology and haematology. I recommend a hepatology consult before initiating anticoagulation.”

Documentation: The response is recorded in the pharmacy drug information log with the date, requestor, patient information, PICO question, databases searched, key references (Product Monograph, FDA approval package, CCS guidelines, relevant PK study), the recommendation and its rationale, and a note that follow-up with hepatology was suggested.

Worked Example 9.2 — Knowledge Translation in Institutional Practice

A clinical pharmacy team at a regional hospital reviews the 2023 AHA/ACC guideline on management of patients with chronic coronary disease, which includes a new strong recommendation (based on moderate-certainty evidence) that eligible patients with stable ischaemic heart disease and LDL above 1.8 mmol/L on a maximally tolerated statin should be considered for ezetimibe or a PCSK9 inhibitor. Applying the KTA cycle proceeds in four steps.

First, the team synthesises the evidence locally: it retrieves the IMPROVE-IT and FOURIER trials, appraises their risk of bias (low by RoB 2), and confirms applicability to its patient population.

Second, the team assesses barriers and facilitators to implementation in its institutional context: ezetimibe is on the formulary and inexpensive, whereas PCSK9 inhibitors are not on the formulary and require prior authorisation. The team therefore prepares a drug monograph for P&T review requesting restricted formulary addition of a PCSK9 inhibitor for patients with ASCVD and persistent LDL above 2.5 mmol/L despite maximal statin plus ezetimibe.

Third, the team designs a monitoring plan: quarterly LDL testing and documentation of the indication in the medication management record.

Fourth, the team communicates the new recommendation to prescribers via a concise drug information bulletin distributed at the next medical staff rounds, using SBAR format, absolute risk data from the trials, and a one-page quick-reference card.

This full cycle, from evidence to implementation to monitoring, exemplifies the applied knowledge translation role of the clinical pharmacist.

The integration of drug information skills with evidence-based medicine, critical appraisal, pharmacoeconomics, guideline application, and knowledge translation represents the full scope of advanced pharmacy practice. Each chapter of this textbook has equipped you with a specific set of tools — the systematic approach, PICO, MEDLINE searching, GRADE, RoB 2, CONSORT, PRISMA, AGREE II, ICER — but the value of these tools lies entirely in their coordinated application to real clinical problems. The pharmacist who asks a better question, searches more completely, appraises more rigorously, communicates more clearly, and documents more carefully is not simply a more technically proficient practitioner. That pharmacist is a safer one — and ultimately, the systematic development of these skills is the core purpose of advanced drug information education.
