This article provides an example of a custom analysis that an analyst might need to carry out at the request of a researcher.

Researcher request

We want to determine the proportion of cases where Haemophilus influenzae type B is in the causal chain of death. The CHAMPS TAC assay data for H. influenzae include a target found in all serotypes (HIAT_1 or “Haemophilus influenzae”) as well as a target found in serotypes A and B (HITB_2 or “Haemophilus influenzae type B”). The combinations of Negative and Positive results are interpreted as follows:

| HIAT_1 | HITB_2 | Interpretation |
|:---|:---|:---|
| Positive | Negative | H. influenzae serotype A,B,C,D,E,F, or NT |
| Positive | Positive | H. influenzae serotype A or B |
| Negative | Positive | H. influenzae serotype A or B |
| Negative | Negative | Negative for H. influenzae |
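
The interpretation table above can be written as a small lookup. The sketch below is only illustrative: `interpret_hi()` is a hypothetical helper, not part of the champs package, and the input is a toy data frame covering the four combinations.

```r
library(dplyr)

# Hypothetical helper (not part of the champs package): map a pair of
# HIAT_1 / HITB_2 results to the interpretation in the table above.
interpret_hi <- function(hiat_1, hitb_2) {
  case_when(
    hitb_2 == "Positive" ~ "H. influenzae serotype A or B",
    hiat_1 == "Positive" ~ "H. influenzae serotype A,B,C,D,E,F, or NT",
    TRUE ~ "Negative for H. influenzae"
  )
}

# Toy input covering all four combinations (illustrative values only)
combos <- data.frame(
  hiat_1 = c("Positive", "Positive", "Negative", "Negative"),
  hitb_2 = c("Negative", "Positive", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

combos %>% mutate(interpretation = interpret_hi(hiat_1, hitb_2))
```

Because a positive HITB_2 gives the same interpretation whatever HIAT_1 shows, that condition is checked first.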

The DeCoDe panel determines each case’s causal result in the dcd data, and it cannot wholly rule out H. influenzae type A even if the H. influenzae type B test result is positive. Because of this, the panel records the case as H. influenzae without specifying type B (HiB). We want to use the CHAMPS data to estimate the count of cases that may be true HiB by finding the cases where H. influenzae is an etiology in the causal chain (dcd) and HITB_2 is positive in at least one specimen (tac).

Using CHAMPS data with the champs package

library(champs)
library(dplyr)
library(stringr)
d <- load_data("CHAMPS_de_identified_data")

Let’s use the tac_long data to make sure we understand the targets in the test and which specimen types exist for H. influenzae.

d$tac_long %>%
  filter(!duplicated(name), str_detect(name, "hiat|hitb")) %>%
  select(-champs_deid, -result) %>%
  arrange(target) %>%
  knitr::kable(format = "markdown")

| name | specimen_type | target | pathogen |
|:---|:---|:---|:---|
| bld_hiat_1 | Whole blood | HIAT_1 | Haemophilus influenzae |
| bld_sp_hiat_1 | Plasma or spun blood specimen | HIAT_1 | Haemophilus influenzae |
| csf_hiat_1 | Cerebrospinal fluid sample | HIAT_1 | Haemophilus influenzae |
| lung_hiat_1 | Tissue specimen from lung | HIAT_1 | Haemophilus influenzae |
| np_op_hiat_1 | Nasopharyngeal and Oropharyngeal swab | HIAT_1 | Haemophilus influenzae |
| bld_hitb_2 | Whole blood | HITB_2 | Haemophilus influenzae type B |
| bld_sp_hitb_2 | Plasma or spun blood specimen | HITB_2 | Haemophilus influenzae type B |
| csf_hitb_2 | Cerebrospinal fluid sample | HITB_2 | Haemophilus influenzae type B |
| lung_hitb_2 | Tissue specimen from lung | HITB_2 | Haemophilus influenzae type B |
| np_op_hitb_2 | Nasopharyngeal and Oropharyngeal swab | HITB_2 | Haemophilus influenzae type B |

Are there any other pathogen names related to influenzae that we may be missing?

d$tac_long %>%
  filter(!duplicated(name), str_detect(pathogen, "influenzae")) %>%
  select(-champs_deid, -result) %>%
  arrange(target) %>%
  knitr::kable(format = "markdown")

| name | specimen_type | target | pathogen |
|:---|:---|:---|:---|
| bld_hflu_1 | Whole blood | HFLU_1 | H. influenzae |
| bld_sp_hflu_1 | Plasma or spun blood specimen | HFLU_1 | H. influenzae |
| csf_hflu_1 | Cerebrospinal fluid sample | HFLU_1 | H. influenzae |
| lung_hflu_1 | Tissue specimen from lung | HFLU_1 | H. influenzae |
| np_op_hflu_1 | Nasopharyngeal and Oropharyngeal swab | HFLU_1 | H. influenzae |
| bld_hiat_1 | Whole blood | HIAT_1 | Haemophilus influenzae |
| bld_sp_hiat_1 | Plasma or spun blood specimen | HIAT_1 | Haemophilus influenzae |
| csf_hiat_1 | Cerebrospinal fluid sample | HIAT_1 | Haemophilus influenzae |
| lung_hiat_1 | Tissue specimen from lung | HIAT_1 | Haemophilus influenzae |
| np_op_hiat_1 | Nasopharyngeal and Oropharyngeal swab | HIAT_1 | Haemophilus influenzae |
| bld_hitb_2 | Whole blood | HITB_2 | Haemophilus influenzae type B |
| bld_sp_hitb_2 | Plasma or spun blood specimen | HITB_2 | Haemophilus influenzae type B |
| csf_hitb_2 | Cerebrospinal fluid sample | HITB_2 | Haemophilus influenzae type B |
| lung_hitb_2 | Tissue specimen from lung | HITB_2 | Haemophilus influenzae type B |
| np_op_hitb_2 | Nasopharyngeal and Oropharyngeal swab | HITB_2 | Haemophilus influenzae type B |

The H. influenzae entry under HFLU_1 is a ‘multi-target pattern’ defined by the positive-negative table in the researcher request section above. In other words, HIAT_1 and HITB_2 explain HFLU_1, so we don’t need to track that target.

Building pathogen present (TAC) tabulations

We can build our pathogen-present counts and check that the totals align with our TAC subject count of 1473. In the table below, the four counts sum to that total. We are not interested in the double-negative results, but the other three rows are of interest.

hiat_1_positive <- d$tac_long %>%
  filter(result == "Positive", pathogen == "Haemophilus influenzae") %>%
  pull(champs_deid) %>%
  unique()

hitb_2_positive <- d$tac_long %>%
  filter(result == "Positive", pathogen == "Haemophilus influenzae type B") %>%
  pull(champs_deid) %>%
  unique()

# table tallies from TAC
positive_hiat_1_negative_hitb_2 <-  hiat_1_positive[!hiat_1_positive %in% hitb_2_positive]
positive_both <- hiat_1_positive[hiat_1_positive %in% hitb_2_positive]
negative_hiat_1_positive_hitb_2 <- hitb_2_positive[!hitb_2_positive %in% hiat_1_positive]
negative_both <- unique(d$tac_long$champs_deid)[!unique(d$tac_long$champs_deid) %in% 
  c(positive_hiat_1_negative_hitb_2 , positive_both, negative_hiat_1_positive_hitb_2)]

| HIAT_1 | HITB_2 | Interpretation | Pathogen present (tac) |
|:---|:---|:---|---:|
| Positive | Negative | H. influenzae serotype A,B,C,D,E,F, or NT | 293 |
| Positive | Positive | H. influenzae serotype A or B | 9 |
| Negative | Positive | H. influenzae serotype A or B | 1 |
| Negative | Negative | Negative for H. influenzae | 1170 |
| Total | | | 1473 |
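
The four groups are built with set logic, so they should partition the TAC subjects. A minimal sketch of that check follows, using base R set functions and made-up IDs in place of the real `champs_deid` values (`all_ids` and the toy vectors are assumptions for illustration).

```r
# Toy stand-ins for the objects built above (made-up IDs, not CHAMPS data)
all_ids         <- paste0("case_", 1:10)
hiat_1_positive <- c("case_1", "case_2", "case_3", "case_4")
hitb_2_positive <- c("case_4", "case_5")

tallies <- data.frame(
  HIAT_1 = c("Positive", "Positive", "Negative", "Negative"),
  HITB_2 = c("Negative", "Positive", "Positive", "Negative"),
  n = c(
    length(setdiff(hiat_1_positive, hitb_2_positive)),    # HIAT_1 only
    length(intersect(hiat_1_positive, hitb_2_positive)),  # both positive
    length(setdiff(hitb_2_positive, hiat_1_positive)),    # HITB_2 only
    length(setdiff(all_ids, union(hiat_1_positive, hitb_2_positive)))  # neither
  )
)

# The four groups partition the subjects, so the counts must add up
stopifnot(sum(tallies$n) == length(all_ids))
tallies
```

With the real data, `all_ids` would be `unique(d$tac_long$champs_deid)` and the sum would be compared against 1473.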

Building causal chain (DeCoDe) tabulations

Let’s look for the unique values in champs_group_desc and etiol from dcd_long related to influenzae. From the table below, we can see two matching values under etiol. There appear to be 36 unique cases classified as having Haemophilus influenzae in their causal chain. We have no cases classified as Haemophilus influenzae Type B, and we are not interested in Haemophilus parainfluenzae for this analysis.

d$dcd_long %>%
  filter(str_detect(etiol, "influenzae")) %>%
  arrange(champs_deid) %>%
  group_by(etiol) %>%
  summarize(values = n(), unique_cases = length(unique(champs_deid)), 
    types = str_c(unique(type), collapse = ", "), cgd_count = length(unique(champs_group_desc))) %>%
  knitr::kable(format = "markdown")
#> `summarise()` ungrouping output (override with `.groups` argument)

| etiol | values | unique_cases | types | cgd_count |
|:---|---:|---:|:---|---:|
| Haemophilus influenzae | 48 | 36 | immediate_cause, morbid_condition, underlying_cause | 5 |
| Haemophilus parainfluenzae | 3 | 3 | underlying_cause, immediate_cause, morbid_condition | 1 |

Sometimes the capitalization and labeling of the tac_long pathogens differ from the dcd_long values in etiol, so it is essential to take note of any differences. Also, because the DeCoDe data has immediate and underlying causes, a case can appear in more than one row, so the number of unique cases can be smaller than the number of rows with the selected etiol. The champs package includes a valid_conditions() function that can reproduce the check above. It will search champs_group_desc as well.

lapply(valid_conditions(d), function(x) str_subset(x, "(i|I)nfluenzae"))
#> $champs_group_desc
#> character(0)
#> 
#> $etiol
#> [1] "Haemophilus influenzae"     "Haemophilus parainfluenzae"
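
Because the TAC label uses “type B” while the filter on etiol used later in this article looks for “Type B”, a case-insensitive match is a safer way to search across both sources. A minimal sketch using `stringr::regex()` on a toy vector of labels (the vector is illustrative, not pulled from the data):

```r
library(stringr)

# Illustrative labels as they are spelled in this article
labels <- c(
  "Haemophilus influenzae type B",  # spelling used in tac_long
  "Haemophilus influenzae Type B",  # spelling used when filtering etiol
  "H. influenzae",
  "Plasmodium falciparum"
)

# Match regardless of capitalization
type_b_labels <- str_subset(labels, regex("influenzae type b", ignore_case = TRUE))
type_b_labels
```

Both spellings are returned, which an exact `==` comparison against one of them would miss.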

In our specific case, no cases are recorded with the etiol ‘Haemophilus influenzae Type B’; only ‘Haemophilus influenzae’ appears. We can find the 36 cases with Haemophilus influenzae.

hi_causal <- d$dcd_long %>%
  filter(etiol == "Haemophilus influenzae") %>%
  pull(champs_deid) %>%
  unique()

hib_causal <- d$dcd_long %>%
  filter(etiol == "Haemophilus influenzae Type B") %>%
  pull(champs_deid) %>%
  unique()

Running length(hi_causal) returns 36, while length(hib_causal) returns 0.

Exploring the 36 ‘Haemophilus influenzae’ causal cases

We want to examine the 36 cases determined to have Haemophilus influenzae to see if any of those cases are potential Type B. We will examine the cases with our pathogens of interest from tac_long that we calculated above to see alignment with the DeCoDe causal classification.

hi_causal_phi_nhib <- positive_hiat_1_negative_hitb_2[positive_hiat_1_negative_hitb_2 %in% hi_causal]
length(hi_causal_phi_nhib)
#> [1] 35

So 35 of the 36 causal Haemophilus influenzae cases had a positive HIAT_1 and a negative HITB_2 in their TAC results. The remaining case appears to have a positive HITB_2. We can check:

hi_causal_pboth <- positive_both[positive_both %in% hi_causal]
length(hi_causal_pboth)
#> [1] 1
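
The two membership checks above can also be run in one pass by keeping the TAC groups in a named list. The sketch below uses made-up IDs; `tac_groups` is a hypothetical container, and with the real data it would hold `positive_hiat_1_negative_hitb_2`, `positive_both`, and `negative_hiat_1_positive_hitb_2`.

```r
# Toy stand-ins (made-up IDs, not CHAMPS data)
tac_groups <- list(
  pos_hiat_neg_hitb = c("a", "b", "c"),
  pos_both          = c("d", "e"),
  neg_hiat_pos_hitb = c("f")
)
hi_causal <- c("a", "b", "d")

# For each TAC group, count how many of its cases are in the causal set
causal_by_group <- sapply(tac_groups, function(ids) sum(ids %in% hi_causal))
causal_by_group
```

If the group counts do not sum to `length(hi_causal)`, some causal cases were negative for both targets in TAC.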

Here is the reported DeCoDe information for the one case that is Haemophilus influenzae causal and positive for both HIAT_1 and HITB_2. We stored the champs_deid in the object hi_causal_pboth.

d$dcd_long %>% 
  filter(champs_deid %in% hi_causal_pboth, !is.na(champs_group_desc)) %>% 
  select(-champs_deid) %>%
  knitr::kable(format = "markdown")

| champs_group_desc | type | etiol | etiol_num |
|:---|:---|:---|---:|
| Sepsis | immediate_cause | Haemophilus influenzae | 1 |
| Sepsis | immediate_cause | NA | 2 |
| Sepsis | immediate_cause | NA | 3 |
| Malnutrition | underlying_cause | NA | 1 |
| Malnutrition | underlying_cause | NA | 2 |
| Malnutrition | underlying_cause | NA | 3 |
| Lower respiratory infections | morbid_condition | Haemophilus influenzae | 1 |
| Lower respiratory infections | morbid_condition | NA | 2 |
| Lower respiratory infections | morbid_condition | NA | 3 |

That same case’s TAC data, filtered to positive results for pathogens containing influenzae, is shown below.

d$tac_long %>% 
  filter(champs_deid %in% hi_causal_pboth, result == "Positive", str_detect(pathogen, "influenzae")) %>% 
  select(-champs_deid) %>%
  knitr::kable(format = "markdown")

| name | result | specimen_type | target | pathogen |
|:---|:---|:---|:---|:---|
| bld_hiat_1 | Positive | Whole blood | HIAT_1 | Haemophilus influenzae |
| bld_hitb_2 | Positive | Whole blood | HITB_2 | Haemophilus influenzae type B |
| lung_hiat_1 | Positive | Tissue specimen from lung | HIAT_1 | Haemophilus influenzae |
| lung_hitb_2 | Positive | Tissue specimen from lung | HITB_2 | Haemophilus influenzae type B |
| np_op_hiat_1 | Positive | Nasopharyngeal and Oropharyngeal swab | HIAT_1 | Haemophilus influenzae |
| np_op_hitb_2 | Positive | Nasopharyngeal and Oropharyngeal swab | HITB_2 | Haemophilus influenzae type B |

Exploring the 9 HIAT_1 and HITB_2 positive cases

We discussed one of the nine cases in the previous section. What did the other 8 cases with positive HIAT_1 and HITB_2 have in their causal chain?

For convenience, to remove the long champs_deid, we will define a function that replaces the ID with a unique numeric label, which we will use in several of the following code blocks.

recode_champs_deid <- function(x) {
  x %>%
    mutate(champs_case = factor(champs_deid) %>% as.numeric()) %>%
    select(-champs_deid) %>%
    select(champs_case, everything()) %>%
    arrange(champs_case)
}
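
One property of this helper is worth seeing on toy data: the numeric label comes from the factor levels of whatever rows are passed in, so the same case can receive a different number in different tables. The sketch below repeats the helper so it runs on its own; the IDs and etiol values are made up.

```r
library(dplyr)

# Same helper as defined above, repeated so this sketch is self-contained
recode_champs_deid <- function(x) {
  x %>%
    mutate(champs_case = factor(champs_deid) %>% as.numeric()) %>%
    select(-champs_deid) %>%
    select(champs_case, everything()) %>%
    arrange(champs_case)
}

# Toy data with made-up IDs
toy <- data.frame(
  champs_deid = c("ZZZ-9", "AAA-1", "ZZZ-9"),
  etiol = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

full_labels   <- recode_champs_deid(toy)                                # AAA-1 is 1, ZZZ-9 is 2
subset_labels <- recode_champs_deid(filter(toy, champs_deid == "ZZZ-9"))  # ZZZ-9 is now 1
full_labels
subset_labels
```

The labels are therefore only meaningful within a single table and should not be compared across tables.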

We can find the dual-positive TAC cases that have an etiol reported (tac_pboth_nhic), drop the one case already identified in hi_causal_pboth, and then look at the DeCoDe results for the rest.

tac_pboth_nhic <- d$dcd_long %>%
  filter(champs_deid %in% positive_both, !is.na(etiol)) %>%
  pull(champs_deid) %>% unique()

d$dcd_long %>%
  filter(champs_deid %in% tac_pboth_nhic, !is.na(etiol)) %>%
  # %in% stays correct even if hi_causal_pboth held more than one ID
  filter(!champs_deid %in% hi_causal_pboth) %>%
  recode_champs_deid() %>%
  knitr::kable(format = "markdown")

| champs_case | champs_group_desc | type | etiol | etiol_num |
|---:|:---|:---|:---|---:|
| 1 | Malaria | underlying_cause | Plasmodium falciparum | 1 |
| 2 | Malaria | underlying_cause | Plasmodium falciparum | 1 |
| 3 | Sepsis | immediate_cause | Escherichia coli | 1 |
| 3 | Sepsis | immediate_cause | Candida albicans | 2 |
| 3 | Sepsis | immediate_cause | Pseudomonas aeruginosa | 3 |
| 3 | HIV | underlying_cause | Human Immunodeficiency Virus (HIV) | 1 |
| 3 | Lower respiratory infections | morbid_condition | Candida albicans | 1 |
| 3 | Lower respiratory infections | morbid_condition | Pseudomonas aeruginosa | 2 |
| 3 | Lower respiratory infections | morbid_condition | Klebsiella pneumoniae | 3 |
| 4 | Diarrheal Diseases | immediate_cause | Escherichia coli | 1 |
| 4 | Diarrheal Diseases | immediate_cause | Shigella | 2 |
| 4 | Diarrheal Diseases | underlying_cause | Escherichia coli | 1 |
| 4 | Diarrheal Diseases | underlying_cause | Shigella | 2 |
| 5 | Lower respiratory infections | immediate_cause | Salmonella enterica sub-species paratyphi A | 1 |
| 6 | Lower respiratory infections | immediate_cause | Other Etiology/Agent | 1 |
| 6 | Lower respiratory infections | immediate_cause | Cytomegalovirus (CMV) | 2 |
| 6 | HIV | underlying_cause | Other Etiology/Agent | 1 |
| 7 | Malaria | underlying_cause | Plasmodium falciparum | 1 |

Notice that there are only 7 cases in the DeCoDe table shown above. We expected to have eight cases after removing the one case in hi_causal_pboth. Let’s look at the missing case and see why it is not in the table.

d$dcd_long %>%
  filter(champs_deid %in% positive_both[!positive_both %in% tac_pboth_nhic], !is.na(champs_group_desc)) %>%
  recode_champs_deid() %>%
  knitr::kable(format = "markdown")

| champs_case | champs_group_desc | type | etiol | etiol_num |
|---:|:---|:---|:---|---:|
| 1 | Undetermined | underlying_cause | NA | 1 |
| 1 | Undetermined | underlying_cause | NA | 2 |
| 1 | Undetermined | underlying_cause | NA | 3 |

This could be a potential Haemophilus influenzae Type B causal case based on the double-positive in the TAC data for HIAT_1 and HITB_2 and the DeCoDe panel’s non-determination. In total, that gives two cases (this one and the case in hi_causal_pboth) that we could recommend for inclusion in the Haemophilus influenzae Type B causal group.

One other important note: for this case, champs_group_desc carried the information needed to describe the causal determination, while the corresponding etiol was NA. We need to be careful when filtering out missing values in etiol, as we did for the table of seven cases, because that filter silently drops such cases.
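
A minimal sketch of this pitfall on toy data (made-up IDs shaped like dcd_long; `toy_dcd` is an assumption for illustration). It compares the cases that survive an `!is.na(etiol)` filter against all cases, so the dropped ones can be inspected rather than lost.

```r
library(dplyr)

# Toy data shaped like dcd_long (made-up IDs, not CHAMPS data)
toy_dcd <- data.frame(
  champs_deid = c("A", "A", "B", "B"),
  champs_group_desc = c("Sepsis", "Sepsis", "Undetermined", "Undetermined"),
  etiol = c("Haemophilus influenzae", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Cases that survive the filter on etiol
with_etiol <- toy_dcd %>%
  filter(!is.na(etiol)) %>%
  pull(champs_deid) %>%
  unique()

# Cases silently dropped because every etiol value is NA
dropped <- setdiff(unique(toy_dcd$champs_deid), with_etiol)
dropped
```

Case "B" has a champs_group_desc but no etiol, so it disappears from any table built after the filter.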

Checking our counts with calc_ functions

Our calculations above were not broken down by site or age, but we can use the validated calc_ functions to check our results using the marginal totals.

Pathogen presence of ‘Haemophilus influenzae’

We reported 302 cases with HIAT_1 detected (293 + 9) and can check that value against the marginal total from calc_cc_detected_by_site_age().

calc_detected_HIAT_1_wc_hi <- calc_cc_detected_by_site_age(d, 
  condition = "Haemophilus influenzae", 
  pathogen = "Haemophilus influenzae",
  specimen_types = valid_specimen_types(d),
  sites = valid_sites(d))
calc_detected_HIAT_1_wc_hi$denominator %>% knitr::kable(format = "markdown")
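
| | Bangladesh | Kenya | Mali | Mozambique | South Africa | Ethiopia | Sierra Leone | Sum |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Stillbirth | 1 | 5 | 1 | 0 | 5 | 0 | 1 | 13 |
| Death in the first 24 hours | 1 | 2 | 1 | 4 | 3 | 0 | 1 | 12 |
| Early Neonate (1 to 6 days) | 3 | 0 | 0 | 1 | 2 | 1 | 4 | 11 |
| Late Neonate (7 to 27 days) | 1 | 2 | 0 | 0 | 3 | 2 | 1 | 9 |
| Infant (28 days to less than 12 months) | 2 | 41 | 15 | 11 | 31 | 3 | 2 | 105 |
| Child (12 months to less than 60 Months) | 0 | 60 | 12 | 28 | 34 | 6 | 12 | 152 |
| Sum | 8 | 110 | 29 | 44 | 78 | 12 | 21 | 302 |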

Pathogen presence of ‘Haemophilus influenzae type B’

We reported 10 cases with HITB_2 detected (9 + 1). We can see that marginal total below.

calc_detected_HITB_2_wc_hib <- calc_cc_detected_by_site_age(d, 
  condition = "Haemophilus influenzae", 
  pathogen = "Haemophilus influenzae type B",
  specimen_types = valid_specimen_types(d),
  sites = valid_sites(d))
calc_detected_HITB_2_wc_hib$denominator %>% knitr::kable(format = "markdown")
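
| | Bangladesh | Kenya | Mali | Mozambique | South Africa | Ethiopia | Sierra Leone | Sum |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Stillbirth | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Death in the first 24 hours | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Early Neonate (1 to 6 days) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| Late Neonate (7 to 27 days) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Infant (28 days to less than 12 months) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Child (12 months to less than 60 Months) | 0 | 2 | 1 | 1 | 1 | 0 | 3 | 8 |
| Sum | 0 | 3 | 1 | 1 | 2 | 0 | 3 | 10 |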

Causal presence of ‘Haemophilus influenzae’

We reported 36 cases and can see that value in the marginal totals.

calc_causal_hi_only <- calc_cc_allcases_by_site_age(d, condition = "Haemophilus influenzae", sites = valid_sites(d))
calc_causal_hi_only$numerator %>% knitr::kable(format = "markdown")
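
| | Bangladesh | Kenya | Mali | Mozambique | South Africa | Ethiopia | Sierra Leone | Sum |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Stillbirth | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| Death in the first 24 hours | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
| Early Neonate (1 to 6 days) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Late Neonate (7 to 27 days) | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
| Infant (28 days to less than 12 months) | 0 | 1 | 3 | 1 | 7 | 0 | 0 | 12 |
| Child (12 months to less than 60 Months) | 0 | 4 | 1 | 5 | 4 | 4 | 1 | 19 |
| Sum | 0 | 5 | 5 | 6 | 15 | 4 | 1 | 36 |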
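
The marginal totals above can be reconciled with the earlier tallies by simple arithmetic. The sketch below hard-codes the counts reported in this article (it does not call the champs package) and asserts that they agree.

```r
# Counts copied from the TAC tabulation earlier in this article
tac_counts <- c(pos_hiat_neg_hitb = 293, pos_both = 9,
                neg_hiat_pos_hitb = 1, neg_both = 1170)

# Any HIAT_1 positive = HIAT_1 only + both positive
hiat_1_detected <- tac_counts[["pos_hiat_neg_hitb"]] + tac_counts[["pos_both"]]
# Any HITB_2 positive = both positive + HITB_2 only
hitb_2_detected <- tac_counts[["pos_both"]] + tac_counts[["neg_hiat_pos_hitb"]]

stopifnot(
  hiat_1_detected == 302,   # matches the HIAT_1 marginal total
  hitb_2_detected == 10,    # matches the HITB_2 marginal total
  sum(tac_counts) == 1473,  # all TAC subjects accounted for
  35 + 1 == 36              # causal cases: HIAT_1 only + both positive
)
```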