This article provides an example of a custom analysis that an analyst might need to carry out at the request of a researcher.
We want to determine the proportion of cases where Haemophilus influenzae type B is in the causal chain of death. The CHAMPS TAC assay data for H. influenzae include a target found in all serotypes (`HITB_2`'s counterpart `HIAT_1`, or “Haemophilus influenzae”) as well as a target found in serotypes A and B (`HITB_2`, or “Haemophilus influenzae type B”). The combinations of positive and negative results are interpreted as follows:
HIAT_1 | HITB_2 | Interpretation |
---|---|---|
Positive | Negative | H. influenzae serotype A,B,C,D,E,F, or NT |
Positive | Positive | H. influenzae serotype A or B |
Negative | Positive | H. influenzae serotype A or B |
Negative | Negative | Negative for H. influenzae |
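The interpretation table can be written as a small lookup function. This is a minimal sketch, assuming results are coded as `"Positive"`/`"Negative"` strings as in `tac_long`; `interpret_hi()` is a hypothetical helper, not part of the champs package.

```r
# Hypothetical helper (not part of the champs package): map HIAT_1 and
# HITB_2 results to the interpretation in the table above.
interpret_hi <- function(hiat_1, hitb_2) {
  hiat_pos <- hiat_1 == "Positive"
  hitb_pos <- hitb_2 == "Positive"
  ifelse(
    hitb_pos,
    "H. influenzae serotype A or B",                    # any HITB_2 positive
    ifelse(hiat_pos,
           "H. influenzae serotype A,B,C,D,E,F, or NT", # HIAT_1 positive only
           "Negative for H. influenzae")                # both negative
  )
}

interpret_hi(
  hiat_1 = c("Positive", "Positive", "Negative", "Negative"),
  hitb_2 = c("Negative", "Positive", "Positive", "Negative")
)
```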
The DeCoDe panel determines each case’s causes of death in the `dcd` data. Because the panel cannot wholly rule out H. influenzae type A even when the H. influenzae type B test result is positive, it records such cases as H. influenzae without specifying type B (HiB). We want to use the CHAMPS data to estimate how many cases may be true HiB by finding the cases where H. influenzae is an etiology in the causal chain (`dcd`) and `HITB_2` is positive in at least one specimen (`tac`).
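In set terms, the request asks for the case IDs that appear in both lists. A minimal sketch with made-up IDs (not real `champs_deid` values) using base R's `intersect()`:

```r
# Hypothetical toy IDs for illustration only.
# dcd: cases with "Haemophilus influenzae" in the causal chain
hi_causal_ids <- c("case_01", "case_02", "case_03")
# tac: cases with a positive HITB_2 result in at least one specimen
hitb_2_positive_ids <- c("case_03", "case_07")

# Potential HiB cases satisfy both conditions
potential_hib <- intersect(hi_causal_ids, hitb_2_positive_ids)
potential_hib
#> [1] "case_03"
```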
Let’s use the `tac_long` data to make sure we understand the targets in the test and which specimen types exist for H. influenzae.
```r
d$tac_long %>%
  filter(!duplicated(name), str_detect(name, "hiat|hitb")) %>%
  select(-champs_deid, -result) %>%
  arrange(target) %>%
  knitr::kable(format = "markdown")
```
name | specimen_type | target | pathogen |
---|---|---|---|
bld_hiat_1 | Whole blood | HIAT_1 | Haemophilus influenzae |
bld_sp_hiat_1 | Plasma or spun blood specimen | HIAT_1 | Haemophilus influenzae |
csf_hiat_1 | Cerebrospinal fluid sample | HIAT_1 | Haemophilus influenzae |
lung_hiat_1 | Tissue specimen from lung | HIAT_1 | Haemophilus influenzae |
np_op_hiat_1 | Nasopharyngeal and Oropharyngeal swab | HIAT_1 | Haemophilus influenzae |
bld_hitb_2 | Whole blood | HITB_2 | Haemophilus influenzae type B |
bld_sp_hitb_2 | Plasma or spun blood specimen | HITB_2 | Haemophilus influenzae type B |
csf_hitb_2 | Cerebrospinal fluid sample | HITB_2 | Haemophilus influenzae type B |
lung_hitb_2 | Tissue specimen from lung | HITB_2 | Haemophilus influenzae type B |
np_op_hitb_2 | Nasopharyngeal and Oropharyngeal swab | HITB_2 | Haemophilus influenzae type B |
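The `name` column appears to combine a specimen prefix with the lower-case target, for example `bld_sp_hitb_2` for `HITB_2` in plasma or spun blood. Assuming that pattern holds, a hypothetical helper, `split_tac_name()`, can separate the two parts with a regular expression:

```r
# Hypothetical helper: split a TAC column name such as "bld_sp_hiat_1"
# into its specimen prefix and target, assuming the
# "<specimen>_<target in lower case>" pattern seen in the table above.
split_tac_name <- function(name, targets = c("hiat_1", "hitb_2", "hflu_1")) {
  pattern <- paste0("^(.*)_(", paste(targets, collapse = "|"), ")$")
  data.frame(
    name = name,
    specimen = sub(pattern, "\\1", name),           # everything before the target
    target = toupper(sub(pattern, "\\2", name)),    # target, upper-cased
    stringsAsFactors = FALSE
  )
}

split_tac_name(c("bld_hiat_1", "bld_sp_hitb_2", "np_op_hiat_1"))
```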
Are there any other H. influenzae pathogen names that we might be missing?
```r
d$tac_long %>%
  filter(!duplicated(name), str_detect(pathogen, "influenzae")) %>%
  select(-champs_deid, -result) %>%
  arrange(target) %>%
  knitr::kable(format = "markdown")
```
name | specimen_type | target | pathogen |
---|---|---|---|
bld_hflu_1 | Whole blood | HFLU_1 | H. influenzae |
bld_sp_hflu_1 | Plasma or spun blood specimen | HFLU_1 | H. influenzae |
csf_hflu_1 | Cerebrospinal fluid sample | HFLU_1 | H. influenzae |
lung_hflu_1 | Tissue specimen from lung | HFLU_1 | H. influenzae |
np_op_hflu_1 | Nasopharyngeal and Oropharyngeal swab | HFLU_1 | H. influenzae |
bld_hiat_1 | Whole blood | HIAT_1 | Haemophilus influenzae |
bld_sp_hiat_1 | Plasma or spun blood specimen | HIAT_1 | Haemophilus influenzae |
csf_hiat_1 | Cerebrospinal fluid sample | HIAT_1 | Haemophilus influenzae |
lung_hiat_1 | Tissue specimen from lung | HIAT_1 | Haemophilus influenzae |
np_op_hiat_1 | Nasopharyngeal and Oropharyngeal swab | HIAT_1 | Haemophilus influenzae |
bld_hitb_2 | Whole blood | HITB_2 | Haemophilus influenzae type B |
bld_sp_hitb_2 | Plasma or spun blood specimen | HITB_2 | Haemophilus influenzae type B |
csf_hitb_2 | Cerebrospinal fluid sample | HITB_2 | Haemophilus influenzae type B |
lung_hitb_2 | Tissue specimen from lung | HITB_2 | Haemophilus influenzae type B |
np_op_hitb_2 | Nasopharyngeal and Oropharyngeal swab | HITB_2 | Haemophilus influenzae type B |
The “H. influenzae” pathogen in the `HFLU_1` target is a ‘multi-target pattern’ defined by the positive/negative interpretation table above. In other words, `HIAT_1` and `HITB_2` fully determine `HFLU_1`, so we don’t need to track that target separately.
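One way to confirm that `HFLU_1` adds nothing beyond `HIAT_1` and `HITB_2` is to derive it from the two targets and compare it with the recorded result. The sketch below uses a made-up wide data frame (one column per target) and assumes, following the interpretation table, that `HFLU_1` is positive whenever either target is positive:

```r
# Hypothetical toy data: one row per case, one column per target.
toy <- data.frame(
  hiat_1 = c("Positive", "Positive", "Negative", "Negative"),
  hitb_2 = c("Negative", "Positive", "Positive", "Negative"),
  hflu_1 = c("Positive", "Positive", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

# Assumption: HFLU_1 is positive whenever either underlying target is positive.
derived_hflu_1 <- ifelse(toy$hiat_1 == "Positive" | toy$hitb_2 == "Positive",
                         "Positive", "Negative")

# If every row agrees, HFLU_1 carries no extra information.
all(derived_hflu_1 == toy$hflu_1)
#> [1] TRUE
```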
We can tally the cases in each combination of results and make sure our totals match the `tac` subject count of 1473. The counts in the table below do sum to that total (293 + 9 + 1 + 1170 = 1473). We are not interested in the double-negative results, but the other three rows are of interest.
```r
hiat_1_positive <- d$tac_long %>%
  filter(result == "Positive", pathogen == "Haemophilus influenzae") %>%
  pull(champs_deid) %>%
  unique()

hitb_2_positive <- d$tac_long %>%
  filter(result == "Positive", pathogen == "Haemophilus influenzae type B") %>%
  pull(champs_deid) %>%
  unique()

# table tallies from TAC
positive_hiat_1_negative_hitb_2 <- hiat_1_positive[!hiat_1_positive %in% hitb_2_positive]
positive_both <- hiat_1_positive[hiat_1_positive %in% hitb_2_positive]
negative_hiat_1_positive_hitb_2 <- hitb_2_positive[!hitb_2_positive %in% hiat_1_positive]
negative_both <- unique(d$tac_long$champs_deid)[
  !unique(d$tac_long$champs_deid) %in%
    c(positive_hiat_1_negative_hitb_2, positive_both, negative_hiat_1_positive_hitb_2)
]
```
HIAT_1 | HITB_2 | Interpretation | Pathogen present (tac) |
---|---|---|---|
Positive | Negative | H. influenzae serotype A,B,C,D,E,F, or NT | 293 |
Positive | Positive | H. influenzae serotype A or B | 9 |
Negative | Positive | H. influenzae serotype A or B | 1 |
Negative | Negative | Negative for H. influenzae | 1170 |
Total | | | 1473 |
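The `%in%` subsetting above can also be written with base R's set functions. The sketch below uses made-up IDs in place of `champs_deid` values and ends with the same sanity check as the table: because the four groups are mutually exclusive, their sizes must add up to the number of subjects.

```r
# Hypothetical toy IDs standing in for champs_deid values.
all_ids         <- sprintf("case_%02d", 1:8)
hiat_1_positive <- c("case_01", "case_02", "case_03")
hitb_2_positive <- c("case_03", "case_04")

# Base R set operations equivalent to the %in% subsetting above
positive_hiat_1_negative_hitb_2 <- setdiff(hiat_1_positive, hitb_2_positive)
positive_both                   <- intersect(hiat_1_positive, hitb_2_positive)
negative_hiat_1_positive_hitb_2 <- setdiff(hitb_2_positive, hiat_1_positive)
negative_both <- setdiff(all_ids, union(hiat_1_positive, hitb_2_positive))

counts <- c(
  pos_neg = length(positive_hiat_1_negative_hitb_2),
  pos_pos = length(positive_both),
  neg_pos = length(negative_hiat_1_positive_hitb_2),
  neg_neg = length(negative_both)
)
counts

# The four groups are mutually exclusive, so they must sum to the subject count
stopifnot(sum(counts) == length(unique(all_ids)))
```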
Let’s summarize the `etiol` values in `dcd_long` that mention influenzae, along with how many `champs_group_desc` values each appears under. From the table below, we can see two matching values of `etiol`. There appear to be 36 unique cases classified as having Haemophilus influenzae in their causal chain. No cases are classified as Haemophilus influenzae Type B, and we are not interested in Haemophilus parainfluenzae for this analysis.
```r
d$dcd_long %>%
  filter(str_detect(etiol, "influenzae")) %>%
  arrange(champs_deid) %>%
  group_by(etiol) %>%
  summarize(
    values = n(),
    unique_cases = length(unique(champs_deid)),
    types = str_c(unique(type), collapse = ", "),
    cgd_count = length(unique(champs_group_desc))
  ) %>%
  knitr::kable(format = "markdown")
#> `summarise()` ungrouping output (override with `.groups` argument)
```
etiol | values | unique_cases | types | cgd_count |
---|---|---|---|---|
Haemophilus influenzae | 48 | 36 | immediate_cause, morbid_condition, underlying_cause | 5 |
Haemophilus parainfluenzae | 3 | 3 | underlying_cause, immediate_cause, morbid_condition | 1 |
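The gap between `values` and `unique_cases` comes from the long format: a case can list the same etiology as an immediate cause, an underlying cause, and a morbid condition. A toy example with made-up rows shows the difference between counting rows and counting distinct IDs; `which()` is used so that rows with an `NA` etiology are dropped rather than turned into all-`NA` rows:

```r
# Hypothetical toy DeCoDe rows: case_01 lists the same etiology twice.
toy_dcd <- data.frame(
  champs_deid = c("case_01", "case_01", "case_02", "case_03", "case_03"),
  type  = c("immediate_cause", "morbid_condition", "underlying_cause",
            "immediate_cause", "underlying_cause"),
  etiol = c("Haemophilus influenzae", "Haemophilus influenzae",
            "Haemophilus influenzae", "Haemophilus influenzae", NA),
  stringsAsFactors = FALSE
)

hi_rows <- toy_dcd[which(toy_dcd$etiol == "Haemophilus influenzae"), ]
nrow(hi_rows)                        # rows mentioning the etiology: 4
length(unique(hi_rows$champs_deid))  # distinct cases: 3
```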
Capitalization and labeling sometimes differ between the `tac_long` pathogens and the `dcd_long` values in `etiol`, so it is essential to note any such differences. Also, because the DeCoDe data record immediate causes, underlying causes, and morbid conditions separately, a single case can appear in several rows for the same etiology; this is why the 48 rows for Haemophilus influenzae correspond to only 36 unique cases. The champs package includes a `valid_conditions()` function that lists the valid values of both `champs_group_desc` and `etiol`, so we can repeat the search above across both variables in one step.
```r
lapply(valid_conditions(d), function(x) str_subset(x, "(i|I)nfluenzae"))
#> $champs_group_desc
#> character(0)
#>
#> $etiol
#> [1] "Haemophilus influenzae" "Haemophilus parainfluenzae"
```
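Because exact comparisons with `==` are case-sensitive, a label that differs only in capitalization will silently match nothing. The sketch below contrasts the `tac_long` spelling with the spelling used in the `hib_causal` filter below, and shows `ignore.case = TRUE` as a broader alternative to the `(i|I)nfluenzae` pattern:

```r
tac_label <- "Haemophilus influenzae type B"  # spelling in tac_long
dcd_label <- "Haemophilus influenzae Type B"  # spelling used in the hib_causal filter

tac_label == dcd_label                    # FALSE: exact match is case-sensitive
tolower(tac_label) == tolower(dcd_label)  # TRUE after normalizing case

# ignore.case = TRUE ignores case everywhere in the pattern,
# not just in the first letter as "(i|I)nfluenzae" does
grepl("influenzae", c("Haemophilus influenzae", "Haemophilus Influenzae"),
      ignore.case = TRUE)
```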
In our data, there are causal cases with ‘Haemophilus influenzae’ but none with ‘Haemophilus influenzae Type B’. We can find the 36 Haemophilus influenzae causal cases as follows.
```r
hi_causal <- d$dcd_long %>%
  filter(etiol == "Haemophilus influenzae") %>%
  pull(champs_deid) %>%
  unique()

hib_causal <- d$dcd_long %>%
  filter(etiol == "Haemophilus influenzae Type B") %>%
  pull(champs_deid) %>%
  unique()
```
Running `length(hi_causal)` returns 36, while `length(hib_causal)` returns 0.
We want to examine the 36 Haemophilus influenzae causal cases to see whether any of them are potentially type B. We will compare them with the `tac_long` result groups we calculated above to see how they align with the DeCoDe causal classification.
```r
hi_causal_phi_nhib <- positive_hiat_1_negative_hitb_2[positive_hiat_1_negative_hitb_2 %in% hi_causal]
length(hi_causal_phi_nhib)
#> [1] 35
```
So 35 of the 36 causal Haemophilus influenzae cases had a positive `HIAT_1` and a negative `HITB_2` in their TAC results. The remaining case appears to have a positive `HITB_2`. We can check:
```r
hi_causal_pboth <- positive_both[positive_both %in% hi_causal]
length(hi_causal_pboth)
#> [1] 1
```
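The checks above look at one TAC group at a time. A sketch that tallies all four groups at once, using made-up IDs and a hypothetical named list `tac_groups`, is shown below; since the groups are disjoint, the counts should add up to the number of causal cases.

```r
# Hypothetical toy inputs: TAC result groups and DeCoDe causal IDs.
tac_groups <- list(
  pos_neg = c("case_01", "case_02", "case_03"),
  pos_pos = c("case_04", "case_05"),
  neg_pos = c("case_06"),
  neg_neg = c("case_07", "case_08")
)
hi_causal_ids <- c("case_01", "case_02", "case_04")

# Count how many causal cases fall in each TAC group
causal_by_group <- vapply(tac_groups,
                          function(ids) length(intersect(ids, hi_causal_ids)),
                          integer(1))
causal_by_group

# Disjoint groups: every causal case is counted exactly once
stopifnot(sum(causal_by_group) == length(hi_causal_ids))
```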
Here is the reported DeCoDe information for the one case that is Haemophilus influenzae causal and positive for both `HIAT_1` and `HITB_2`. We stored its `champs_deid` in the object `hi_causal_pboth`.
```r
d$dcd_long %>%
  filter(champs_deid %in% hi_causal_pboth, !is.na(champs_group_desc)) %>%
  select(-champs_deid) %>%
  knitr::kable(format = "markdown")
```
champs_group_desc | type | etiol | etiol_num |
---|---|---|---|
Sepsis | immediate_cause | Haemophilus influenzae | 1 |
Sepsis | immediate_cause | NA | 2 |
Sepsis | immediate_cause | NA | 3 |
Malnutrition | underlying_cause | NA | 1 |
Malnutrition | underlying_cause | NA | 2 |
Malnutrition | underlying_cause | NA | 3 |
Lower respiratory infections | morbid_condition | Haemophilus influenzae | 1 |
Lower respiratory infections | morbid_condition | NA | 2 |
Lower respiratory infections | morbid_condition | NA | 3 |
That same case’s positive TAC results, filtered to `pathogen` values containing “influenzae”, are shown below.
```r
d$tac_long %>%
  filter(champs_deid %in% hi_causal_pboth, result == "Positive",
         str_detect(pathogen, "influenzae")) %>%
  select(-champs_deid) %>%
  knitr::kable(format = "markdown")
```
name | result | specimen_type | target | pathogen |
---|---|---|---|---|
bld_hiat_1 | Positive | Whole blood | HIAT_1 | Haemophilus influenzae |
bld_hitb_2 | Positive | Whole blood | HITB_2 | Haemophilus influenzae type B |
lung_hiat_1 | Positive | Tissue specimen from lung | HIAT_1 | Haemophilus influenzae |
lung_hitb_2 | Positive | Tissue specimen from lung | HITB_2 | Haemophilus influenzae type B |
np_op_hiat_1 | Positive | Nasopharyngeal and Oropharyngeal swab | HIAT_1 | Haemophilus influenzae |
np_op_hitb_2 | Positive | Nasopharyngeal and Oropharyngeal swab | HITB_2 | Haemophilus influenzae type B |
`HIAT_1` and `HITB_2` positive cases

We discussed one of the nine dual-positive cases in the previous section. What did the other eight cases with positive `HIAT_1` and `HITB_2` results have in their causal chain?
For convenience, to hide the long `champs_deid` values, we define a function that replaces each ID with a numeric label. We will use it in several of the following code blocks.
```r
recode_champs_deid <- function(x) {
  x %>%
    mutate(champs_case = factor(champs_deid) %>% as.numeric()) %>%
    select(-champs_deid) %>%
    select(champs_case, everything()) %>%
    arrange(champs_case)
}
```
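`factor()` sorts its levels, so `as.numeric(factor(champs_deid))` numbers cases in sorted ID order, and the numbering depends on which IDs are present. A quick illustration with made-up IDs shows that the same ID can receive a different label in a different table, so `champs_case` values should not be compared across tables:

```r
ids <- c("zeta-id", "alpha-id", "zeta-id", "mid-id")
# Levels are sorted: "alpha-id" = 1, "mid-id" = 2, "zeta-id" = 3
as.numeric(factor(ids))
#> [1] 3 1 3 2

# Drop one ID and the remaining labels shift: "zeta-id" becomes 2
subset_ids <- ids[ids != "alpha-id"]
as.numeric(factor(subset_ids))
#> [1] 2 2 1
```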
We can find the dual-positive TAC cases that have at least one reported `etiol` in the DeCoDe data (`tac_pboth_nhic`), set aside the Haemophilus influenzae causal case discussed above, and then look at the DeCoDe results for the rest.
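```r
tac_pboth_nhic <- d$dcd_long %>%
  filter(champs_deid %in% positive_both, !is.na(etiol)) %>%
  pull(champs_deid) %>%
  unique()

d$dcd_long %>%
  filter(champs_deid %in% tac_pboth_nhic, !is.na(etiol)) %>%
  # drop the H. influenzae causal case examined in the previous section;
  # %in% stays correct even if hi_causal_pboth holds more than one ID
  filter(!champs_deid %in% hi_causal_pboth) %>%
  recode_champs_deid() %>%
  knitr::kable(format = "markdown")
```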
champs_case | champs_group_desc | type | etiol | etiol_num |
---|---|---|---|---|
1 | Malaria | underlying_cause | Plasmodium falciparum | 1 |
2 | Malaria | underlying_cause | Plasmodium falciparum | 1 |
3 | Sepsis | immediate_cause | Escherichia coli | 1 |
3 | Sepsis | immediate_cause | Candida albicans | 2 |
3 | Sepsis | immediate_cause | Pseudomonas aeruginosa | 3 |
3 | HIV | underlying_cause | Human Immunodeficiency Virus (HIV) | 1 |
3 | Lower respiratory infections | morbid_condition | Candida albicans | 1 |
3 | Lower respiratory infections | morbid_condition | Pseudomonas aeruginosa | 2 |
3 | Lower respiratory infections | morbid_condition | Klebsiella pneumoniae | 3 |
4 | Diarrheal Diseases | immediate_cause | Escherichia coli | 1 |
4 | Diarrheal Diseases | immediate_cause | Shigella | 2 |
4 | Diarrheal Diseases | underlying_cause | Escherichia coli | 1 |
4 | Diarrheal Diseases | underlying_cause | Shigella | 2 |
5 | Lower respiratory infections | immediate_cause | Salmonella enterica sub-species paratyphi A | 1 |
6 | Lower respiratory infections | immediate_cause | Other Etiology/Agent | 1 |
6 | Lower respiratory infections | immediate_cause | Cytomegalovirus (CMV) | 2 |
6 | HIV | underlying_cause | Other Etiology/Agent | 1 |
7 | Malaria | underlying_cause | Plasmodium falciparum | 1 |
Notice that the DeCoDe table above contains only seven cases. We expected eight after removing the one case in `hi_causal_pboth`. Let’s look at the missing case to see why it is not in the table.
```r
d$dcd_long %>%
  filter(champs_deid %in% positive_both[!positive_both %in% tac_pboth_nhic],
         !is.na(champs_group_desc)) %>%
  recode_champs_deid() %>%
  knitr::kable(format = "markdown")
```
champs_case | champs_group_desc | type | etiol | etiol_num |
---|---|---|---|---|
1 | Undetermined | underlying_cause | NA | 1 |
1 | Undetermined | underlying_cause | NA | 2 |
1 | Undetermined | underlying_cause | NA | 3 |
This could be a potential Haemophilus influenzae Type B causal case, given the double-positive TAC results for `HIAT_1` and `HITB_2` and the DeCoDe panel’s undetermined cause of death. In total, we have two cases that we could recommend for inclusion in the Haemophilus influenzae Type B causal group.
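The two recommended cases come from two rules applied to the dual-positive TAC group. A minimal sketch with made-up IDs (the hypothetical `undetermined_ids` vector stands in for cases whose `champs_group_desc` is “Undetermined”) combines them with `intersect()` and `union()`:

```r
# Hypothetical toy inputs standing in for the objects built above.
positive_both    <- c("case_01", "case_02", "case_03")  # HIAT_1 and HITB_2 positive
hi_causal_ids    <- c("case_01", "case_09")             # H. influenzae in causal chain
undetermined_ids <- c("case_03")                        # cause of death undetermined

candidates <- union(
  intersect(positive_both, hi_causal_ids),    # causal H. influenzae, both targets positive
  intersect(positive_both, undetermined_ids)  # both targets positive, cause undetermined
)
candidates
#> [1] "case_01" "case_03"
```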
One other important note: for the undetermined case, `champs_group_desc` held the pertinent information about its causal description, while the corresponding `etiol` values were `NA`. We need to be careful when filtering out missing values in `etiol`, which is what we did when building the table of seven cases.
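The pitfall can be reproduced with a small made-up data frame: a case whose rows all have a missing `etiol` disappears entirely after filtering on `!is.na(etiol)`, and `setdiff()` recovers which IDs were lost.

```r
# Hypothetical toy DeCoDe rows: case_02 has no etiology recorded.
toy_dcd <- data.frame(
  champs_deid = c("case_01", "case_01", "case_02", "case_02"),
  champs_group_desc = c("Malaria", "Malaria", "Undetermined", "Undetermined"),
  etiol = c("Plasmodium falciparum", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Filtering on etiol silently drops case_02
kept_ids    <- unique(toy_dcd$champs_deid[!is.na(toy_dcd$etiol)])
dropped_ids <- setdiff(unique(toy_dcd$champs_deid), kept_ids)
dropped_ids
#> [1] "case_02"

# Filtering on champs_group_desc keeps both cases
unique(toy_dcd$champs_deid[!is.na(toy_dcd$champs_group_desc)])
```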
`calc_` functions

Our calculations above were not broken down by site or age, but we can use the validated `calc_` functions to check our results against their marginal totals.
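The `calc_` tables report counts by age group (rows) and site (columns) with `Sum` margins. The idea behind the check can be sketched in base R with `table()` and `addmargins()` on made-up data; this illustrates marginal totals only and is not how the champs package builds its tables.

```r
# Hypothetical toy data: one row per case with site and age group.
cases <- data.frame(
  site = c("Kenya", "Kenya", "Mali", "South Africa", "Mali"),
  age_group = c("Infant", "Child", "Child", "Infant", "Child"),
  stringsAsFactors = FALSE
)

# Cross-tabulate age group by site and add "Sum" row and column margins
tab <- addmargins(table(cases$age_group, cases$site))
tab

# The bottom-right "Sum" cell is the overall total, which should equal
# the number of cases we counted by other means
tab["Sum", "Sum"] == nrow(cases)
```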
We reported 302 `HIAT_1`-positive cases (293 positive for `HIAT_1` only plus 9 positive for both targets) and can check that value against the output of `calc_cc_detected_by_site_age()`.
```r
calc_detected_HIAT_1_wc_hi <- calc_cc_detected_by_site_age(
  d,
  condition = "Haemophilus influenzae",
  pathogen = "Haemophilus influenzae",
  specimen_types = valid_specimen_types(d),
  sites = valid_sites(d)
)
calc_detected_HIAT_1_wc_hi$denominator %>% knitr::kable(format = "markdown")
```
Age group | Bangladesh | Kenya | Mali | Mozambique | South Africa | Ethiopia | Sierra Leone | Sum |
---|---|---|---|---|---|---|---|---|
Stillbirth | 1 | 5 | 1 | 0 | 5 | 0 | 1 | 13 |
Death in the first 24 hours | 1 | 2 | 1 | 4 | 3 | 0 | 1 | 12 |
Early Neonate (1 to 6 days) | 3 | 0 | 0 | 1 | 2 | 1 | 4 | 11 |
Late Neonate (7 to 27 days) | 1 | 2 | 0 | 0 | 3 | 2 | 1 | 9 |
Infant (28 days to less than 12 months) | 2 | 41 | 15 | 11 | 31 | 3 | 2 | 105 |
Child (12 months to less than 60 Months) | 0 | 60 | 12 | 28 | 34 | 6 | 12 | 152 |
Sum | 8 | 110 | 29 | 44 | 78 | 12 | 21 | 302 |
We reported 10 `HITB_2`-positive cases (9 positive for both targets plus 1 positive for `HITB_2` only), which matches the marginal total below.
```r
calc_detected_HITB_2_wc_hib <- calc_cc_detected_by_site_age(
  d,
  condition = "Haemophilus influenzae",
  pathogen = "Haemophilus influenzae type B",
  specimen_types = valid_specimen_types(d),
  sites = valid_sites(d)
)
calc_detected_HITB_2_wc_hib$denominator %>% knitr::kable(format = "markdown")
```
Age group | Bangladesh | Kenya | Mali | Mozambique | South Africa | Ethiopia | Sierra Leone | Sum |
---|---|---|---|---|---|---|---|---|
Stillbirth | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Death in the first 24 hours | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Early Neonate (1 to 6 days) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
Late Neonate (7 to 27 days) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Infant (28 days to less than 12 months) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
Child (12 months to less than 60 Months) | 0 | 2 | 1 | 1 | 1 | 0 | 3 | 8 |
Sum | 0 | 3 | 1 | 1 | 2 | 0 | 3 | 10 |
We reported 36 cases and can see that value in the marginal totals.
```r
calc_causal_hi_only <- calc_cc_allcases_by_site_age(
  d,
  condition = "Haemophilus influenzae",
  sites = valid_sites(d)
)
calc_causal_hi_only$numerator %>% knitr::kable(format = "markdown")
```
Age group | Bangladesh | Kenya | Mali | Mozambique | South Africa | Ethiopia | Sierra Leone | Sum |
---|---|---|---|---|---|---|---|---|
Stillbirth | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
Death in the first 24 hours | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
Early Neonate (1 to 6 days) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Late Neonate (7 to 27 days) | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
Infant (28 days to less than 12 months) | 0 | 1 | 3 | 1 | 7 | 0 | 0 | 12 |
Child (12 months to less than 60 Months) | 0 | 4 | 1 | 5 | 4 | 4 | 1 | 19 |
Sum | 0 | 5 | 5 | 6 | 15 | 4 | 1 | 36 |