1 Overview

This document illustrates the application of a methodology to take into account within-person blood pressure (BP) variability in epidemiological studies to avoid bias in the estimation of the burden of hypertension (HT).

1.1 R-package

The material and R code used in this application are part of the R-package BPpack, available in Github. Use the package devtools to install the package :

require(devtools)
install_github("echatignoux/BPpack")

After installation, the package can be loaded into R.

library(BPpack)

The tidyverse, magrittr, splines and survey packages are also needed to run the following code.

library(tidyverse)
library(magrittr)
library(splines)
library(survey)

1.2 Data sources

The application of the method is illustrated here on the NHANESIII data base, available at (https://wwwn.cdc.gov/nchs/nhanes/nhanes3/default.aspx).

For the purpose of illustration, we restricted the NHANESIII data to patients having at least two visits and two measures of BP per visit. We further selected, for each patients, the first two visits and the last two BP measures per visit, so that the resulting data in dt_nhanes corresponds to an equilibrated design.

The resulting data is stored in a long format, with one line per patient/visit/measure.

dt_nhanes
## # A tibble: 121,280 x 12
## # Groups:   id, BP_typ, visit [60,640]
##      id   age sex   BP_typ visit  meas    bp tt_htn   bmi diab  samp_weight htn  
##   <dbl> <dbl> <fct> <chr>  <dbl> <int> <dbl> <lgl>  <dbl> <lgl>       <dbl> <lgl>
## 1     3    21 1     dia        1     1    60 FALSE   24.4 FALSE        1523 FALSE
## 2     3    21 1     dia        1     2    58 FALSE   24.4 FALSE        1523 FALSE
## 3     3    21 1     dia        2     1    78 FALSE   24.4 FALSE        1523 FALSE
## 4     3    21 1     dia        2     2    76 FALSE   24.4 FALSE        1523 FALSE
## 5     3    21 1     sys        1     1   116 FALSE   24.4 FALSE        1523 FALSE
## # ... with 121,275 more rows

1.3 Main function

The main function of the BPpack package is the correct_htn function, which allows for estimation of the prevalence of hypertension.

Its arguments are (see ?correct_htn):

  • form A formula giving covariates by which HTN prevalence is calculated. If htn is placed in the left hand side of formula, then variation of prevalence according to the covariates given in the right hand side of are estimated with a glm (survey::svyglm if surv_des is not NULL) model with quasibinomial distribution. Else, prevalence is tabulated according to the covariates given in the right hand side (simple tabulation or survey::svyby if surv_des is not NUL).
  • subpop A boolean covariate that defines a subpopulation over which to filter the calculation of prevalence. subpop must be given in a formula form, e.g. ~subpop.
  • n_samp The number of posterior sample of the correction factor to be used in the estimation. Default (NULL) resumes to the maximum number of available posterior samples.
  • data_long Data frame giving the BP measurements of the population under study. Data must be given in a long format, e.g. one raw per BP measure (see dt_nhanes format). The data frame must have the following columns: “id” Patient identifier; “age” Age of the patient in years; “sex” Sex of the patient; “tt_htn” Boolean with value TRUE if the patient is under anti-hypertensive treatment; “visit” : Identifier of the visit; “BP_typ” : type of BP prelevement, “dia” for diastolic, “sys” for systolic; “bp” : Value of blood pressure measurement
  • surv_des If applicable, the survey design of the study, specifyed with svydesign from survey package.
  • correct Boolean set to TRUE (the default) to correct prevalence.
  • tresh A data frame giving the BP thresholds that defines hypertension. Default to 140 for systolic BP, 90 for diastolic.
  • glm Boolean set to TRUE (the default) to evaluate the prevalence of HTN according to the formula specified in form with a glm model with binomial distribution (svyglm if a survey design is specified). Otherwise tabulation of the prevalence of HTN according to the covariates in formula is done.

2 Estimation of the burden of hypertension in NHANESIII data

2.1 Uncorrected estimates

Hypertension is defined within one individual if the mean of its BP measurements (for diastolic or systolic BP) exceeds a threshold, or if the patient takes an treatment for HT.

The crude estimation of the prevalence of hypertension consists in counting hypertensive patients from the data. This can be done using the correct_htn function:

correct_htn(form=htn ~ 1,
            data_long = dt_nhanes,
            correct = FALSE)
## # A tibble: 1 x 4
##     htn      se   low    up
##   <dbl>   <dbl> <dbl> <dbl>
## 1 0.263 0.00357 0.256 0.270

The sampling design of the NHANESIII survey may be taken into account using the surv_des argument of the function:

## First, define the design
d.s <- svydesign(ids=~1,
                 data=dt_nhanes,
                 weights=~samp_weight)

## Then, use it 
correct_htn(form=htn ~ 1,
            data_long = dt_nhanes,
            surv_des = d.s,
            correct = FALSE)
## # A tibble: 1 x 4
##     htn      se   low    up
##   <dbl>   <dbl> <dbl> <dbl>
## 1 0.195 0.00446 0.187 0.204

It worth saying that in example above, the specification of the survey design was loose, as applied to data given in long format, whereas it should have been defined from data in a wide format instead. This doesn’t matter actually, as the correct_htn function only keeps the specification of the survey design, and applies it to data with only one observation per individual.

This point is confirmed bellow:

## First, define the design
dt_nhanes_wide<-
  dt_nhanes%>%
  ungroup()%>%
  mutate(vm=paste("bp",visit,meas,sep="_"))%>%
  select(-visit,-meas)%>%
  spread(vm,bp)
d.s.w <- svydesign(ids=~1,
                 data=dt_nhanes_wide,
                 weights=~samp_weight)

## Then, use it 
correct_htn(form=htn ~ 1,
            data_long = dt_nhanes,
            surv_des = d.s.w,
            correct = FALSE)
## # A tibble: 1 x 4
##     htn      se   low    up
##   <dbl>   <dbl> <dbl> <dbl>
## 1 0.195 0.00446 0.187 0.204

The correct_htn function also allows for estimations of HT prevalence according to covariates, for example, according to body mass index (variable bmi).

Note that this estimation can be done only in the subpopulation with non missing value of BMI, so that we should restrict or estimates to this subpop. This is done using the subpop argument of the correct_htn.

## Define boolen for the subpop with not missing bmi
dt_nhanes%<>%mutate(has_bmi=!is.na(bmi))
## Estimate the variation of HT according to bmi with a natural spline of 3 df
bmi_htn<-correct_htn(form=htn ~ ns(bmi,df=3),
            data_long = dt_nhanes,
            subpop=~has_bmi,
            surv_des = d.s,
            correct = FALSE)

## Plot the results
bmi_htn%>%
  mutate_at(vars(htn,low,up),~.x*100)%>%
  ggplot(data=.)+
  aes(bmi,htn,ymin=low,ymax=up)+
  geom_line()+
  geom_ribbon(fill=NA,colour="black",linetype=5)+
  xlab("Body mass index")+ylab("HT prevalence (in %)")

2.2 Corrected estimates

Taking into account the within-person blood pressure (BP) variability in the estimation of hypertension is simply done passing the correct argument from the correct_htn function to TRUE. The number of samples from the posterior distribution used for the estimation is controlled by the n_samp argument. As the estimation time increases with the number of samples, we’ll keep it low (n_samp=100) in this illustrative application here.

To illustrate the consequences of neglecting within-person blood pressure variability in the estimates, we estimated HT prevalence using the two visits from NHANESIII, or only the first or the second.

dt_nhanes_v1<-dt_nhanes%>%filter(visit==1)
dt_nhanes_v2<-dt_nhanes%>%filter(visit==2)

htn_cor<-correct_htn(form=htn ~ 1,
            data_long = dt_nhanes,
            surv_des = d.s,
            n_samp = 100,
            correct = TRUE)

htn_cor_v1<-correct_htn(form=htn ~ 1,
            data_long = dt_nhanes_v1,
            surv_des = d.s,
            n_samp = 100,
            correct = TRUE)

htn_cor_v2<-correct_htn(form=htn ~ 1,
            data_long = dt_nhanes_v2,
            surv_des = d.s,
            n_samp = 100,
            correct = TRUE)

htn_uncor<-correct_htn(form=htn ~ 1,
            data_long = dt_nhanes,
            surv_des = d.s,
            n_samp = 1,
            correct = FALSE)

htn_uncor_v1<-correct_htn(form=htn ~ 1,
            data_long = dt_nhanes_v1,
            surv_des = d.s,
            n_samp = 1,
            correct = FALSE)

htn_uncor_v2<-correct_htn(form=htn ~ 1,
            data_long = dt_nhanes_v2,
            surv_des = d.s,
            n_samp = 1,
            correct = FALSE)

The results are reported in the table below:

Table 2.1: Corrected and un-corrected estimates of HT according to the visits used
Visit Un-corrected HT Corrected HT
All 19.53 [18.67 - 20.42] 17.80 [17.00 - 18.64]
Visit 1 22.84 [21.89 - 23.82] 19.34 [18.49 - 20.22]
Visit 2 19.94 [19.08 - 20.84] 16.91 [16.12 - 17.72]

The table shows the importance of accounting for BP variability when estimating HT prevalence. If no correction is performed, HTN estimated with visit 1 or visit 2 data only are larger than HTN estimated with the two visits. This latter however also exhibits some bias, corrected HTN (i.e. HT prevalence estimated with BP collected over an infinite number visits and measures in the population) being 9% lower than uncorrected HTN. As expected, the overestimation of HTN is higher (~15%) when only one visit is used for the estimation.

Some differences however still remains between visit 1 and visit 2 estimates, probably due to lower BP reactivity to physicians at the second visit (lower white coat effect) and to differences in conditions under which BP was measured in the two visits. Without additional information on the reasons for those differences however, there is no indication about which visit to favor and no obvious additional correction to apply.

Similarly, correct_htn can be used to estimating the level of corrected HT prevalence at different BMI levels with the same syntax as in the previous section, but specifying correct = TRUE in the arguments.

The corrected curve (in blue) is superimposed with the previous uncorrected curve (in red).

bmi_htn_correct<-correct_htn(form=htn ~ ns(bmi,df=3),
            data_long = dt_nhanes,
            subpop=~has_bmi,
            surv_des = d.s,
            n_samp = 100,
            correct = TRUE)