When fitting both the disjoint and k-order neighbour models with the bigDM package (Orozco-Acosta et al., 2021, 2023) parallel or distributed computation strategies can be performed to speed up computations by using the future package (Bengtsson, 2021).
Different computation strategies can be specified through the plan
argument of the CAR_INLA()
, STCAR_INLA()
and MCAR_INLA()
functions (available for bigDM version 0.5 and higher). For further details about these strategies see the documentation of the future package and its corresponding vignettes.
If plan="sequential"
(default), the models are fitted sequentially and in the current R session (local machine).
If plan="cluster"
, multiple models can be fitted in parallel on external R sessions (local machine) or distributed in remote compute nodes. When using this option, the identifications of the local or remote workers where the models are going to be processed must be specified through the workers
argument. The following options are available:
## single local machine
workers <- "localhost"
# or
workers <- Sys.info()[["nodename"]]
## multiple local machines
workers <- rep("localhost", 2) # two local machines
workers <- future::availableWorkers() # total number of available workers
## single remote machine
workers <- "172.0.0.1" # using IP address
workers <- "machine1.remote.es" # using hostname
## multiple remote machines
workers <- c("172.0.0.1","172.0.0.2")
# or
workers <- c("machine1.remote.es","machine2.remote.es")
workers <- c("localhost","172.0.0.1","172.0.0.2")
## two models per remote machine
workers <- rep(c("172.0.0.1","172.0.0.2"), each=2)
## two models per local and remote machines
workers <- rep(c("localhost","172.0.0.1","172.0.0.2"), each=2)
Important note: To work with remote workers, public and private keys between the different machines must be previously configured. In addition, the bigDM package needs to be installed in all the local/remote machines. More details can be found at this vignette and in the documentation of the makeClusterPSOCK
function.
In what follows, we show how to fit the Disjoint model for the Spanish colorectal cancer mortality data using different computation strategies. A full description of the data and usage of the CAR_INLA()
function is given in the vignette bigDM: fitting spatial models.
library(INLA)
#> Warning: package 'INLA' was built under R version 4.2.2
library(bigDM)
data(Carto_SpainMUN)
head(Carto_SpainMUN)
#> Simple feature collection with 6 features and 10 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 485318 ymin: 4727428 xmax: 543317 ymax: 4779153
#> Projected CRS: ETRS89 / UTM zone 30N
#> ID name lat long area
#> 1 01001 Alegria-Dulantzi 4742974 4742974 19913794
#> 2 01002 Amurrio 4765886 4765886 96145595
#> 3 01003 Aramaio 4766225 4766225 73338806
#> 4 01004 Artziniega 4775453 4775453 27506468
#> 5 01006 Arminon 4729884 4729884 10559721
#> 6 01008 Arrazua-Ubarrundia (San Martin de Ania) 4752714 4752714 57502811
#> perimeter obs exp SMR region geometry
#> 1 34372.11 2 3.0237149 0.6614380 Pais Vasco MULTIPOLYGON (((538259 4737...
#> 2 63352.32 28 20.8456682 1.3432047 Pais Vasco MULTIPOLYGON (((503520 4760...
#> 3 41430.46 6 3.7527301 1.5988360 Pais Vasco MULTIPOLYGON (((533286 4759...
#> 4 22605.22 3 3.2093191 0.9347777 Pais Vasco MULTIPOLYGON (((491260 4776...
#> 5 17847.35 0 0.4817391 0.0000000 Pais Vasco MULTIPOLYGON (((509851 4727...
#> 6 64968.81 2 1.9643891 1.0181282 Pais Vasco MULTIPOLYGON (((534678 4746...
For our example data in Carto_SpainMUN
the \(D=15\) Autonomous Regions of Spain are used as a partition of the \(n=7907\) municipalities (region
variable of the sf
object).
1. Run the models sequentially in the current R session (default option)
Model1 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", ID.group="region", O="obs", E="exp",
model="partition", k=0, seed=1234, strategy="simplified.laplace",
plan="sequential", workers=NULL)
#> STEP 1: Pre-processing data
#> STEP 2: Fitting partition (k=0) model with INLA
#> + Model 1 of 15
#> + Model 2 of 15
#> + Model 3 of 15
#> + Model 4 of 15
#> + Model 5 of 15
#> + Model 6 of 15
#> + Model 7 of 15
#> + Model 8 of 15
#> + Model 9 of 15
#> + Model 10 of 15
#> + Model 11 of 15
#> + Model 12 of 15
#> + Model 13 of 15
#> + Model 14 of 15
#> + Model 15 of 15
#> STEP 3: Merging the results
summary(Model1)
#>
#> Call:
#> c("inla.core(formula = formula, family = family, contrasts = contrasts,
#> ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
#> scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
#> ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
#> verbose, ", " lincomb = lincomb, selection = selection, control.compute
#> = control.compute, ", " control.predictor = control.predictor,
#> control.family = control.family, ", " control.inla = control.inla,
#> control.fixed = control.fixed, ", " control.mode = control.mode,
#> control.expert = control.expert, ", " control.hazard = control.hazard,
#> control.lincomb = control.lincomb, ", " control.update =
#> control.update, control.lp.scale = control.lp.scale, ", "
#> control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
#> ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
#> num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
#> working.directory = working.directory, ", " silent = silent, inla.mode
#> = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
#> .parent.frame)")
#> Time used:
#> Pre = 8.73, Running = 80.1, Post = 5.21, Merging = 28.2
#> Fixed effects:
#> mean sd 0.025quant 0.5quant 0.975quant mode
#> (Intercept) -0.016 0.005 -0.026 -0.016 -0.007 -0.017
#>
#> Random effects:
#> Name Model
#> ID.area Generic1 model
#>
#> Deviance Information Criterion (DIC) ...............: 27451.37
#> Deviance Information Criterion (DIC, saturated) ....: 8387.93
#> Effective number of parameters .....................: 402.56
#>
#> Watanabe-Akaike information criterion (WAIC) ...: 27433.36
#> Effective number of parameters .................: 341.45
#>
#> is computed
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
2. Run the models in parallel on external R sessions
workers <- rep("localhost", 4)
num.threads <- "4:1"
Model2 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", ID.group="region", O="obs", E="exp",
model="partition", k=0, seed=1234, strategy="simplified.laplace",
plan="cluster", workers=workers, num.threads=num.threads)
#> STEP 1: Pre-processing data
#> STEP 2: Fitting partition (k=0) model with INLA
#> + Model 1 of 15
#> + Model 2 of 15
#> + Model 3 of 15
#> + Model 4 of 15
#> + Model 5 of 15
#> + Model 6 of 15
#> + Model 7 of 15
#> + Model 8 of 15
#> + Model 9 of 15
#> + Model 10 of 15
#> + Model 11 of 15
#> + Model 12 of 15
#> + Model 13 of 15
#> + Model 14 of 15
#> + Model 15 of 15
#> STEP 3: Merging the results
summary(Model2)
#>
#> Call:
#> c("inla.core(formula = formula, family = family, contrasts = contrasts,
#> ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
#> scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
#> ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
#> verbose, ", " lincomb = lincomb, selection = selection, control.compute
#> = control.compute, ", " control.predictor = control.predictor,
#> control.family = control.family, ", " control.inla = control.inla,
#> control.fixed = control.fixed, ", " control.mode = control.mode,
#> control.expert = control.expert, ", " control.hazard = control.hazard,
#> control.lincomb = control.lincomb, ", " control.update =
#> control.update, control.lp.scale = control.lp.scale, ", "
#> control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
#> ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
#> num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
#> working.directory = working.directory, ", " silent = silent, inla.mode
#> = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
#> .parent.frame)")
#> Time used:
#> Running = 85.4, Merging = 27.2, Total = 113, NA = NA
#> Fixed effects:
#> mean sd 0.025quant 0.5quant 0.975quant mode
#> (Intercept) -0.016 0.005 -0.026 -0.016 -0.006 -0.016
#>
#> Random effects:
#> Name Model
#> ID.area Generic1 model
#>
#> Deviance Information Criterion (DIC) ...............: 27449.21
#> Deviance Information Criterion (DIC, saturated) ....: 8385.77
#> Effective number of parameters .....................: 400.28
#>
#> Watanabe-Akaike information criterion (WAIC) ...: 27432.40
#> Effective number of parameters .................: 340.33
#>
#> is computed
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
The decision of how many models to process in parallel depends on the available computational resources. However, the more the models to be fitted in parallel, the more the R sessions to be opened in the backend, which would increase the computational time for data transfer.
3. Run the models distributed in remote machines
Before evaluating the following code, the user must set appropriate identifications for the remote machines.
## Example of distributed computation with four models per remote machine
workers <- rep(c("172.0.0.1","172.0.0.2","172.0.0.3"), each=4)
num.threads <- "8:1"
Model3 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", ID.group="region", O="obs", E="exp",
model="partition", k=0, seed=1234, strategy="simplified.laplace",
plan="cluster", workers=workers, num.threads=num.threads)
This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. It has also been partially funded by the Public University of Navarra (project PJUPNA2001).