When fitting both the disjoint and k-order neighbour models with the **bigDM** package (Orozco-Acosta et al., 2021, 2022) parallel or distributed computation strategies can be performed to speed up computations by using the **future** package (Bengtsson, 2021).

Different computation strategies can be specified through the `plan`

argument of the `CAR_INLA()`

and `STCAR_INLA()`

functions (available for **bigDM** version 0.4 and higher). For further details about these strategies see the documentation of the **future** package and its corresponding vignettes.

If

`plan="sequential"`

(default), the models are fitted sequentially and in the current R session (local machine).If

`plan="cluster"`

, multiple models can be fitted in parallel on external R sessions (local machine) or distributed in remote compute nodes. When using this option, the identifications of the local or remote workers where the models are going to be processed must be specified through the`workers`

argument. The following options are available:##### 1) Using local machines

`## single local machine workers <- "localhost" # or workers <- Sys.info()[["nodename"]] ## multiple local machines workers <- rep("localhost", 2) # two local machines workers <- future::availableWorkers() # total number of available workers`

##### 2) Using remote compute nodes

`## single remote machine workers <- "172.0.0.1" # using IP address workers <- "machine1.remote.es" # using hostname ## multiple remote machines workers <- c("172.0.0.1","172.0.0.2") # or workers <- c("machine1.remote.es","machine2.remote.es")`

##### 3) Using both local and remote machines

`workers <- c("localhost","172.0.0.1","172.0.0.2")`

##### 4) It is also possible to fit more than one model simultaneously in each remote machine by setting

`## two models per remote machine workers <- rep(c("172.0.0.1","172.0.0.2"), each=2) ## two models per local and remote machines workers <- rep(c("localhost","172.0.0.1","172.0.0.2"), each=2)`

**Important note**: To work with remote workers, public and private keys between the different machines must be previously configured. In addition, the*bigDM*package needs to be installed in all the local/remote machines. More details can be found at this vignette and in the documentation of the`makeClusterPSOCK`

function.

In what follows, we show how to fit the *Disjoint model* for the Spanish colorectal cancer mortality data using different computation strategies. A full description of the data and usage of the `CAR_INLA()`

function is given in the vignette **bigDM: fitting spatial models**.

```
library(INLA)
library(bigDM)
data(Carto_SpainMUN)
head(Carto_SpainMUN)
#> Simple feature collection with 6 features and 10 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 485318 ymin: 4727428 xmax: 543317 ymax: 4779153
#> Projected CRS: ETRS89 / UTM zone 30N
#> ID name lat long area
#> 1 01001 Alegria-Dulantzi 4742974 4742974 19913794
#> 2 01002 Amurrio 4765886 4765886 96145595
#> 3 01003 Aramaio 4766225 4766225 73338806
#> 4 01004 Artziniega 4775453 4775453 27506468
#> 5 01006 Arminon 4729884 4729884 10559721
#> 6 01008 Arrazua-Ubarrundia (San Martin de Ania) 4752714 4752714 57502811
#> perimeter obs exp SMR region geometry
#> 1 34372.11 2 3.0237149 0.6614380 Pais Vasco MULTIPOLYGON (((538259 4737...
#> 2 63352.32 28 20.8456682 1.3432047 Pais Vasco MULTIPOLYGON (((503520 4760...
#> 3 41430.46 6 3.7527301 1.5988360 Pais Vasco MULTIPOLYGON (((533286 4759...
#> 4 22605.22 3 3.2093191 0.9347777 Pais Vasco MULTIPOLYGON (((491260 4776...
#> 5 17847.35 0 0.4817391 0.0000000 Pais Vasco MULTIPOLYGON (((509851 4727...
#> 6 64968.81 2 1.9643891 1.0181282 Pais Vasco MULTIPOLYGON (((534678 4746...
```

For our example data in `Carto_SpainMUN`

the \(D=15\) Autonomous Regions of Spain are used as a partition of the \(n=7907\) municipalites (`region`

variable of the `sf`

object).

**1. Run the models sequentially in the current R session (default option)**

```
Model1 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", ID.group="region", O="obs", E="exp",
model="partition", k=0, seed=1234, strategy="simplified.laplace",
plan="sequential", workers=NULL)
#> STEP 1: Pre-processing data
#> STEP 2: Fitting partition (k=0) model with INLA
#> + Model 1 of 15
#> + Model 2 of 15
#> + Model 3 of 15
#> + Model 4 of 15
#> + Model 5 of 15
#> + Model 6 of 15
#> + Model 7 of 15
#> + Model 8 of 15
#> + Model 9 of 15
#> + Model 10 of 15
#> + Model 11 of 15
#> + Model 12 of 15
#> + Model 13 of 15
#> + Model 14 of 15
#> + Model 15 of 15
#> STEP 3: Merging the results
summary(Model1)
#>
#> Call:
#> c("inla.core(formula = formula, family = family, contrasts = contrasts,
#> ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
#> scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
#> ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
#> verbose, ", " lincomb = lincomb, selection = selection, control.compute
#> = control.compute, ", " control.predictor = control.predictor,
#> control.family = control.family, ", " control.inla = control.inla,
#> control.fixed = control.fixed, ", " control.mode = control.mode,
#> control.expert = control.expert, ", " control.hazard = control.hazard,
#> control.lincomb = control.lincomb, ", " control.update =
#> control.update, control.lp.scale = control.lp.scale, ", "
#> control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
#> ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
#> num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
#> working.directory = working.directory, ", " silent = silent, inla.mode
#> = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
#> .parent.frame)")
#> Time used:
#> Pre = 8.34, Running = 76.2, Post = 4.37, Merging = 18.7
#> Random effects:
#> Name Model
#> ID.area Generic1 model
#>
#> Deviance Information Criterion (DIC) ...............: 27449.04
#> Deviance Information Criterion (DIC, saturated) ....: 8385.59
#> Effective number of parameters .....................: 400.88
#>
#> Watanabe-Akaike information criterion (WAIC) ...: 27433.89
#> Effective number of parameters .................: 341.94
#>
#> is computed
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

**2. Run the models in parallel on external R sessions**

```
workers <- rep("localhost", 4)
Model2 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", ID.group="region", O="obs", E="exp",
model="partition", k=0, seed=1234, strategy="simplified.laplace",
plan="cluster", workers=workers)
#> STEP 1: Pre-processing data
#> STEP 2: Fitting partition (k=0) model with INLA
#> + Model 1 of 15
#> + Model 2 of 15
#> + Model 3 of 15
#> + Model 4 of 15
#> + Model 5 of 15
#> + Model 6 of 15
#> + Model 7 of 15
#> + Model 8 of 15
#> + Model 9 of 15
#> + Model 10 of 15
#> + Model 11 of 15
#> + Model 12 of 15
#> + Model 13 of 15
#> + Model 14 of 15
#> + Model 15 of 15
#> STEP 3: Merging the results
summary(Model2)
#>
#> Call:
#> c("inla.core(formula = formula, family = family, contrasts = contrasts,
#> ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
#> scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
#> ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
#> verbose, ", " lincomb = lincomb, selection = selection, control.compute
#> = control.compute, ", " control.predictor = control.predictor,
#> control.family = control.family, ", " control.inla = control.inla,
#> control.fixed = control.fixed, ", " control.mode = control.mode,
#> control.expert = control.expert, ", " control.hazard = control.hazard,
#> control.lincomb = control.lincomb, ", " control.update =
#> control.update, control.lp.scale = control.lp.scale, ", "
#> control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
#> ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
#> num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
#> working.directory = working.directory, ", " silent = silent, inla.mode
#> = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
#> .parent.frame)")
#> Time used:
#> Running = 82.9, Merging = 18.2, Total = 101, NA = NA
#> Random effects:
#> Name Model
#> ID.area Generic1 model
#>
#> Deviance Information Criterion (DIC) ...............: 27451.33
#> Deviance Information Criterion (DIC, saturated) ....: 8387.88
#> Effective number of parameters .....................: 402.53
#>
#> Watanabe-Akaike information criterion (WAIC) ...: 27436.31
#> Effective number of parameters .................: 343.59
#>
#> is computed
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

The decision of how many models to process in parallel depends on the available computational resources. However, the more the models to be fitted in parallel, the more the R sessions to be opened in the backend, which would increase the computational time for data transfer.

**3. Run the models distributed in remote machines**

Before evaluating the following code, the user must set appropriate identifications for the remote machines.

```
## Example of distributed computation with four models per remote machine
workers <- rep(c("172.0.0.1","172.0.0.2","172.0.0.3"), each=4)
Model3 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", ID.group="region", O="obs", E="exp",
model="partition", k=0, seed=1234, strategy="simplified.laplace",
plan="cluster", workers=workers)
#> STEP 1: Pre-processing data
#> STEP 2: Fitting partition (k=0) model with INLA
#> starting worker pid=1444 on localhost:11204 at 13:03:13.910
#> starting worker pid=1465 on localhost:11204 at 13:03:14.123
#> starting worker pid=1489 on localhost:11204 at 13:03:14.330
#> starting worker pid=1509 on localhost:11204 at 13:03:14.544
#> starting worker pid=14966 on localhost:11208 at 13:03:15.548
#> starting worker pid=14998 on localhost:11209 at 13:03:16.545
#> starting worker pid=15028 on localhost:11210 at 13:03:17.514
#> starting worker pid=15058 on localhost:11211 at 13:03:18.544
#> starting worker pid=12722 on localhost:11212 at 13:03:19.010
#> starting worker pid=12750 on localhost:11213 at 13:03:19.430
#> starting worker pid=12778 on localhost:11214 at 13:03:19.764
#> starting worker pid=12806 on localhost:11215 at 13:03:20.088
#> + Model 1 of 15
#> + Model 2 of 15
#> + Model 3 of 15
#> + Model 4 of 15
#> + Model 5 of 15
#> + Model 6 of 15
#> + Model 7 of 15
#> + Model 8 of 15
#> + Model 9 of 15
#> + Model 10 of 15
#> + Model 11 of 15
#> + Model 12 of 15
#> + Model 13 of 15
#> + Model 14 of 15
#> + Model 15 of 15
#> STEP 3: Merging the results
summary(Model3)
#> Call:
#> c("inla(formula = formula, family = \"poisson\", data = data.INLA, ", " E = E,
#> control.compute = list(dic = TRUE, cpo = TRUE, waic = TRUE, ", " config = TRUE),
#> control.predictor = list(compute = TRUE, ", " cdf = c(log(1))), control.inla =
#> list(strategy = strategy))" )
#> Time used:
#> Running = 111, Merging = 16.7, Total = 128, NA = NA
#> Random effects:
#> Name Model
#> ID.area Generic1 model
#>
#> Deviance Information Criterion (DIC) ...............: 27450.60
#> Deviance Information Criterion (DIC, saturated) ....: 8387.16
#> Effective number of parameters .....................: 403.90
#>
#> Watanabe-Akaike information criterion (WAIC) ...: 27433.63
#> Effective number of parameters .................: 343.31
#>
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

This work has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness (project MTM2017-82553-R, AEI/FEDER, UE), and partially funded by la Caixa Foundation (ID 1000010434), Caja Navarra Foundation and UNED Pamplona, under agreement LCF/PR/PR15/51100007 (project REF P/13/20).

Bengtsson, H. (2021). A unifying framework for parallel and distributed processing in R using futures. *The R Journal (Accepted Article)*. https://journal.r-project.org/archive/2021/RJ-2021-048/index.html

Orozco-Acosta, E., Adin, A., & Ugarte, M. D. (2021). Scalable Bayesian modeling for smoothing disease mapping risks in large spatial data sets using INLA. *Spatial Statistics*, *41*, 100496. https://doi.org/10.1016/j.spasta.2021.100496

Orozco-Acosta, E., Adin, A., & Ugarte, M. D. (2022). *Parallel and distributed Bayesian modelling for analysing high-dimensional spatio-temporal count data*. https://arxiv.org/abs/2201.08323