Fit MOFAFlex Models from R via basilisk and reticulate • MofaflexR

Fit MOFAFlex models from R via basilisk + reticulate.

MofaflexR provides an R-native bridge between Bioconductor assay containers and the Python MOFAFlex package for multi-omics factor analysis. The package converts assay matrices from SummarizedExperiment / SingleCellExperiment objects to NumPy arrays or SciPy sparse matrices using matrix-type-aware dispatch, and zero-copy is used where possible depending on the matrix representation:

Sparse matrices stored as dgCMatrix / dgRMatrix are bridged to the matching SciPy format by exposing the backing slot arrays via the C buffer protocol (flags.owndata = False).
Dense matrices (dgeMatrix, base matrix) are converted to Fortran-order NumPy arrays via a buffer-protocol path that avoids copying where possible.
COO sparse arrays (SparseArray::COO_SparseMatrix) are supported; value arrays are zero-copy, coordinate arrays require a 1→0 base conversion.

The Python environment (numpy, scipy, anndata, mofaflex) is managed automatically by basilisk; users do not need to install Python manually.

MOFAFlex was developed by:

Qoku A, Rohbeck M, Walter FC, Kats I, Stegle O, and Buettner F.

Citation

Qoku A, Rohbeck M, Walter FC, Kats I, Stegle O, Buettner F.
MOFA-FLEX: A Factor Model Framework for Integrating Omics Data with Prior Knowledge.
bioRxiv (2025).
https://doi.org/10.1101/2025.11.03.686250

Documentation

Official MOFAFlex documentation:
https://mofaflex.readthedocs.io

License

MOFAFlex is licensed under the BSD 3-Clause License.

The docs can be found under

Installation

# Bioconductor + CRAN dependencies
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("SingleCellExperiment", "basilisk", "basilisk.utils",
                        "anndataR"))

# Install MofaflexR from source (once on CRAN/Bioconductor, use that)
# remotes::install_github("MaximilianNuber/MofaflexR")

Quick start

library(MofaflexR)
library(SingleCellExperiment)
library(Matrix)

# ----- 1. Build a toy SCE -----
set.seed(42)
counts <- rsparsematrix(200, 50, density = 0.1, repr = "C")
counts <- abs(round(counts))   # non-negative integer counts
sce <- SingleCellExperiment(assays = list(counts = counts))
colnames(sce) <- paste0("cell", seq_len(ncol(sce)))
rownames(sce) <- paste0("gene", seq_len(nrow(sce)))

# ----- 2. Convert to ReticulateAnnData (matrix-aware bridge) -----------
# The dgCMatrix counts assay is identified as sparse and is bridged to
# scipy.sparse.csc_matrix via zero-copy slot views (owndata = FALSE).
# A dense assay would be converted to a NumPy ndarray instead.
rada <- sce_to_reticulate_anndata(sce)
print(rada)
#> ReticulateAnnData object with n_obs × n_vars = 50 × 200

# ----- 3. Access the Python AnnData directly -----
py_adata <- rada$py_anndata()

sp <- reticulate::import("scipy.sparse", convert = FALSE)
cat("Shape:   ", reticulate::py_to_r(py_adata$shape)[[1]], "x",
    reticulate::py_to_r(py_adata$shape)[[2]], "\n")
cat("Sparse:  ", reticulate::py_to_r(sp$issparse(py_adata$X)), "\n")
cat("owndata: ", reticulate::py_to_r(py_adata$X$data$flags$owndata), "\n")
# Shape:    50 x 200
# Sparse:   TRUE
# owndata:  FALSE   ← no copy of the underlying data arrays

Low-level assay conversion

The package dispatches by matrix class. For a dgCMatrix assay, the low-level helper sce_assay_to_scipy_csc() returns a csc_matrix in features × cells orientation (matching the R assay):

# Returns a scipy.sparse.csc_matrix (shape = genes × cells)
# Other sparse classes (dgRMatrix, COO_SparseMatrix) are handled
# by analogous internal converters.
csc <- sce_assay_to_scipy_csc(sce, assay = "counts")

shape_r <- reticulate::py_to_r(csc$shape)
cat("Shape:", shape_r[[1]], "x", shape_r[[2]], "\n")  # 200 x 50
cat("nnz:  ", reticulate::py_to_r(csc$nnz), "\n")

For general dispatch across all supported matrix types, the package internally uses se_assay_to_python_matrix(). This function accepts any SummarizedExperiment subclass, extracts the named assay, and routes it to the appropriate NumPy / SciPy converter.

Using the basilisk-managed environment

result <- with_mofaflex_env({
  mf <- reticulate::import("mofaflex", convert = FALSE)
  # ... fit model, return R-friendly result
})

Fit a MOFA-FLEX model

fit_mofaflex() converts an SCE (or an existing ReticulateAnnData / ReticulateMuData) to the required Python data structure, builds and trains a MOFA-FLEX model, and optionally saves the result in MOFA2-compatible HDF5 format.

library(MofaflexR)
library(muscData)   # BiocManager::install("muscData")
library(scran);  library(scuttle)

# Build SCE and normalise (see vignette for full QC)
sce <- Kang18_8vs8()
sce <- sce[, sce$multiplets == "singlet" & !is.na(sce$cell)]
sce <- scuttle::logNormCounts(sce)
hvgs <- scran::getTopHVGs(scran::modelGeneVar(sce), n = 3000)
sce_hvg <- sce[hvgs, ]

# Zero-copy bridge to Python AnnData
adata <- sce_to_reticulate_anndata(sce_hvg, assay = "logcounts")

# Train MOFA-FLEX with two groups (ctrl / stim)
# Passing a single AnnData with group_by automatically wraps it in MuData.
model_path <- tempfile(fileext = ".hdf5")
model <- fit_mofaflex(
  data             = adata,
  data_options     = list(group_by           = "stim",
                          scale_per_group    = TRUE,
                          subset_var         = NULL,
                          plot_data_overview = FALSE),
  model_options    = list(n_factors    = 10L,
                          weight_prior = "Horseshoe"),
  training_options = list(max_epochs             = 5000L,
                          early_stopper_patience = 50L,
                          seed                   = 42L,
                          device                 = "cpu",
                          save_path              = model_path),
  mofa_compat      = "full"   # MOFA2-readable HDF5
)

# Load into MOFA2 for downstream analysis
mofa <- load_mofaflex_model(model_path)
MOFA2::plot_variance_explained(mofa)

Tip. If group_by is specified and data is a single AnnData, fit_mofaflex() automatically wraps it in a mudata.MuData and promotes the grouping column to the top-level obs. The wrapped modality is named rna automatically.

ReticulateMuData: wrapping Python MuData objects

mudata is the Python multi-modal data container used by the scverse ecosystem. ReticulateMuData is an R6 class that wraps a Python mudata.MuData object, mirroring the design of anndataR::ReticulateAnnData.

Critical invariant

Any AnnData extracted from a ReticulateMuData is returned as an anndataR::ReticulateAnnData.

This means scanpy/muon functions receiving rna$py_anndata() work without any extra conversion step.

Quick example

library(MofaflexR)
library(reticulate)

# Import Python modules (convert = FALSE keeps objects Python-native)
mu  <- py_mudata()                             # mudata module
ad  <- import("anndata",       convert = FALSE)
sp  <- import("scipy.sparse",  convert = FALSE)

# Build two AnnData objects
X1  <- sp$csr_matrix(r_to_py(matrix(1:12, 3L, 4L)))
X2  <- sp$csr_matrix(r_to_py(matrix(1:6,  3L, 2L)))
a1  <- ad$AnnData(X = X1)
a2  <- ad$AnnData(X = X2)
a1$obs_names <- r_to_py(c("obs1", "obs2", "obs3"))
a2$obs_names <- r_to_py(c("obs1", "obs2", "obs3"))

# Combine into MuData
py_md <- mu$MuData(py_dict(list("rna" = a1, "prot" = a2), convert = FALSE))

# Wrap in ReticulateMuData
mdata <- ReticulateMuData$new(py_md)
print(mdata)
# ReticulateMuData
#   n_obs     : 3
#   modalities: rna, prot
#     rna:           3 obs x 4 vars
#     prot:          3 obs x 2 vars

# Access a modality – returns ReticulateAnnData automatically
rna <- mdata[["rna"]]
inherits(rna, "ReticulateAnnData")   # TRUE
rna$n_vars()                         # 4L

# Assign a new modality (no data copy)
mdata[["rna"]] <- new_rna_anndata   # accepts ReticulateAnnData or Python AnnData

# Call a Python method with automatic return-type wrapping
mdata_copy <- mdata$py_call("copy")
inherits(mdata_copy, "ReticulateMuData")  # TRUE

# Access the raw Python object for muon/scanpy functions
py_obj <- mdata$py_mudata()
# muon.tl.mofa(py_obj, ...)

ReticulateMuData API reference

Method / function	Description
`ReticulateMuData$new(x)`	Wrap a Python `MuData` or existing wrapper
`mdata[["mod"]]`	Return modality as `ReticulateAnnData`
`mdata[["mod"]] <- adata`	Assign modality (accepts `ReticulateAnnData` or Python AnnData)
`mdata$py_mudata()`	Raw Python `mudata.MuData` object
`mdata$modality_names()`	Character vector of modality names
`mdata$n_obs()`	Number of observations
`mdata$obs_names()`	Observation names
`mdata$py_call(method, ...)`	Call Python method with wrapped return
`is_reticulate_mudata(x)`	Logical test
`as_reticulate_mudata(x)`	Coerce Python MuData or existing wrapper
`reticulate_mudata(py_obj)`	Constructor alias
`py_mudata()`	Import the Python `mudata` module (`convert = FALSE`)

Matrix-aware bridge: how zero-copy works and what can force copies

How it works

The package inspects the class of the assay matrix and routes it to the appropriate converter. The dispatch table is:

R matrix class	Python target	Zero-copy for data arrays?
`matrix`	`numpy.ndarray`	Yes (buffer protocol via `r_to_py`)
`dgeMatrix`	`numpy.ndarray`	Yes (`@x` slot + Fortran reshape)
`dgCMatrix`	`scipy.sparse.csc_matrix`	Yes (`@x`, `@i`, `@p` slot views)
`dgRMatrix`	`scipy.sparse.csr_matrix`	Yes (`@x`, `@j`, `@p` slot views)
`COO_SparseMatrix`	`scipy.sparse.coo_matrix`	Values yes; coordinates no (1→0 base conversion)
Other	`csc_matrix` or `ndarray`	No (one R-level copy during coercion)

For the sparse paths, reticulate::np_array() uses the C buffer protocol to expose R slot vector memory to NumPy without allocation (numpy.ndarray.flags.owndata == False). SciPy stores references to these arrays at construction time; it does not copy them.

The resulting Python object has shape (n_features, n_cells) — matching the R assay. sce_to_anndata() then transposes it once via .T to obtain AnnData’s expected obs × vars = cells × features layout. For SciPy sparse matrices, .T is itself a zero-copy operation.

What CAN force a copy

Trigger	Why
Assay class has no direct converter (fallback)	`as(mat, "dgCMatrix")` allocates a new R object
COO coordinates	1 → 0 base conversion requires new integer vectors
Dense `integer` matrix	`storage.mode(x) <- "double"` copies
`obs` / `var` metadata → pandas `DataFrame`	R columnar `data.frame` → Python row-oriented; unavoidable
`py_to_r(adata$X)`	reticulate converts to a dense R matrix
`adata.X.toarray()` or `adata.X.todense()` in Python	Explicit densification
`.copy()` on any NumPy or SciPy object	Explicit copy
Operations requiring sorted indices in SciPy	May create new index arrays

Lifetime caveat

When zero-copy conversion is used, the Python object holds a borrowed reference to the R vectors backing the matrix slots. You must ensure that:

The source R matrix object (and therefore its slots) remains alive while Python is using the converted object.
No R code modifies the vectors in-place (copy-on-modify semantics usually prevent this for normal R usage).
The R garbage collector has not freed the underlying memory.

In practice, as long as the call site holds a reference to sce (or the extracted matrix) within the same R session, the memory is safe.

API reference

Function	Description
`sce_to_anndata(x, assay, obs, var, ...)`	`SummarizedExperiment` / SCE → Python `anndata.AnnData`; dispatches by matrix class; transposed to cells × features
`sce_to_reticulate_anndata(x, assay, ...)`	Wraps `sce_to_anndata()` result in `anndataR::ReticulateAnnData`
`sce_assay_to_scipy_csc(x, assay)`	Low-level: extract `dgCMatrix` assay → `scipy.sparse.csc_matrix` (features × cells, zero-copy slots)
`fit_mofaflex(data, data_options, model_options, training_options, mofa_compat, ...)`	Fit a MOFA-FLEX model; plain `AnnData` is auto-wrapped in `MuData`
`load_mofaflex_model(path, ...)`	Load a mofaflex HDF5 into MOFA2, patching HDF5 format differences
`with_mofaflex_env(expr)`	Evaluate `expr` inside the managed Python env
`ReticulateMuData$new(x)`	Wrap Python `mudata.MuData` → `ReticulateMuData`
`is_reticulate_mudata(x)`	Logical: is `x` a `ReticulateMuData`?
`as_reticulate_mudata(x)`	Coerce Python MuData or existing wrapper
`reticulate_mudata(py_obj)`	Constructor alias for `ReticulateMuData$new`
`py_mudata()`	Import Python `mudata` module (`convert = FALSE`)

Session info / reproducibility

sessionInfo()
reticulate::py_config()