Fit MOFAFlex models from R via basilisk + reticulate.
MofaflexR provides an R-native bridge between Bioconductor assay containers and the Python MOFAFlex package for multi-omics factor analysis. The package converts assay matrices from SummarizedExperiment / SingleCellExperiment objects to NumPy arrays or SciPy sparse matrices using matrix-type-aware dispatch, and zero-copy is used where possible depending on the matrix representation:
- Sparse matrices stored as
dgCMatrix/dgRMatrixare bridged to the matching SciPy format by exposing the backing slot arrays via the C buffer protocol (flags.owndata = False). - Dense matrices (
dgeMatrix, basematrix) are converted to Fortran-order NumPy arrays via a buffer-protocol path that avoids copying where possible. - COO sparse arrays (
SparseArray::COO_SparseMatrix) are supported; value arrays are zero-copy, coordinate arrays require a 1→0 base conversion.
The Python environment (numpy, scipy, anndata, mofaflex) is managed automatically by basilisk; users do not need to install Python manually.
MOFAFlex was developed by:
Qoku A, Rohbeck M, Walter FC, Kats I, Stegle O, and Buettner F.
Citation
Qoku A, Rohbeck M, Walter FC, Kats I, Stegle O, Buettner F.
MOFA-FLEX: A Factor Model Framework for Integrating Omics Data with Prior Knowledge.
bioRxiv (2025).
https://doi.org/10.1101/2025.11.03.686250
Documentation
Official MOFAFlex documentation:
https://mofaflex.readthedocs.io
Installation
# Bioconductor + CRAN dependencies
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("SingleCellExperiment", "basilisk", "basilisk.utils",
"anndataR"))
# Install MofaflexR from source (once on CRAN/Bioconductor, use that)
# remotes::install_github("MaximilianNuber/MofaflexR")Quick start
library(MofaflexR)
library(SingleCellExperiment)
library(Matrix)
# ----- 1. Build a toy SCE -----
set.seed(42)
counts <- rsparsematrix(200, 50, density = 0.1, repr = "C")
counts <- abs(round(counts)) # non-negative integer counts
sce <- SingleCellExperiment(assays = list(counts = counts))
colnames(sce) <- paste0("cell", seq_len(ncol(sce)))
rownames(sce) <- paste0("gene", seq_len(nrow(sce)))
# ----- 2. Convert to ReticulateAnnData (matrix-aware bridge) -----------
# The dgCMatrix counts assay is identified as sparse and is bridged to
# scipy.sparse.csc_matrix via zero-copy slot views (owndata = FALSE).
# A dense assay would be converted to a NumPy ndarray instead.
rada <- sce_to_reticulate_anndata(sce)
print(rada)
#> ReticulateAnnData object with n_obs × n_vars = 50 × 200
# ----- 3. Access the Python AnnData directly -----
py_adata <- rada$py_anndata()
sp <- reticulate::import("scipy.sparse", convert = FALSE)
cat("Shape: ", reticulate::py_to_r(py_adata$shape)[[1]], "x",
reticulate::py_to_r(py_adata$shape)[[2]], "\n")
cat("Sparse: ", reticulate::py_to_r(sp$issparse(py_adata$X)), "\n")
cat("owndata: ", reticulate::py_to_r(py_adata$X$data$flags$owndata), "\n")
# Shape: 50 x 200
# Sparse: TRUE
# owndata: FALSE ← no copy of the underlying data arraysLow-level assay conversion
The package dispatches by matrix class. For a dgCMatrix assay, the low-level helper sce_assay_to_scipy_csc() returns a csc_matrix in features × cells orientation (matching the R assay):
# Returns a scipy.sparse.csc_matrix (shape = genes × cells)
# Other sparse classes (dgRMatrix, COO_SparseMatrix) are handled
# by analogous internal converters.
csc <- sce_assay_to_scipy_csc(sce, assay = "counts")
shape_r <- reticulate::py_to_r(csc$shape)
cat("Shape:", shape_r[[1]], "x", shape_r[[2]], "\n") # 200 x 50
cat("nnz: ", reticulate::py_to_r(csc$nnz), "\n")For general dispatch across all supported matrix types, the package internally uses se_assay_to_python_matrix(). This function accepts any SummarizedExperiment subclass, extracts the named assay, and routes it to the appropriate NumPy / SciPy converter.
Using the basilisk-managed environment
result <- with_mofaflex_env({
mf <- reticulate::import("mofaflex", convert = FALSE)
# ... fit model, return R-friendly result
})Fit a MOFA-FLEX model
fit_mofaflex() converts an SCE (or an existing ReticulateAnnData / ReticulateMuData) to the required Python data structure, builds and trains a MOFA-FLEX model, and optionally saves the result in MOFA2-compatible HDF5 format.
library(MofaflexR)
library(muscData) # BiocManager::install("muscData")
library(scran); library(scuttle)
# Build SCE and normalise (see vignette for full QC)
sce <- Kang18_8vs8()
sce <- sce[, sce$multiplets == "singlet" & !is.na(sce$cell)]
sce <- scuttle::logNormCounts(sce)
hvgs <- scran::getTopHVGs(scran::modelGeneVar(sce), n = 3000)
sce_hvg <- sce[hvgs, ]
# Zero-copy bridge to Python AnnData
adata <- sce_to_reticulate_anndata(sce_hvg, assay = "logcounts")
# Train MOFA-FLEX with two groups (ctrl / stim)
# Passing a single AnnData with group_by automatically wraps it in MuData.
model_path <- tempfile(fileext = ".hdf5")
model <- fit_mofaflex(
data = adata,
data_options = list(group_by = "stim",
scale_per_group = TRUE,
subset_var = NULL,
plot_data_overview = FALSE),
model_options = list(n_factors = 10L,
weight_prior = "Horseshoe"),
training_options = list(max_epochs = 5000L,
early_stopper_patience = 50L,
seed = 42L,
device = "cpu",
save_path = model_path),
mofa_compat = "full" # MOFA2-readable HDF5
)
# Load into MOFA2 for downstream analysis
mofa <- load_mofaflex_model(model_path)
MOFA2::plot_variance_explained(mofa)Tip. If
group_byis specified anddatais a singleAnnData,fit_mofaflex()automatically wraps it in amudata.MuDataand promotes the grouping column to the top-levelobs. The wrapped modality is namedrnaautomatically.
ReticulateMuData: wrapping Python MuData objects
mudata is the Python multi-modal data container used by the scverse ecosystem. ReticulateMuData is an R6 class that wraps a Python mudata.MuData object, mirroring the design of anndataR::ReticulateAnnData.
Critical invariant
Any
AnnDataextracted from aReticulateMuDatais returned as ananndataR::ReticulateAnnData.
This means scanpy/muon functions receiving rna$py_anndata() work without any extra conversion step.
Quick example
library(MofaflexR)
library(reticulate)
# Import Python modules (convert = FALSE keeps objects Python-native)
mu <- py_mudata() # mudata module
ad <- import("anndata", convert = FALSE)
sp <- import("scipy.sparse", convert = FALSE)
# Build two AnnData objects
X1 <- sp$csr_matrix(r_to_py(matrix(1:12, 3L, 4L)))
X2 <- sp$csr_matrix(r_to_py(matrix(1:6, 3L, 2L)))
a1 <- ad$AnnData(X = X1)
a2 <- ad$AnnData(X = X2)
a1$obs_names <- r_to_py(c("obs1", "obs2", "obs3"))
a2$obs_names <- r_to_py(c("obs1", "obs2", "obs3"))
# Combine into MuData
py_md <- mu$MuData(py_dict(list("rna" = a1, "prot" = a2), convert = FALSE))
# Wrap in ReticulateMuData
mdata <- ReticulateMuData$new(py_md)
print(mdata)
# ReticulateMuData
# n_obs : 3
# modalities: rna, prot
# rna: 3 obs x 4 vars
# prot: 3 obs x 2 vars
# Access a modality – returns ReticulateAnnData automatically
rna <- mdata[["rna"]]
inherits(rna, "ReticulateAnnData") # TRUE
rna$n_vars() # 4L
# Assign a new modality (no data copy)
mdata[["rna"]] <- new_rna_anndata # accepts ReticulateAnnData or Python AnnData
# Call a Python method with automatic return-type wrapping
mdata_copy <- mdata$py_call("copy")
inherits(mdata_copy, "ReticulateMuData") # TRUE
# Access the raw Python object for muon/scanpy functions
py_obj <- mdata$py_mudata()
# muon.tl.mofa(py_obj, ...)ReticulateMuData API reference
| Method / function | Description |
|---|---|
ReticulateMuData$new(x) |
Wrap a Python MuData or existing wrapper |
mdata[["mod"]] |
Return modality as ReticulateAnnData
|
mdata[["mod"]] <- adata |
Assign modality (accepts ReticulateAnnData or Python AnnData) |
mdata$py_mudata() |
Raw Python mudata.MuData object |
mdata$modality_names() |
Character vector of modality names |
mdata$n_obs() |
Number of observations |
mdata$obs_names() |
Observation names |
mdata$py_call(method, ...) |
Call Python method with wrapped return |
is_reticulate_mudata(x) |
Logical test |
as_reticulate_mudata(x) |
Coerce Python MuData or existing wrapper |
reticulate_mudata(py_obj) |
Constructor alias |
py_mudata() |
Import the Python mudata module (convert = FALSE) |
Matrix-aware bridge: how zero-copy works and what can force copies
How it works
The package inspects the class of the assay matrix and routes it to the appropriate converter. The dispatch table is:
| R matrix class | Python target | Zero-copy for data arrays? |
|---|---|---|
matrix |
numpy.ndarray |
Yes (buffer protocol via r_to_py) |
dgeMatrix |
numpy.ndarray |
Yes (@x slot + Fortran reshape) |
dgCMatrix |
scipy.sparse.csc_matrix |
Yes (@x, @i, @p slot views) |
dgRMatrix |
scipy.sparse.csr_matrix |
Yes (@x, @j, @p slot views) |
COO_SparseMatrix |
scipy.sparse.coo_matrix |
Values yes; coordinates no (1→0 base conversion) |
| Other |
csc_matrix or ndarray
|
No (one R-level copy during coercion) |
For the sparse paths, reticulate::np_array() uses the C buffer protocol to expose R slot vector memory to NumPy without allocation (numpy.ndarray.flags.owndata == False). SciPy stores references to these arrays at construction time; it does not copy them.
The resulting Python object has shape (n_features, n_cells) — matching the R assay. sce_to_anndata() then transposes it once via .T to obtain AnnData’s expected obs × vars = cells × features layout. For SciPy sparse matrices, .T is itself a zero-copy operation.
What CAN force a copy
| Trigger | Why |
|---|---|
| Assay class has no direct converter (fallback) |
as(mat, "dgCMatrix") allocates a new R object |
| COO coordinates | 1 → 0 base conversion requires new integer vectors |
Dense integer matrix |
storage.mode(x) <- "double" copies |
obs / var metadata → pandas DataFrame
|
R columnar data.frame → Python row-oriented; unavoidable |
py_to_r(adata$X) |
reticulate converts to a dense R matrix |
adata.X.toarray() or adata.X.todense() in Python |
Explicit densification |
.copy() on any NumPy or SciPy object |
Explicit copy |
| Operations requiring sorted indices in SciPy | May create new index arrays |
Lifetime caveat
When zero-copy conversion is used, the Python object holds a borrowed reference to the R vectors backing the matrix slots. You must ensure that:
- The source R matrix object (and therefore its slots) remains alive while Python is using the converted object.
- No R code modifies the vectors in-place (copy-on-modify semantics usually prevent this for normal R usage).
- The R garbage collector has not freed the underlying memory.
In practice, as long as the call site holds a reference to sce (or the extracted matrix) within the same R session, the memory is safe.
API reference
| Function | Description |
|---|---|
sce_to_anndata(x, assay, obs, var, ...) |
SummarizedExperiment / SCE → Python anndata.AnnData; dispatches by matrix class; transposed to cells × features |
sce_to_reticulate_anndata(x, assay, ...) |
Wraps sce_to_anndata() result in anndataR::ReticulateAnnData
|
sce_assay_to_scipy_csc(x, assay) |
Low-level: extract dgCMatrix assay → scipy.sparse.csc_matrix (features × cells, zero-copy slots) |
fit_mofaflex(data, data_options, model_options, training_options, mofa_compat, ...) |
Fit a MOFA-FLEX model; plain AnnData is auto-wrapped in MuData
|
load_mofaflex_model(path, ...) |
Load a mofaflex HDF5 into MOFA2, patching HDF5 format differences |
with_mofaflex_env(expr) |
Evaluate expr inside the managed Python env |
ReticulateMuData$new(x) |
Wrap Python mudata.MuData → ReticulateMuData
|
is_reticulate_mudata(x) |
Logical: is x a ReticulateMuData? |
as_reticulate_mudata(x) |
Coerce Python MuData or existing wrapper |
reticulate_mudata(py_obj) |
Constructor alias for ReticulateMuData$new
|
py_mudata() |
Import Python mudata module (convert = FALSE) |
Session info / reproducibility
sessionInfo()
reticulate::py_config()