Skip to contents

Fit MOFAFlex models from R via basilisk + reticulate.

MofaflexR provides an R-native bridge between Bioconductor assay containers and the Python MOFAFlex package for multi-omics factor analysis. The package converts assay matrices from SummarizedExperiment / SingleCellExperiment objects to NumPy arrays or SciPy sparse matrices using matrix-type-aware dispatch, and zero-copy is used where possible depending on the matrix representation:

  • Sparse matrices stored as dgCMatrix / dgRMatrix are bridged to the matching SciPy format by exposing the backing slot arrays via the C buffer protocol (flags.owndata = False).
  • Dense matrices (dgeMatrix, base matrix) are converted to Fortran-order NumPy arrays via a buffer-protocol path that avoids copying where possible.
  • COO sparse arrays (SparseArray::COO_SparseMatrix) are supported; value arrays are zero-copy, coordinate arrays require a 1→0 base conversion.

The Python environment (numpy, scipy, anndata, mofaflex) is managed automatically by basilisk; users do not need to install Python manually.

MOFAFlex was developed by:

Qoku A, Rohbeck M, Walter FC, Kats I, Stegle O, and Buettner F.

Citation

Qoku A, Rohbeck M, Walter FC, Kats I, Stegle O, Buettner F.
MOFA-FLEX: A Factor Model Framework for Integrating Omics Data with Prior Knowledge.
bioRxiv (2025).
https://doi.org/10.1101/2025.11.03.686250

Documentation

Official MOFAFlex documentation:
https://mofaflex.readthedocs.io

License

MOFAFlex is licensed under the BSD 3-Clause License.

The docs can be found under
Documentation

Installation

# Bioconductor + CRAN dependencies
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("SingleCellExperiment", "basilisk", "basilisk.utils",
                        "anndataR"))

# Install MofaflexR from source (once on CRAN/Bioconductor, use that)
# remotes::install_github("MaximilianNuber/MofaflexR")

Quick start

library(MofaflexR)
library(SingleCellExperiment)
library(Matrix)

# ----- 1. Build a toy SCE -----
set.seed(42)
counts <- rsparsematrix(200, 50, density = 0.1, repr = "C")
counts <- abs(round(counts))   # non-negative integer counts
sce <- SingleCellExperiment(assays = list(counts = counts))
colnames(sce) <- paste0("cell", seq_len(ncol(sce)))
rownames(sce) <- paste0("gene", seq_len(nrow(sce)))

# ----- 2. Convert to ReticulateAnnData (matrix-aware bridge) -----------
# The dgCMatrix counts assay is identified as sparse and is bridged to
# scipy.sparse.csc_matrix via zero-copy slot views (owndata = FALSE).
# A dense assay would be converted to a NumPy ndarray instead.
rada <- sce_to_reticulate_anndata(sce)
print(rada)
#> ReticulateAnnData object with n_obs × n_vars = 50 × 200

# ----- 3. Access the Python AnnData directly -----
py_adata <- rada$py_anndata()

sp <- reticulate::import("scipy.sparse", convert = FALSE)
cat("Shape:   ", reticulate::py_to_r(py_adata$shape)[[1]], "x",
    reticulate::py_to_r(py_adata$shape)[[2]], "\n")
cat("Sparse:  ", reticulate::py_to_r(sp$issparse(py_adata$X)), "\n")
cat("owndata: ", reticulate::py_to_r(py_adata$X$data$flags$owndata), "\n")
# Shape:    50 x 200
# Sparse:   TRUE
# owndata:  FALSE   ← no copy of the underlying data arrays

Low-level assay conversion

The package dispatches by matrix class. For a dgCMatrix assay, the low-level helper sce_assay_to_scipy_csc() returns a csc_matrix in features × cells orientation (matching the R assay):

# Returns a scipy.sparse.csc_matrix (shape = genes × cells)
# Other sparse classes (dgRMatrix, COO_SparseMatrix) are handled
# by analogous internal converters.
csc <- sce_assay_to_scipy_csc(sce, assay = "counts")

shape_r <- reticulate::py_to_r(csc$shape)
cat("Shape:", shape_r[[1]], "x", shape_r[[2]], "\n")  # 200 x 50
cat("nnz:  ", reticulate::py_to_r(csc$nnz), "\n")

For general dispatch across all supported matrix types, the package internally uses se_assay_to_python_matrix(). This function accepts any SummarizedExperiment subclass, extracts the named assay, and routes it to the appropriate NumPy / SciPy converter.

Using the basilisk-managed environment

result <- with_mofaflex_env({
  mf <- reticulate::import("mofaflex", convert = FALSE)
  # ... fit model, return R-friendly result
})

Fit a MOFA-FLEX model

fit_mofaflex() converts an SCE (or an existing ReticulateAnnData / ReticulateMuData) to the required Python data structure, builds and trains a MOFA-FLEX model, and optionally saves the result in MOFA2-compatible HDF5 format.

library(MofaflexR)
library(muscData)   # BiocManager::install("muscData")
library(scran);  library(scuttle)

# Build SCE and normalise (see vignette for full QC)
sce <- Kang18_8vs8()
sce <- sce[, sce$multiplets == "singlet" & !is.na(sce$cell)]
sce <- scuttle::logNormCounts(sce)
hvgs <- scran::getTopHVGs(scran::modelGeneVar(sce), n = 3000)
sce_hvg <- sce[hvgs, ]

# Zero-copy bridge to Python AnnData
adata <- sce_to_reticulate_anndata(sce_hvg, assay = "logcounts")

# Train MOFA-FLEX with two groups (ctrl / stim)
# Passing a single AnnData with group_by automatically wraps it in MuData.
model_path <- tempfile(fileext = ".hdf5")
model <- fit_mofaflex(
  data             = adata,
  data_options     = list(group_by           = "stim",
                          scale_per_group    = TRUE,
                          subset_var         = NULL,
                          plot_data_overview = FALSE),
  model_options    = list(n_factors    = 10L,
                          weight_prior = "Horseshoe"),
  training_options = list(max_epochs             = 5000L,
                          early_stopper_patience = 50L,
                          seed                   = 42L,
                          device                 = "cpu",
                          save_path              = model_path),
  mofa_compat      = "full"   # MOFA2-readable HDF5
)

# Load into MOFA2 for downstream analysis
mofa <- load_mofaflex_model(model_path)
MOFA2::plot_variance_explained(mofa)

Tip. If group_by is specified and data is a single AnnData, fit_mofaflex() automatically wraps it in a mudata.MuData and promotes the grouping column to the top-level obs. The wrapped modality is named rna automatically.


ReticulateMuData: wrapping Python MuData objects

mudata is the Python multi-modal data container used by the scverse ecosystem. ReticulateMuData is an R6 class that wraps a Python mudata.MuData object, mirroring the design of anndataR::ReticulateAnnData.

Critical invariant

Any AnnData extracted from a ReticulateMuData is returned as an anndataR::ReticulateAnnData.

This means scanpy/muon functions receiving rna$py_anndata() work without any extra conversion step.

Quick example

library(MofaflexR)
library(reticulate)

# Import Python modules (convert = FALSE keeps objects Python-native)
mu  <- py_mudata()                             # mudata module
ad  <- import("anndata",       convert = FALSE)
sp  <- import("scipy.sparse",  convert = FALSE)

# Build two AnnData objects
X1  <- sp$csr_matrix(r_to_py(matrix(1:12, 3L, 4L)))
X2  <- sp$csr_matrix(r_to_py(matrix(1:6,  3L, 2L)))
a1  <- ad$AnnData(X = X1)
a2  <- ad$AnnData(X = X2)
a1$obs_names <- r_to_py(c("obs1", "obs2", "obs3"))
a2$obs_names <- r_to_py(c("obs1", "obs2", "obs3"))

# Combine into MuData
py_md <- mu$MuData(py_dict(list("rna" = a1, "prot" = a2), convert = FALSE))

# Wrap in ReticulateMuData
mdata <- ReticulateMuData$new(py_md)
print(mdata)
# ReticulateMuData
#   n_obs     : 3
#   modalities: rna, prot
#     rna:           3 obs x 4 vars
#     prot:          3 obs x 2 vars

# Access a modality – returns ReticulateAnnData automatically
rna <- mdata[["rna"]]
inherits(rna, "ReticulateAnnData")   # TRUE
rna$n_vars()                         # 4L

# Assign a new modality (no data copy)
mdata[["rna"]] <- new_rna_anndata   # accepts ReticulateAnnData or Python AnnData

# Call a Python method with automatic return-type wrapping
mdata_copy <- mdata$py_call("copy")
inherits(mdata_copy, "ReticulateMuData")  # TRUE

# Access the raw Python object for muon/scanpy functions
py_obj <- mdata$py_mudata()
# muon.tl.mofa(py_obj, ...)

ReticulateMuData API reference

Method / function Description
ReticulateMuData$new(x) Wrap a Python MuData or existing wrapper
mdata[["mod"]] Return modality as ReticulateAnnData
mdata[["mod"]] <- adata Assign modality (accepts ReticulateAnnData or Python AnnData)
mdata$py_mudata() Raw Python mudata.MuData object
mdata$modality_names() Character vector of modality names
mdata$n_obs() Number of observations
mdata$obs_names() Observation names
mdata$py_call(method, ...) Call Python method with wrapped return
is_reticulate_mudata(x) Logical test
as_reticulate_mudata(x) Coerce Python MuData or existing wrapper
reticulate_mudata(py_obj) Constructor alias
py_mudata() Import the Python mudata module (convert = FALSE)

Matrix-aware bridge: how zero-copy works and what can force copies

How it works

The package inspects the class of the assay matrix and routes it to the appropriate converter. The dispatch table is:

R matrix class Python target Zero-copy for data arrays?
matrix numpy.ndarray Yes (buffer protocol via r_to_py)
dgeMatrix numpy.ndarray Yes (@x slot + Fortran reshape)
dgCMatrix scipy.sparse.csc_matrix Yes (@x, @i, @p slot views)
dgRMatrix scipy.sparse.csr_matrix Yes (@x, @j, @p slot views)
COO_SparseMatrix scipy.sparse.coo_matrix Values yes; coordinates no (1→0 base conversion)
Other csc_matrix or ndarray No (one R-level copy during coercion)

For the sparse paths, reticulate::np_array() uses the C buffer protocol to expose R slot vector memory to NumPy without allocation (numpy.ndarray.flags.owndata == False). SciPy stores references to these arrays at construction time; it does not copy them.

The resulting Python object has shape (n_features, n_cells) — matching the R assay. sce_to_anndata() then transposes it once via .T to obtain AnnData’s expected obs × vars = cells × features layout. For SciPy sparse matrices, .T is itself a zero-copy operation.

What CAN force a copy

Trigger Why
Assay class has no direct converter (fallback) as(mat, "dgCMatrix") allocates a new R object
COO coordinates 1 → 0 base conversion requires new integer vectors
Dense integer matrix storage.mode(x) <- "double" copies
obs / var metadata → pandas DataFrame R columnar data.frame → Python row-oriented; unavoidable
py_to_r(adata$X) reticulate converts to a dense R matrix
adata.X.toarray() or adata.X.todense() in Python Explicit densification
.copy() on any NumPy or SciPy object Explicit copy
Operations requiring sorted indices in SciPy May create new index arrays

Lifetime caveat

When zero-copy conversion is used, the Python object holds a borrowed reference to the R vectors backing the matrix slots. You must ensure that:

  1. The source R matrix object (and therefore its slots) remains alive while Python is using the converted object.
  2. No R code modifies the vectors in-place (copy-on-modify semantics usually prevent this for normal R usage).
  3. The R garbage collector has not freed the underlying memory.

In practice, as long as the call site holds a reference to sce (or the extracted matrix) within the same R session, the memory is safe.


API reference

Function Description
sce_to_anndata(x, assay, obs, var, ...) SummarizedExperiment / SCE → Python anndata.AnnData; dispatches by matrix class; transposed to cells × features
sce_to_reticulate_anndata(x, assay, ...) Wraps sce_to_anndata() result in anndataR::ReticulateAnnData
sce_assay_to_scipy_csc(x, assay) Low-level: extract dgCMatrix assay → scipy.sparse.csc_matrix (features × cells, zero-copy slots)
fit_mofaflex(data, data_options, model_options, training_options, mofa_compat, ...) Fit a MOFA-FLEX model; plain AnnData is auto-wrapped in MuData
load_mofaflex_model(path, ...) Load a mofaflex HDF5 into MOFA2, patching HDF5 format differences
with_mofaflex_env(expr) Evaluate expr inside the managed Python env
ReticulateMuData$new(x) Wrap Python mudata.MuDataReticulateMuData
is_reticulate_mudata(x) Logical: is x a ReticulateMuData?
as_reticulate_mudata(x) Coerce Python MuData or existing wrapper
reticulate_mudata(py_obj) Constructor alias for ReticulateMuData$new
py_mudata() Import Python mudata module (convert = FALSE)

Session info / reproducibility

sessionInfo()
reticulate::py_config()