Title: | Integrative Analysis of Several Related Data Matrices |
---|---|
Description: | A generalization of principal component analysis for integrative analysis. The method finds principal components that describe single matrices or that are common to several matrices. The solutions are sparse. Rank of solutions is automatically selected using cross validation. The method is described in Kallus et al. (2019) <arXiv:1911.04927>. |
Authors: | Jonatan Kallus [aut], Felix Held [ctb, cre] |
Maintainer: | Felix Held <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.0.3 |
Built: | 2024-11-05 03:11:08 UTC |
Source: | https://github.com/cyianor/mmpca |
Analyzes several related matrices of data.
mmpca( x, inds, k, lambda = NULL, trace = 0, max_iter = 20000, init_theta = NULL, cachepath = NULL, enable_rank_selection = TRUE, enable_sparsity = TRUE, enable_variable_selection = FALSE, parallel = TRUE )
x |
List of matrices to analyze |
inds |
Matrix containing view indices. The matrix should have two
columns and the same number of rows as the length of |
k |
Integer giving the maximum rank of the analysis, i.e. the maximum number of principal components for each view. |
lambda |
Vector or matrix of lambda values. The length (or width if it
is a matrix) depends on the number of active penalties (2, 3 or 4). If it
is a matrix, try different lambda values (one try for each row). Default: a
matrix where each column is the sequence |
trace |
Integer selecting the amount of log messages. 0 (default): no output, 3: all output. |
max_iter |
Maximum number of iterations |
init_theta |
NULL, functions or numeric. NULL (default) uses initial
values based on ordinary SVD. If init_theta is a list of three functions
( |
cachepath |
Character vector with path to directory to store
intermediate results. If NULL (default) intermediate results are not
stored. For caching to work it is required that the random number
generation seed is constant between calls to mmpca, so |
enable_rank_selection |
Boolean deciding if the second penalty that imposes a low rank model should be enabled. |
enable_sparsity |
Boolean deciding if the third penalty that imposes sparsity in V should be enabled. |
enable_variable_selection |
Boolean deciding if the fourth penalty, which increases the tendency for the sparsity structure of different V columns to be similar, should be enabled. Defaults to FALSE, meaning this penalty is not used. |
parallel |
Boolean deciding if computations should be run on multiple cores simultaneously. |
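The relationship between `x` and `inds` can be checked before fitting. The sketch below is not part of mmpca: `check_views` is a hypothetical helper, and it assumes (as the example on this page suggests) that the first column of `inds` names the view indexing the rows of each matrix and the second column names the view indexing its columns.

```r
# Hypothetical helper (not part of mmpca): checks that every view has one
# consistent size across all matrices in which it appears.
# Assumption: inds[i, 1] is the row view and inds[i, 2] the column view of x[[i]].
check_views <- function(x, inds) {
  stopifnot(
    is.list(x), is.matrix(inds),
    ncol(inds) == 2, nrow(inds) == length(x)
  )
  sizes <- list()
  for (i in seq_along(x)) {
    dims <- dim(x[[i]])
    for (j in 1:2) {
      view <- as.character(inds[i, j])
      if (is.null(sizes[[view]])) {
        # First time this view is seen: record its size
        sizes[[view]] <- dims[j]
      } else if (sizes[[view]] != dims[j]) {
        stop("view ", view, " has inconsistent size in matrix ", i)
      }
    }
  }
  unlist(sizes)
}

# Two matrices sharing view 1 on their rows
x <- list(matrix(0, 10, 11), matrix(0, 10, 12))
inds <- matrix(c(1, 1, 2, 3), 2, 2)  # rows of inds: (1, 2) and (1, 3)
view_sizes <- check_views(x, inds)
print(view_sizes)
```

A check like this fails early when two matrices that are supposed to share a view disagree on its size, which is easier to diagnose than an error raised during optimization.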
A list with components
initial |
initial values used in optimization |
cmf |
solution found with CMF (if init_theta == c(CMF, matrix_to_triplets, getCMFopts)) |
training |
solutions for different values of lambda |
solution |
solution for optimal lambda value |
Jonatan Kallus, [email protected]
# Create model with three views, two data matrices of low-rank 3
max_rank <- 3
v <- list(
  qr.Q(qr(matrix(rnorm(10 * max_rank), 10, max_rank))),
  qr.Q(qr(matrix(rnorm(11 * max_rank), 11, max_rank))),
  qr.Q(qr(matrix(rnorm(12 * max_rank), 12, max_rank)))
)
d <- matrix(
  c(1, 1, 1, 1, 1, 0, 1, 0, 1),
  nrow = max_rank, ncol = 3
)
x <- list(
  v[[1]] %*% diag(d[, 1] * d[, 2]) %*% t(v[[2]]),
  v[[1]] %*% diag(d[, 1] * d[, 3]) %*% t(v[[3]])
)
inds <- matrix(c(1, 1, 2, 3), 2, 2)
result <- mmpca::mmpca(
  x, inds, max_rank,
  parallel = FALSE,
  lambda = c(1e-3, 1e-5),
  enable_sparsity = FALSE,
  trace = 3
)
# Investigate the solution
result$solution$D
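The `lambda` argument description says that a matrix of lambda values is tried one row at a time. The sketch below builds such a grid for the same configuration as the example (two active penalties, since `enable_sparsity = FALSE`). The grid values are illustrative assumptions rather than recommended settings, and the simulated data are rebuilt so the block is self-contained. A fixed seed is set because the `cachepath` description requires a constant seed between calls for caching to work. The call to `mmpca` is guarded so that the grid construction still runs when the package is not installed.

```r
set.seed(1)  # fixed seed so repeated runs generate the same data

# Simulated data built the same way as in the example
max_rank <- 3
v <- lapply(c(10, 11, 12), function(n) {
  qr.Q(qr(matrix(rnorm(n * max_rank), n, max_rank)))
})
d <- matrix(c(1, 1, 1, 1, 1, 0, 1, 0, 1), nrow = max_rank, ncol = 3)
x <- list(
  v[[1]] %*% diag(d[, 1] * d[, 2]) %*% t(v[[2]]),
  v[[1]] %*% diag(d[, 1] * d[, 3]) %*% t(v[[3]])
)
inds <- matrix(c(1, 1, 2, 3), 2, 2)

# Illustrative grid: one row per attempt, one column per active penalty.
# The values are assumptions chosen around those used in the example.
lambda_grid <- unname(as.matrix(expand.grid(
  c(1e-3, 1e-4),
  c(1e-5, 1e-6)
)))

# Only fit when mmpca is available
if (requireNamespace("mmpca", quietly = TRUE)) {
  result <- mmpca::mmpca(
    x, inds, max_rank,
    lambda = lambda_grid,
    enable_sparsity = FALSE,
    parallel = FALSE
  )
  print(result$solution$D)
}
```

Here `expand.grid` enumerates every combination of the two candidate sequences, giving four rows, so four lambda settings would be tried.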