Co-created by the Catalyzer Think Tank divergent-thinking process and the Gemini Deep Research tool.
I. Introduction: The Quest for Objective Structures Across Scientific Domains
The pursuit of scientific knowledge often involves seeking fundamental principles or structures that remain consistent regardless of the specific viewpoint, measurement technique, or domain of inquiry. The query regarding “objective principal components” probes deeply into this quest, extending beyond the conventional scope of Principal Component Analysis (PCA) as merely a tool for dimensionality reduction via variance maximization. Instead, it points towards identifying fundamental, invariant, or intrinsic features, structures, or generative principles that persist across disparate datasets, measurement modalities, theoretical perspectives, and even distinct scientific disciplines. This notion of “objectivity” implies invariance to the specific “lens”—be it a dataset, a model, or a domain—through which a phenomenon is observed. Such a pursuit is motivated by the desire to uncover unifying principles, facilitate knowledge transfer between fields, and construct more robust and generalizable models of complex phenomena, potentially revealing underlying “laws of nature”.1 However, this concept of objectivity requires careful operationalization, as the structures sought might manifest as invariant generative factors, conserved geometric or topological properties, or shared causal mechanisms.2 The concept of “objective principal components” thus transcends standard dimensionality reduction, suggesting a search for invariant generative factors or fundamental geometric/topological structures conserved across observational contexts.
This search is significantly complicated by the profound heterogeneity inherent in data gathered across the sciences. Datasets may differ fundamentally in type (e.g., imaging data, genomic sequences, interaction graphs, spatio-temporal point clouds), dimensionality, underlying geometric structure (Euclidean, hyperbolic, complex manifolds), and statistical distributions, leading to domain shifts that challenge standard machine learning assumptions.2 Data sources may exhibit distinct noise characteristics, possess varying degrees of domain-specific versus shared information 9, and potentially arise from different underlying causal processes. Examples mentioned in the query, such as hyperbolic 3-manifold lensing and mapping, exemplify this complexity, involving data derived from non-Euclidean geometries or intricate physical processes governed by general relativity.12 This heterogeneity necessitates methods specifically designed to be robust against distribution shifts, differing geometries, and the presence of both shared and domain-specific information; simply pooling data or applying standard algorithms is often inadequate and can even lead to degraded performance, a phenomenon known as negative transfer.2
Addressing this challenge requires leveraging sophisticated mathematical and machine learning paradigms. This report surveys a range of advanced techniques capable of potentially uncovering such objective structures. These include mathematical generalizations of linear methods like PCA designed for multiple datasets; machine learning approaches focused on learning domain-invariant representations and disentangling factors of variation; advanced non-linear dimensionality reduction and manifold learning techniques aimed at revealing intrinsic data geometry; methods grounded in information geometry, topological data analysis, and hyperbolic geometry that offer unique geometric perspectives; and strategies for evaluating the objectivity, invariance, or generalizability of the discovered components. The report will compare these approaches based on their theoretical underpinnings, inherent assumptions, computational demands, and ultimate suitability for the ambitious goal of discovering objective principal components across the diverse landscape of scientific inquiry.
II. Extracting Shared Variance: Mathematical Generalizations of PCA
Principal Component Analysis (PCA) serves as a foundational technique in multivariate data analysis, primarily employed for dimensionality reduction and data exploration.1 Its standard formulation seeks orthogonal directions, the principal components (PCs), that capture the maximum variance within a dataset. These high-variance PCs are often used to represent the data in a lower-dimensional subspace, preserving most of the information while reducing complexity.1 PCA finds applications across various domains, including preprocessing steps for more complex algorithms, such as denoising time series data before feeding it into deep learning models 18 or reducing dimensionality prior to applying methods like JIVE or information geometry techniques.10 Interestingly, the utility of PCA extends beyond variance maximization; the last PCs, representing directions of minimal variance, can be instrumental in discovering constant relationships or equations among variables, essentially identifying potential laws of nature.1 This highlights PCA’s flexibility and its characteristic of treating all variables symmetrically, unlike regression methods that require a predictor-response distinction.1
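As a concrete illustration of this last point, the following minimal sketch (synthetic data, with NumPy and scikit-learn assumed available) constructs variables obeying an exact linear relationship and shows that the smallest-variance principal component recovers the coefficients of that relationship up to scale:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data obeying the "law" 2*x1 + 3*x2 - x3 = 5 (plus small noise).
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = 2 * x1 + 3 * x2 - 5 + rng.normal(scale=0.01, size=500)
X = np.column_stack([x1, x2, x3])

pca = PCA(n_components=3)
pca.fit(X)

# The last component (smallest variance) is orthogonal to the data cloud,
# i.e. it encodes the (near-)constant relationship among the variables.
law = pca.components_[-1]
print("last PC, rescaled (≈ [2, 3, -1]):", law / law[0] * 2)
print("explained variance ratios:", pca.explained_variance_ratio_)
```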
However, standard PCA operates on a single dataset. Discovering components objective across multiple datasets or perspectives necessitates generalizations that explicitly handle shared and distinct sources of variation.
Methods for Multiple Datasets
Several extensions of PCA have been developed to analyze multiple datasets simultaneously, aiming to identify common structures.
- Common PCA / Consensus PCA: These approaches seek a single set of principal components that are shared or represent a consensus across multiple datasets, often derived from different views or domains.9 The underlying motivation aligns with the “consensus principle” in multi-view learning, which posits that multiple views should exhibit some degree of consistency, allowing for the effective capture of common characteristics and patterns.9 Research in multi-view clustering, for instance, aims to find shared partitions or common structures embedded within diverse data representations.20 Effectively integrating information across views requires balancing this consensus principle with the “complementary principle,” which acknowledges that different views also provide unique, non-redundant information.9 This balance is crucial for finding components that are general (consensus) without discarding valuable domain-specific insights (complementarity).
- Joint and Individual Variation Explained (JIVE): JIVE provides a more structured decomposition for multiple datasets measured on the same set of subjects but potentially involving different sets of features (modalities).10 It explicitly separates the variation within each dataset into three distinct parts:
- Joint Structure (Jk): Captures patterns of variation common across all datasets.
- Individual Structure (Ak): Captures structured variation unique to each specific dataset.
- Residual Noise (Ek): Represents unstructured variability.
Mathematically, for K datasets Xk (n subjects × pk features), the decomposition is Xk = Jk + Ak + Ek.10 A key assumption is that the joint and individual structures reside in orthogonal subspaces. The “common normalized score” representation makes this explicit: Xk = Z WJk + Bk WIk + Ek, where Z represents common subject scores in the joint subspace, Bk represents subject scores in the individual subspace unique to dataset k, and the loading matrices WJk and WIk relate these scores back to the original features.10 The orthogonality constraint BkᵀZ = 0 ensures the separation of joint and individual variation.10
Several algorithms exist for JIVE estimation. The original R.JIVE uses an iterative procedure to estimate the joint and individual components.10 Angle-based JIVE (AJIVE) offers a non-iterative alternative using principal angle analysis between PCA-derived signal subspaces of the datasets to determine the joint rank and structure.10 More recently, Canonical JIVE (CJIVE) reinterprets AJIVE through the lens of Canonical Correlation Analysis (CCA) performed on the PC scores of the datasets.10 CJIVE views the joint scores as related to the average of canonical variables, providing a more intuitive understanding, a permutation test for determining the joint rank’s statistical significance, the ability to predict scores for new subjects, and computational efficiency.10
JIVE has proven valuable in integrating multimodal data in fields like neuroimaging (combining fMRI and dMRI) 10 and genomics (integrating gene expression and genetic data).10 Recent work has focused on enhancing its computational efficiency for large-scale single-cell data by employing techniques like partial Singular Value Decomposition (SVD) and optimized matrix operations using libraries like RcppEigen.22
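The principal-angle step at the core of AJIVE can be illustrated with a short sketch. The example below is schematic (synthetic data, assumed signal ranks, SciPy's `subspace_angles` helper), not a full JIVE implementation: small principal angles between the datasets' PCA score subspaces indicate candidate joint directions, while angles near 90 degrees indicate individual structure.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)

# Two datasets on the same 100 subjects: one shared score vector plus
# dataset-specific structure and noise.
n = 100
shared = rng.normal(size=(n, 1))
X1 = (shared @ rng.normal(size=(1, 20))
      + rng.normal(size=(n, 1)) @ rng.normal(size=(1, 20))
      + 0.1 * rng.normal(size=(n, 20)))
X2 = (shared @ rng.normal(size=(1, 30))
      + rng.normal(size=(n, 1)) @ rng.normal(size=(1, 30))
      + 0.1 * rng.normal(size=(n, 30)))

def signal_scores(X, rank):
    """Left singular vectors of the column-centered data = subject-score basis."""
    U, _, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :rank]

U1, U2 = signal_scores(X1, rank=2), signal_scores(X2, rank=2)

# Principal angles between the two score subspaces: angles near 0 suggest
# joint (shared) variation; angles near 90 degrees suggest individual structure.
angles_deg = np.degrees(subspace_angles(U1, U2))
print("principal angles (degrees):", np.round(angles_deg, 1))
```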
Theoretical Underpinnings and Limitations
These PCA generalizations offer a direct pathway to identifying shared linear subspaces, providing one operationalization of “objective components.” JIVE’s explicit modeling of joint versus individual variation is particularly appealing for disentangling shared phenomena from domain-specific effects.10
However, their effectiveness relies heavily on the assumption that the shared structure is indeed linear and can be adequately captured by common or orthogonal subspaces.1 This linearity assumption may be overly restrictive for complex scientific data, which often exhibit non-linear relationships or arise from diverse geometric structures (e.g., manifolds, hierarchies) where orthogonal decomposition is inappropriate.5 Furthermore, the accuracy of methods like JIVE can be sensitive to the correct estimation of the signal ranks within each dataset; overestimation can lead to inaccurate joint rank estimation.10 While interpretable components are a goal, the interpretation of the resulting subspaces, particularly in JIVE, can sometimes remain challenging.10 These limitations suggest that while valuable, linear methods may not suffice for capturing the full richness of objective structures in many scientific contexts.
III. Learning Domain-Invariant Representations via Machine Learning
Machine learning offers powerful paradigms for learning representations from data, with domain adaptation (DA) and transfer learning (TL) being particularly relevant for handling heterogeneity across scientific datasets.
Core Concepts: Domain Adaptation (DA) and Transfer Learning (TL)
Transfer learning encompasses techniques where knowledge gained from one task or dataset is applied to improve performance on a related but different task or dataset.7 This contrasts with traditional machine learning, which often assumes that training and testing data are independent and identically distributed (IID), drawn from the same underlying distribution, an assumption frequently violated when dealing with data from different sources, experiments, or scientific domains.7 TL aims to bridge these distributional gaps by transferring shared knowledge.8
Domain adaptation is a specific subfield of transfer learning focused on adapting a model trained on a labeled source domain to perform well on a target domain where the data distribution differs, and labels may be scarce or absent.2 The motivation is compelling: DA can mitigate the challenges of small sample sizes in individual datasets, reduce the need for expensive data collection and annotation, improve model generalizability, and potentially uncover more fundamental, transferable truths by integrating information across domains.2 DA techniques are crucial when direct application of a source-trained model to a target domain fails due to statistical differences between the domains.2
Learning Domain-Invariant Features
A central strategy in DA is to learn domain-invariant features. The objective is to find a transformation of the input data, z = g(x), such that the resulting representation z has a similar distribution across both source and target domains, while still containing the necessary information to predict the target variable y.2 A predictor y = h(z) trained using labeled source data in this invariant space should then generalize effectively to the unlabeled target data.2 This forces the model g to capture features common to all domains, effectively ignoring domain-specific characteristics.2
Several techniques are employed to achieve this:
- Distribution Matching: These methods explicitly minimize a statistical distance metric between the distributions of the source and target representations (P_S(z) and P_T(z)). Common metrics include Maximum Mean Discrepancy (MMD) 27 and Correlation Alignment (CORAL).27 A minimal MMD sketch is given after this list.
- Adversarial Learning: This popular approach involves a minimax game. A feature extractor g attempts to produce representations z that are indistinguishable across domains, while a separate domain discriminator network tries to identify the domain origin of z. The feature extractor is trained to “fool” the discriminator, thereby learning domain-invariant features.27 Domain Adversarial Neural Network (DANN) is a classic example.27
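For concreteness, the sketch below gives a minimal NumPy estimate of MMD² with an RBF kernel; in a domain adaptation pipeline this quantity would typically be computed on the learned representations z and added to the training loss as an alignment penalty. The median-heuristic bandwidth is an assumed default here, not a prescribed choice.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel values.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd_rbf(Zs, Zt, bandwidth=None):
    """Biased estimate of MMD^2 between source and target representations."""
    if bandwidth is None:
        # Median heuristic over the pooled sample (an assumed, common default).
        Z = np.vstack([Zs, Zt])
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        bandwidth = np.sqrt(np.median(d2[d2 > 0]) / 2)
    k_ss = rbf_kernel(Zs, Zs, bandwidth).mean()
    k_tt = rbf_kernel(Zt, Zt, bandwidth).mean()
    k_st = rbf_kernel(Zs, Zt, bandwidth).mean()
    return k_ss + k_tt - 2 * k_st

rng = np.random.default_rng(0)
Zs = rng.normal(loc=0.0, size=(200, 16))   # source representations
Zt = rng.normal(loc=0.5, size=(200, 16))   # shifted target representations
print("MMD^2 estimate:", mmd_rbf(Zs, Zt))
```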
Theoretical Bounds and Challenges
While intuitively appealing, learning domain-invariant representations faces significant theoretical hurdles. Generalization bounds, such as the one developed by Ben-David et al., provide insights into the conditions required for successful DA.6 These bounds typically relate the error on the target domain to three key terms (a schematic form of the bound is given after the list):
- The error achievable on the source domain.
- A measure of divergence between the source and target domain distributions in the representation space (d(P_S(z), P_T(z))).
- The optimal joint error (λ*), representing the minimum error achievable by a single hypothesis across both domains combined.6
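Schematically, and omitting constants and finite-sample estimation terms, the bound combines these three terms as follows (in the original result the divergence d is the H∆H-divergence between the induced representation distributions):

```latex
\varepsilon_T(h) \;\le\; \varepsilon_S(h) \;+\; d\big(P_S(z),\, P_T(z)\big) \;+\; \lambda^{*},
\qquad
\lambda^{*} \;=\; \min_{h' \in \mathcal{H}} \big[\, \varepsilon_S(h') + \varepsilon_T(h') \,\big]
```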
These bounds reveal critical challenges:
- Challenge 1: Insufficiency of Marginal Alignment: Minimizing the divergence between the marginal distributions P(z) across domains is not sufficient to guarantee low target error.6 If the underlying relationship between the representation and the label, i.e., the conditional distribution P(y|z), differs significantly between domains, aligning P(z) can be ineffective or even detrimental. A transformation might perfectly align the feature distributions but simultaneously make the optimal source predictor perform poorly on the target domain.6 True objectivity or transferability seems to require alignment of the relationship P(y|z), not just the features P(z).
- Challenge 2: Impact of Label Shift: A fundamental trade-off exists when the marginal label distributions P(y) differ between source and target domains.6 Information-theoretic lower bounds show that forcing perfect alignment of P(z) (i.e., achieving perfect domain invariance in the representation) can necessarily lead to a large optimal joint error λ* if P_S(y) and P_T(y) are dissimilar.6 This implies that methods relying solely on learning invariant representations may fundamentally struggle when the prevalence of classes changes across domains.
- Challenge 3: Negative Transfer: Attempting to transfer knowledge from source domains that are too dissimilar or irrelevant to the target domain can actually degrade performance compared to training only on the target data.15 This necessitates careful domain selection or weighting strategies when dealing with multiple source domains.15
- Challenge 4: Nonlinearity: Many complex systems, particularly in biology and other sciences, exhibit highly non-linear behavior. Domain adaptation techniques based on linear transformations or assumptions may prove inadequate for capturing the intricate patterns in such data.2
Invariant Risk Minimization (IRM)
Invariant Risk Minimization (IRM) offers a distinct perspective, aiming to learn a representation Φ(x) such that the optimal predictor conditioned on that representation is invariant across different environments or domains.16 Formally, it seeks a representation Φ such that the environment-specific risk minimizer on top of it, argmin_w E_e[ℓ(w(Φ(X)), Y)], is the same classifier w for all environments e. This is closely related to finding causal relationships.31 The underlying assumption is that the causal mechanism generating the outcome Y from a set of direct causes (captured by Φ(X)) remains invariant across environments, even though the distribution of the inputs X might change.31 By finding such an invariant representation, IRM hopes to identify the stable, causal predictors, distinguishing them from spurious correlations that might hold only within specific environments. In causal inference settings, IRM applied to observational data from multiple environments might help identify a representation that includes true confounders while excluding “bad controls” (variables affected by the outcome or treatment) that would otherwise bias estimates if adjusted for naively.31
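In practice the bi-level IRM objective is approximated; the commonly used IRMv1 relaxation penalizes the gradient of each environment's risk with respect to a fixed scalar "dummy" classifier. The sketch below is a minimal PyTorch version for binary classification; `phi` (the representation/logit network) and `envs` (a list of per-environment (x, y) batches, with y a float tensor of 0/1 labels) are hypothetical names used only for illustration.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, y):
    """IRMv1 penalty for one environment: squared gradient of the risk with
    respect to a dummy multiplicative classifier scale fixed at 1.0."""
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objective(phi, envs, penalty_weight=1.0):
    """Average per-environment risk plus the invariance penalty.

    phi:  module mapping inputs to a single logit per example.
    envs: list of (x, y) tensor pairs, one pair per environment.
    """
    risk, penalty = 0.0, 0.0
    for x, y in envs:
        logits = phi(x).squeeze(-1)
        risk = risk + F.binary_cross_entropy_with_logits(logits, y)
        penalty = penalty + irmv1_penalty(logits, y)
    n = len(envs)
    return risk / n + penalty_weight * penalty / n
```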
The paradigms of domain adaptation and invariant representation learning directly address the quest for “objectivity” by explicitly seeking representations that are robust to shifts in the observational context (domain changes). The connection made by IRM between invariance and causality suggests that these methods might uncover particularly strong, mechanism-based forms of objective structure.31 However, the theoretical analyses underscore a crucial point: simple alignment of feature distributions is insufficient. True objectivity seems to reside in the invariance of the mapping from representation to outcome (P(y|z)) or in finding representations where the optimal predictor itself is invariant, as IRM attempts.6 Furthermore, the success of these approaches is fundamentally constrained by the relationship between the domains; significant differences in label distributions or task relevance can limit the very possibility of finding a single, effective invariant representation applicable across all contexts.6 Objectivity, therefore, might be achievable only relative to a specific set of related domains or perspectives, rather than universally.
IV. Disentangling Factors of Variation Across Domains
Disentangled Representation Learning (DRL) offers another powerful machine learning paradigm for uncovering fundamental structure in data, potentially aligning with the concept of “objective components.”
Goals of Disentangled Representation Learning (DRL)
The primary goal of DRL is to learn representations that identify and separate the distinct, underlying factors of variation responsible for generating the observed data.33 Ideally, this results in an interpretable latent space where different dimensions or subspaces correspond to independent, semantically meaningful attributes of the data (e.g., for images: object identity, pose, lighting, color).3 Such representations are highly desirable due to numerous potential benefits, including improved model explainability, enabling controllable data generation or manipulation, enhancing robustness to irrelevant variations, promoting better generalization to new data, ensuring fairness by separating sensitive attributes, and facilitating transfer learning across tasks or domains.33 If the same fundamental factors of variation operate across different scientific domains, a representation that successfully disentangles these factors could capture a shared, objective generative structure.
VAE and GAN-based Approaches
Generative models, particularly Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), are commonly employed for DRL.
- Variational Autoencoders (VAEs): VAEs learn a probabilistic encoder mapping data x to a latent distribution q(z|x) and a decoder mapping latent variables z back to data space p(x|z).33 Disentanglement is typically encouraged by adding a regularization term to the VAE objective (the Evidence Lower Bound, or ELBO). A common regularizer penalizes the Kullback-Leibler (KL) divergence between the learned latent posterior q(z|x) and a factorized prior distribution, usually a standard Gaussian N(0, I).33 Modifications like β-VAE increase the weight (β > 1) on this KL term to enforce stronger factorization, often leading to better disentanglement scores but potentially sacrificing reconstruction quality.38 VAEs provide a principled probabilistic framework but often suffer from generating blurry images compared to GANs.38 Recent work explores discrete latent variables in VAEs as potentially providing a stronger inductive bias for disentanglement.33 A minimal sketch of the β-VAE objective is given after this list.
- Generative Adversarial Networks (GANs): GANs consist of a generator G that creates synthetic data from latent noise z, and a discriminator D that tries to distinguish real data from generated data.34 Disentanglement in GANs is often pursued by structuring the latent space or modifying the objective. InfoGAN, for example, maximizes the mutual information between a subset of the latent variables and the generated output, encouraging these latents to capture salient factors of variation.3 While GANs excel at producing high-fidelity, realistic samples 38, achieving stable training and controlled disentanglement can be more challenging than with VAEs.
- Hybrid VAE/GAN Models: Several approaches combine VAEs and GANs to leverage their respective strengths.38 For instance, one might use the VAE’s encoder to learn a structured latent space and the GAN’s generator/discriminator to ensure high-quality synthesis.40 The ID-GAN framework proposes distilling the representation learned by a VAE (focused on disentanglement) into a separate GAN (focused on fidelity) by transferring the inference model, potentially allowing a layered approach to learning major disentangled factors followed by finer, entangled details.38
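As referenced above, the β-VAE objective itself is compact: a reconstruction term plus a β-weighted KL divergence between the Gaussian posterior and the factorized prior. The sketch below (PyTorch, assuming a Gaussian encoder that outputs mu and logvar, and using a squared-error reconstruction term as a simplification) illustrates it together with the reparameterization trick.

```python
import torch

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction term plus beta-weighted KL divergence
    between the posterior q(z|x) = N(mu, diag(exp(logvar))) and the prior N(0, I)."""
    # Reconstruction term (squared error here; a Bernoulli likelihood is common
    # for binary image data).
    recon = ((x_recon - x) ** 2).sum(dim=-1).mean()
    # Closed-form KL divergence between diagonal Gaussians and N(0, I).
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=-1).mean()
    return recon + beta * kl

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps
```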
Cross-Domain Disentanglement
Applying DRL across different domains introduces further challenges, as the goal becomes disentangling factors consistently even as other aspects of the data distribution change. Often, supervision (e.g., knowledge of factors) might only be available in a source domain.3
- Methods:
- Joint Training/Adaptation: Models like the Cross-Domain Representation Disentangler (CDRD) are trained on data from multiple domains, using partial supervision (e.g., labels only in the source) to learn a shared latent space where factors are disentangled and domain adaptation occurs simultaneously.3
- Life-long Learning Approaches: Methods like VASE (Variational Autoencoding Sequential Estimation) are designed for scenarios where data arrives sequentially from different domains.41 VASE uses the Minimum Description Length principle to automatically detect domain shifts, allocate new latent capacity for novel domain-specific factors, reuse latents for shared factors, and employ generative feedback (“dreaming”) to prevent catastrophic forgetting of old knowledge.41 The disentangled nature of its representation is key to its ability to find semantic homologies across domains.41
- Contrastive Disentanglement: Approaches like ABCD (Augmentation Based Contrastive Disentanglement) leverage contrastive learning principles.43 By carefully selecting data augmentations known to affect specific factors (content-invariant vs. class-related), they learn disentangled representations without requiring full generative modeling or adversarial training, focusing specifically on class-content disentanglement.43
- Explicit Feature Decomposition: Some methods propose explicitly modeling features as combinations of class-generic, class-specific, domain-generic, and domain-specific components, potentially using techniques like feature augmentation (e.g., XDomainMix) to encourage learning of invariant representations.44
Challenges and Theoretical Limits
Despite its promise, DRL faces fundamental challenges. A key theoretical result is that purely unsupervised disentanglement (learning separated factors without any labels, known transformations, or structural assumptions) is fundamentally ill-posed or impossible.33 Some form of inductive bias—whether architectural constraints, assumptions about the factors, or weak supervision—is necessary to guide the learning process towards a meaningful solution. This inherent difficulty suggests that achieving objective disentanglement across diverse scientific domains will likely require leveraging available domain knowledge, such as known physical symmetries, biological pathways, hierarchical structures, or the effects of specific interventions or transformations.25
Furthermore, the often-observed trade-off between the degree of disentanglement achieved and the quality of data reconstruction or generation remains a practical concern.38 Enforcing strong separation of factors might require the model to discard subtle information, potentially limiting the utility of the resulting “objective” components if high-fidelity prediction or reconstruction is the end goal. This tension highlights that a perfectly disentangled representation may not be perfectly informative about all aspects of the original data, requiring careful consideration of the desired balance for a given application. Finally, defining and reliably measuring disentanglement itself remains an active area of research, particularly when factors are correlated or non-independent.49
V. Unveiling Intrinsic Structure: Advanced Dimensionality Reduction and Manifold Learning
While linear methods like PCA and its generalizations capture shared variance in linear subspaces, they often fail to represent the complex, non-linear structures prevalent in scientific data.23 Many high-dimensional datasets encountered in practice are thought to conform to the Manifold Hypothesis: the data points, despite residing in a high-dimensional ambient space, actually lie on or near a lower-dimensional, intrinsically non-linear manifold.4 Manifold learning techniques aim specifically to uncover this hidden low-dimensional geometric structure, offering a powerful lens for discovering intrinsic, potentially objective, features.
Key Techniques
Several manifold learning algorithms have been developed, each making different assumptions about the underlying manifold and preserving different geometric properties:
- Isomap (Isometric Mapping): Aims to preserve the geodesic distances between points on the manifold. It approximates these distances using shortest path lengths on a neighborhood graph constructed from the data points and then uses Multidimensional Scaling (MDS) to find a low-dimensional embedding that respects these distances.24
- Locally Linear Embedding (LLE): Assumes the manifold is locally linear. Each data point is reconstructed as a weighted linear combination of its neighbors. LLE seeks a low-dimensional embedding where these same reconstruction weights are preserved, thus maintaining local linear relationships.
- t-Distributed Stochastic Neighbor Embedding (t-SNE): Primarily designed for visualization, t-SNE focuses on preserving the local neighborhood structure of data points.55 It models similarities between high-dimensional points using Gaussian distributions and similarities between low-dimensional points using heavier-tailed t-distributions, minimizing the divergence between these two distributions. While excellent at revealing local clusters, t-SNE often distorts global distances and structure.4
- UMAP (Uniform Manifold Approximation and Projection): A more recent and highly popular technique grounded in Riemannian geometry and algebraic topology.4 UMAP first constructs a high-dimensional graph representing the data, with edge weights indicating the likelihood of points being connected. It then optimizes a low-dimensional embedding to have a similar topological structure (specifically, a similar fuzzy simplicial complex).56 This optimization is typically done by minimizing the cross-entropy between the high-dimensional and low-dimensional similarities, effectively using attractive forces between neighbors and repulsive forces between non-neighbors.4 UMAP often produces high-quality visualizations comparable to t-SNE but tends to preserve more of the global data structure.4 It is also generally faster and more scalable than t-SNE and can be used for general-purpose dimensionality reduction beyond visualization, as it places no computational restrictions on the output dimension.4 It has found use in various applications, including analyzing MRI data in information geometry frameworks and visualizing high-dimensional biological data.19
- Diffusion Maps: This technique leverages the concept of a diffusion process, or random walk, on a graph constructed from the data to reveal its underlying geometry.5 A Markov transition matrix P is built, where P(i, j) represents the probability of transitioning from data point i to point j in one step, typically based on local affinities computed using a kernel function (e.g., Gaussian).5 The eigen-decomposition of this matrix yields eigenvalues λ_k and eigenvectors ψ_k. The eigenvectors corresponding to the largest eigenvalues (close to 1) represent the principal modes of variation or the slowest modes of diffusion on the data manifold.5 A low-dimensional embedding is created using the first few non-trivial eigenvectors as coordinates: Ψ_t(i) = (λ_1^t ψ_1(i), λ_2^t ψ_2(i),…, λ_l^t ψ_l(i)), where t is a diffusion time parameter.5 The Euclidean distance between points in this diffusion space approximates the diffusion distance in the original data, which measures connectivity based on random walks.5 Points that are well-connected by many paths will be close in diffusion distance, even if far apart in Euclidean distance, thus revealing the manifold’s intrinsic connectivity.5 The time parameter t allows analysis at different scales; larger t corresponds to longer random walks and reveals coarser structures.5 Diffusion Maps are considered robust to noise and can be related to the graph Laplacian and spectral clustering.24 They can also be viewed within the framework of Graph Signal Processing, where the diffusion operator acts as a Graph Shift Operator (GSO).5
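The diffusion-map construction described above translates almost directly into code. The sketch below is a minimal NumPy version with a fixed Gaussian kernel bandwidth epsilon chosen purely for illustration; practical implementations add density normalization and sparse nearest-neighbor kernels.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=1.0, t=1):
    """Basic diffusion map embedding of the rows of X."""
    # Pairwise squared distances and Gaussian kernel affinities.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / epsilon)
    # Row-normalize to obtain the Markov transition matrix P.
    P = K / K.sum(axis=1, keepdims=True)
    # Eigen-decomposition; eigenvalues of P are real, sort largest first.
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
    # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    coords = eigvecs[:, 1:n_components + 1] * (eigvals[1:n_components + 1] ** t)
    return coords

# Example: noisy circle embedded in 10 dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=300)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.hstack([circle, 0.05 * rng.normal(size=(300, 8))])
emb = diffusion_map(X, n_components=2, epsilon=0.5, t=2)
print(emb.shape)  # (300, 2)
```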
Cross-Domain Manifold Learning
Applying standard manifold learning techniques independently to datasets from different domains may result in incompatible low-dimensional embeddings that do not reveal shared structures. To address this, cross-domain manifold learning methods explicitly incorporate objectives to align structures across domains or ensure that learned transformations preserve manifold properties during transfer.15 For example, the Cross-Domain Manifold Structure Preservation (CDMSP) method aims to map high-dimensional features to a low-dimensional manifold while preserving non-linear relationships, aligning distributions, and maintaining cross-domain manifold consistency, potentially using iterative refinement with confidently labeled target samples.23 Multi-source manifold feature transfer (MMFT) frameworks may incorporate domain selection strategies (like DTE or ROD) to identify and utilize only beneficial source domains, thereby mitigating negative transfer when aligning manifold structures.15
Manifold learning techniques, by focusing on intrinsic geometry rather than ambient coordinates or linear projections, offer a compelling approach to discovering objective structures, assuming these structures manifest as low-dimensional manifolds that are conserved across different scientific observations.4 The choice among techniques like Isomap, LLE, UMAP, or Diffusion Maps implies different assumptions about the nature of this intrinsic geometry (e.g., preservation of geodesic distances, local linearity, topology, or diffusion connectivity).4 Diffusion Maps, with their inherent connection to random walks and the tunable scale parameter t, appear particularly well-suited for exploring complex, multi-scale geometric organization within data.5 However, achieving consistent and comparable structures across domains necessitates methods that explicitly perform cross-domain alignment or learn structure-preserving transformations.15
VI. Geometric Perspectives: Information Geometry and Topological Data Analysis
Beyond manifold learning operating directly on data points, two other powerful geometric frameworks offer distinct perspectives for analyzing data and potentially discovering objective structures: Information Geometry (IG) and Topological Data Analysis (TDA).
Information Geometry (IG)
Information Geometry applies the tools of differential geometry to the space of probability distributions, known as a statistical manifold.19 In this framework, each point on the manifold represents an entire probability distribution (e.g., a Gaussian defined by its mean and covariance). This allows geometric concepts like distance, curvature, and geodesics to be used for analyzing and comparing statistical models and the data they represent.19
A key element in IG is the Fisher Information Metric (FIM). Derived from the Fisher Information Matrix, which quantifies the amount of information data carries about the parameters of a distribution, the FIM defines a Riemannian metric on the statistical manifold.19 This metric provides a natural way to measure the “distance” between nearby probability distributions, reflecting their statistical distinguishability, and endows the manifold with a specific geometric structure.19
Using the FIM, one can compute the geodesic distance between two distributions—the length of the shortest path connecting them on the manifold.19 This geodesic distance serves as an intrinsic measure of dissimilarity between statistical models. For example, the geodesic distance between two multivariate Gaussian distributions under the FIM accounts for differences in both their mean vectors and their covariance matrices, providing a comprehensive measure of how distinct they are.19
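For the univariate Gaussian family the FIM is diag(1/σ², 2/σ²), and the resulting geodesic (Fisher-Rao) distance has a closed form via an isometry to a scaled hyperbolic half-plane. The sketch below computes it; this is a minimal univariate illustration only, since the general multivariate case usually requires numerical geodesic computation.

```python
import numpy as np

def fisher_rao_gaussian_1d(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao (geodesic) distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses the isometry of the univariate Gaussian statistical manifold, equipped
    with the Fisher metric, to a sqrt(2)-scaled hyperbolic upper half-plane.
    """
    num = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * sigma1 * sigma2))

# Distributions that differ only in spread are still "far apart" statistically.
print(fisher_rao_gaussian_1d(0.0, 1.0, 0.0, 3.0))   # scale change only
print(fisher_rao_gaussian_1d(0.0, 1.0, 2.0, 1.0))   # mean shift only
```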
IG finds applications in analyzing complex data relationships by studying the geometry of the underlying distributions.19 It can be used to address model uncertainty, analyze mixture models, and has been applied in fields like quantum systems and neural networks.58 In a study on Alzheimer’s disease, IG was used to represent MRI features for different impairment stages as Gaussian distributions; the geodesic distances between these distributions quantified the statistical divergence between stages, revealing significant differences in covariance structures and feature correlations.19 It is important to note that statistical manifolds encountered in practice are not always smooth Riemannian manifolds and may possess boundaries, singularities, or changes in dimension, adding complexity to the analysis.58 By focusing on the geometry of probability distributions themselves, IG provides a unique lens for comparing datasets or conditions based on their overall statistical properties, distinct from methods analyzing individual data points.
Topological Data Analysis (TDA)
TDA employs concepts from algebraic topology to analyze the “shape” of data, focusing on properties that are invariant under continuous deformations like stretching or bending, but not tearing or gluing.59 This focus on topology makes TDA inherently robust to noise and the specific choice of metric used to measure distances between data points.59
The central tool in TDA is Persistent Homology (PH).60 PH works by first building a sequence of nested topological spaces, usually simplicial complexes (generalizations of graphs that include triangles, tetrahedra, etc.), on top of the data points at varying scales.60 This sequence is called a filtration, often constructed using methods like the Vietoris-Rips complex, where simplices are added as a proximity threshold r increases.60 For each complex in the filtration, standard homology theory is used to count topological features of different dimensions:
- 0-dimensional features (H₀): Connected components.
- 1-dimensional features (H₁): Loops or holes.
- 2-dimensional features (H₂): Voids or cavities.
- Higher-dimensional features.
PH tracks how these features appear (“birth”) and disappear (“death”) as the scale parameter r increases through the filtration.59 A feature “dies” when it gets filled in by higher-dimensional simplices. The persistence of a feature is the difference between its death scale and birth scale (d – b). Features that persist over a long range of scales are considered significant topological characteristics of the data, while short-lived features are often attributed to noise or sampling artifacts.59 The results of PH are typically summarized visually using persistence diagrams (scatter plots of (birth, death) points) or persistence barcodes (collections of bars representing the lifespan of each feature).60 The Betti numbers (ranks of the homology groups) quantify the number of features of each dimension at a given scale.63
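As a minimal illustration (assuming the `ripser` Python package for Vietoris-Rips persistent homology), the sketch below samples a noisy circle and extracts its persistence diagrams; a single long-lived H₁ feature, corresponding to the circle's loop, is expected to dominate.

```python
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)

# Noisy sample from a circle: one prominent 1-dimensional hole is expected.
theta = rng.uniform(0, 2 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.normal(size=(200, 2))

# Vietoris-Rips persistent homology up to dimension 1.
diagrams = ripser(X, maxdim=1)["dgms"]

h1 = diagrams[1]                       # (birth, death) pairs for loops
persistence = h1[:, 1] - h1[:, 0]
print("most persistent H1 feature (birth, death):", h1[np.argmax(persistence)])
```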
TDA and PH have found applications in diverse scientific areas, including analyzing the structure of biological systems like protein folding dynamics 66 and RNA structures 66, understanding neural activity patterns and brain networks 53, detecting patterns in financial markets 59, and characterizing materials.66
Despite its strengths, TDA faces challenges. Standard PH computations can be sensitive to noise, particularly when data lies on a low-dimensional manifold embedded in a high-dimensional ambient space, as noise in many ambient dimensions can obscure the true topology.54 Combining PH with other techniques, such as using spectral distances (like diffusion distance or effective resistance) computed on a k-nearest-neighbor graph of the data before building the filtration, has been shown to improve robustness in high ambient dimensions.54 Interpreting the practical meaning of persistence diagrams or specific topological features can also require domain expertise.66 Furthermore, standard 1-parameter persistence might not capture all topological features simultaneously if they exist at very different scales 67; alternative theories like “consistent homology” have been proposed to address this by constructing a single graph intended to capture all features.71 Other TDA tools like the Mapper algorithm 63, Reeb graphs, and Morse-Smale complexes offer complementary ways to explore data topology.12
Information Geometry and Topological Data Analysis thus provide powerful, mathematically rigorous frameworks grounded in geometry for uncovering intrinsic data properties. IG examines the structure of the space of statistical models fitting the data, using metrics like FIM to quantify differences based on statistical distinguishability.19 TDA, particularly through persistent homology, probes the multi-scale shape and connectivity of the data points themselves, identifying topological invariants robust to noise and deformation.59 This robustness and focus on fundamental shape properties make TDA, especially PH, a strong candidate for identifying features that could be considered “objective” across different measurements or representations of an underlying phenomenon. However, practical application, especially to high-dimensional scientific data, may necessitate hybrid approaches that leverage manifold structures or spectral methods to mitigate noise sensitivity before applying topological tools.54
VII. Harnessing Negative Curvature: Hyperbolic Geometry in Representation Learning
Recent years have witnessed growing interest in utilizing hyperbolic geometry for machine learning, particularly for representation learning.25 This interest stems from the unique properties of hyperbolic spaces, which possess constant negative curvature.
Why Hyperbolic Geometry?
Unlike Euclidean spaces (zero curvature) or spherical spaces (positive curvature), hyperbolic spaces exhibit distinct geometric characteristics that make them particularly suitable for certain types of data:
- Exponential Volume Growth: The volume of a ball in hyperbolic space grows exponentially with its radius, in contrast to the polynomial growth in Euclidean space.25 This property allows hyperbolic spaces to accommodate large amounts of data or complex structures efficiently.
- Tree-likeness: Hyperbolic geometry is often described as the “continuous analogue” of discrete tree structures.25 Due to their negative curvature, hyperbolic spaces can embed tree-like or hierarchical data with arbitrarily low distortion, often requiring significantly lower embedding dimensions compared to Euclidean space.25 This makes them naturally suited for representing data with inherent hierarchical organization, such as taxonomies, phylogenetic trees, organizational charts, or scale-free networks, which are common in various scientific domains.
Representing Hierarchical Data
The tree-like nature of hyperbolic space allows for intuitive representation of hierarchies. In commonly used models like the Poincaré disk, the origin naturally serves as the root of the hierarchy.26 Nodes are embedded such that their distance from the origin reflects their depth in the hierarchy (roots closer to the origin, leaves closer to the boundary), and the hyperbolic distance between nodes reflects their distance in the original tree structure.26 This geometric prior has been successfully exploited in applications like natural language processing (capturing word hierarchies and entailment) 26, knowledge graph embedding 26, computer vision tasks involving object-scene relationships or hierarchical classification 73, social network analysis 79, and biological data analysis. If objective structures across scientific domains possess an inherent hierarchical organization, hyperbolic geometry provides a natural geometric language for modeling and representing them.
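The geometry behind these embeddings is captured by the closed-form Poincaré-ball distance. The minimal sketch below computes it and illustrates how points near the boundary (leaf-like positions) are far from the origin (root-like position), and from each other, even when their Euclidean coordinates are close.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v inside the unit Poincare ball."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2)) + eps
    return np.arccosh(1 + 2 * sq_diff / denom)

root = np.array([0.0, 0.0])          # root of the hierarchy at the origin
leaf_a = np.array([0.95, 0.0])       # two leaves near the boundary
leaf_b = np.array([0.95, 0.02])

print(poincare_distance(root, leaf_a))    # large: root-to-leaf distance
print(poincare_distance(leaf_a, leaf_b))  # much larger than the 0.02 Euclidean gap
```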
Hyperbolic Embeddings and Neural Networks
Integrating hyperbolic geometry into deep learning requires adapting standard neural network operations and architectures. Since hyperbolic space is not a vector space, basic Euclidean operations like vector addition and matrix multiplication do not directly apply or preserve the geometry. This has spurred the development of principled generalizations of neural network components for hyperbolic spaces, often within specific models like the Poincaré ball 26 or the Lorentz model.79
Research has led to hyperbolic versions of multinomial logistic regression, feed-forward layers, recurrent neural networks (including GRUs), attention mechanisms, convolutional layers, transformers, and residual networks.75 Libraries such as HypLL aim to facilitate the development and adoption of these hyperbolic deep learning modules.80
Explicitly Embedding Structure
A crucial point emerging from recent research is that simply performing optimization within a hyperbolic space does not automatically guarantee that the desired hierarchical structure will be accurately captured.26 Many early methods implicitly assumed that the optimization process, guided by task-specific losses (like link prediction or reconstruction), would naturally infer and preserve the hierarchy. However, empirical analyses using position-tracking have shown that the resulting embeddings can be suboptimal and may not faithfully reflect the intended hierarchy.26
This observation motivates methods that explicitly incorporate hierarchical information into the learning process:
- HypStructure: This approach assumes a known label hierarchy (e.g., a biological taxonomy, a conceptual tree) is available.25 It introduces a regularization term, added to a standard task loss (like cross-entropy for classification), that penalizes deviations between distances in the hyperbolic embedding space and distances in the ground-truth hierarchy tree. This tree-based loss often utilizes metrics like the Cophenetic Correlation Coefficient (CPCC) computed using hyperbolic distances, along with a centering loss to position the hierarchy appropriately.25 By directly enforcing consistency with the known hierarchy, HypStructure leads to embeddings with lower distortion, improved generalization performance (particularly in low embedding dimensions), enhanced out-of-distribution (OOD) detection capabilities, and more interpretable, visually tree-like representations.25
- HIE (Hyperbolic Informed Embedding): This method addresses scenarios where an explicit hierarchy is not given.26 It leverages the geometric properties of hyperbolic embeddings themselves—specifically, the hyperbolic distance to the origin (HDO)—to infer an implicit hierarchy (identifying potential roots and relative levels). This inferred structural information is then used, without additional parameters, to guide and improve the representation learning process of existing hyperbolic models.26
Additionally, hyperbolic geometry is being integrated into other frameworks like contrastive learning. Hyperbolic contrastive learning can model hierarchical relationships, for example, by encouraging representations of scenes (e.g., an image of a park) to be close to representations of their constituent objects (e.g., trees, benches) in hyperbolic space, reflecting their compositional relationship.78
The capacity of hyperbolic spaces to embed complex hierarchies efficiently, often in very low dimensions, suggests a potential pathway towards discovering highly compressed yet structurally informative objective representations.25 However, realizing this potential seems to require methods that move beyond implicit assumptions and explicitly incorporate or infer the relevant hierarchical structure during learning.26
VIII. Interdisciplinary Connections: Geometry, Topology, Physics, and Data
The search for objective components benefits from drawing connections between data analysis techniques and fundamental concepts in geometry, topology, and physics. These fields provide rigorous frameworks for describing intrinsic properties and inferring hidden structures, offering valuable perspectives and analogies.
Differential Geometry in Machine Learning
The recognition that much real-world data, particularly in vision, pattern recognition, and biology, does not conform to Euclidean assumptions has spurred the integration of differential geometry into machine learning.52 The core premise is that data often resides on or near non-Euclidean manifolds embedded within high-dimensional observation spaces.82 Exploiting the intrinsic geometry of these manifolds can lead to more accurate data representations, better algorithms, and improved performance.82
Techniques leveraging differential geometry include Riemannian methods adapted for computer vision, statistical analysis directly performed on manifolds, the use of manifold-valued features, and geometric deep learning architectures that respect underlying symmetries or structures.82 Applications span statistical shape analysis, robotics, health analytics, and computational biology.82 For instance, a multiscale differential geometry (MDG) strategy has been proposed for single-cell RNA sequencing analysis, constructing cell-cell interaction manifolds and using curvature-based features to capture complex relationships and classify cell types.52 More speculative approaches like Synthetic Differential Geometry (SDG), which utilizes infinitesimals and categorical logic, propose reimagining the mathematical foundations of ML, potentially offering new ways to model data manifolds, design optimization algorithms, and structure neural networks, although practical validation remains limited.51 The explicit use of differential geometry provides a mathematically rigorous language for describing local intrinsic properties (like curvature and metric tensors) that are independent of specific coordinate systems, aligning well with the search for objective descriptions.
Topology (including 3-Manifolds) in Data Representation
Topology, the study of properties preserved under continuous deformation, offers tools to characterize the global “shape” of data.59 TDA, primarily through persistent homology (PH), extracts topological invariants like Betti numbers (counting connected components, loops, voids, etc.) that are robust to noise and metric choices.60
Data in scientific visualization and other fields are often conceptualized as being sampled from or defined on manifolds, frequently of low dimension (1D, 2D, or 3D).12 TDA tools like PH can, in principle, compute topological invariants for data sampled from such spaces, including 3-manifolds.12 Simplicial complexes serve as discrete approximations of these underlying spaces.12 While computing lower-dimensional homology (H₀, H₁) is relatively standard, robustly computing higher-dimensional homology (H₂, H₃, etc.) from noisy point cloud data presents significant challenges, especially concerning noise sensitivity in high ambient dimensions.54 Connections to areas like knot theory have also been explored within TDA frameworks.83
The relevance of topology to discovering objective components lies in its focus on fundamental, invariant properties of shape. If data generated across different scientific domains or perspectives consistently exhibits the same topological signature (e.g., the same Betti numbers, indicating derivation from homeomorphic underlying spaces), this shared topological structure could represent a deeply objective feature of the phenomenon under study.
Insights from Physics: Gravitational Lensing
Gravitational lensing, a phenomenon predicted by Einstein’s general theory of relativity, provides a compelling physical analogy for the problem of inferring objective structure from diverse observations.14 Massive objects like galaxies or galaxy clusters warp the fabric of spacetime, causing light rays from more distant background sources to bend as they pass nearby.14 This lensing effect can produce multiple distorted images, elongated arcs, or even complete Einstein rings of the background source.14
The statistical analysis of these lensing effects—such as the frequency of multiple imaging events, the distribution of image separations, and image multiplicity—is a powerful tool in cosmology.13 It allows astronomers to constrain cosmological parameters (like the density of matter and dark energy) and, crucially, to map the distribution of mass, including invisible dark matter, within the lensing objects.13
Several aspects of gravitational lensing analysis resonate with the search for objective components in data:
- Inferring Latent Structure: Lensing allows inference about the properties of an unobservable structure (the mass distribution of the lens) by observing its effects on probes (the light from background sources) viewed from different perspectives (different source-lens alignments).14 This mirrors the goal of inferring latent objective components by analyzing their consistent influence on observable data across different domains or measurement contexts.
- Importance of Correct Geometry: Accurate interpretation of lensing statistics critically depends on using the correct geometric description of spacetime and light propagation.13 In a realistic, clumpy universe, the Dyer-Roeder distance, which accounts for the focusing effect of matter along the line of sight (parameterized by smoothness parameter α), must be used for lensing calculations, rather than the simpler angular diameter distance derived from a perfectly homogeneous Friedmann-Lemaître model.13 Using the wrong geometric assumptions leads to incorrect cosmological inferences. This underscores the profound impact that the choice of mathematical framework (Euclidean, hyperbolic, Riemannian, topological) has on data analysis; applying an inappropriate geometry can distort the inferred structure and mask true objective components.
- Statistical Signatures: The specific patterns observed in lensed images (number, brightness, positions) are statistical signatures determined by the lens properties and cosmological geometry.13 Analogously, objective components underlying scientific data might reveal themselves through consistent statistical patterns, transformations, or relationships observed across diverse datasets.
The lensing analogy suggests that objective scientific structures might be identified not by directly observing the structure itself, but by detecting its consistent, predictable influence on measurements made through different “lenses” or from different perspectives. This motivates searching for invariant relationships, common transformation rules, or shared statistical signatures within cross-domain data. Furthermore, the critical role of geometry in cosmology highlights that the choice of geometric tools in data analysis is not merely a technical detail but a fundamental modeling decision that shapes our ability to perceive and interpret objective reality.
IX. Evaluating Objectivity, Invariance, and Generalizability
A significant challenge in the quest for objective components is the evaluation process itself. Since “objectivity” lacks a single, universally agreed-upon mathematical definition in this context, its assessment relies on evaluating proxy properties such as domain invariance, disentanglement, generalizability, robustness, or consistency with causal principles. Consequently, a diverse array of metrics has been developed, each targeting specific facets of these proxies.
Challenges in Defining and Measuring “Objectivity”
The ambiguity of “objectivity” necessitates careful consideration of what specific property is being sought. Is it invariance to measurement modality? Robustness across different experimental conditions? Consistency with underlying physical laws? Independence from specific modeling choices? The choice of evaluation metrics must align with the particular operational definition of objectivity relevant to the scientific question at hand. This often requires moving beyond standard machine learning metrics towards assessments tailored to the specific notion of invariance or structure being investigated (e.g., geometric, topological, causal).
Metrics for Domain Invariance / Generalization
When objectivity is framed as robustness or generalizability across domains, several evaluation strategies are employed:
- Target Domain Performance: The most common approach is to train a model on data from one or more source domains and evaluate its performance (e.g., classification accuracy, prediction error, ranking metrics like NDCG) on unseen target domains.16 Performance is often compared against baselines like Empirical Risk Minimization (ERM) trained only on source data.27
- IRM-based Evaluation: For methods aiming at Invariant Risk Minimization, evaluation can involve checking if the learned representation Φ(X) indeed satisfies the IRM condition—that the optimal predictor E[Y|Φ(X)] is invariant across environments.31 The NICE framework uses this principle to evaluate representations for causal adjustment.31
- CRIC (Cross-Risk Invariance Criterion): This recently proposed criterion aims to directly assess the invariant performance of a learned representation Φ(X), independent of a specific downstream predictor.32 It is based on the theoretical property that for an ideal invariant representation, the expected optimal predictor in one environment should equal the expectation in another environment when weighted by the density ratio (likelihood ratio) between the environments.32 An empirical estimator for CRIC allows practical computation from available multi-environment data.32
- Domain Divergence Metrics: These metrics quantify the statistical distance between the distributions of the learned representations z = g(x) from different domains. Examples include MMD, CORAL distance, or the loss of an adversarial domain discriminator.27 Lower divergence scores suggest better alignment and potentially better invariance, although as noted earlier, low divergence alone does not guarantee good target performance.6 These metrics are often used directly as regularization terms during training in DA methods.27
- Cross-Domain Evaluation Frameworks: Specific experimental setups are designed to rigorously test generalization across domains, considering factors like domain similarity, the nature of dataset shifts, performance on out-of-distribution (OOD) data, and the impact of domain-specific versus shared information.27
Metrics for Disentanglement
Evaluating the success of DRL methods in separating underlying factors requires specialized metrics, typically assuming access to the ground-truth factors of variation (supervised evaluation):
- Taxonomy: Metrics are often categorized into three families 35:
- Intervention-based: Measure how latent codes change when one ground-truth factor is varied while others are held constant (e.g., BetaVAE metric, IRS). Often require specific data generation or large sample sizes.9
- Predictor-based: Train a simple predictor (e.g., linear model, Lasso, decision tree) to predict ground-truth factors from the learned latent codes. The predictor’s properties (e.g., sparsity of weights, accuracy) indicate disentanglement (e.g., SAP score, DCI score).33
- Information-based: Estimate the mutual information (MI) between individual latent codes and individual ground-truth factors (e.g., MIG, FactorVAE metric).37 A minimal MIG sketch follows this list.
- Properties Measured: These metrics aim to quantify desirable properties of disentanglement 35:
- Modularity (Disentanglement): Each latent code should depend on at most one factor.
- Compactness (Completeness): Each factor should influence only a small, ideally single, subset of latent codes.
- Explicitness (Informativeness): The representation should retain the information about the ground-truth factors.
- Specific Metrics: Examples include SAP, DCI (Disentanglement, Completeness, Informativeness) 33, MIG (Mutual Information Gap) 37, FactorVAE metric 49, IRS (Interventional Robustness Score) 91, D_LSBD (for Linear Symmetry-Based Disentanglement, based on group equivariance) 46, and EDI (Exclusivity Disentanglement Index, using MI estimation via MINE and an “exclusivity” concept).35 Metrics have also been proposed specifically for scenarios with non-independent factors, based on concepts of minimality and sufficiency.49
- Challenges: Disentanglement metrics face numerous challenges: sensitivity to hyperparameters (e.g., number of bins for MI estimation, predictor model choice), sensitivity to noise and non-linear factor-code relationships, the inherent difficulty of accurately estimating MI, a lack of consensus on which metric is best, and sometimes poor correlation between different metrics even on the same data.35 Newer metrics like EDI aim to improve stability and calibration.37
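As an illustration of the information-based family, the following is a minimal sketch of the Mutual Information Gap (MIG), assuming discrete ground-truth factors and continuous latent codes that are discretized into histogram bins. The function name, the number of bins, and the use of scikit-learn's mutual_info_score are implementation assumptions, and, as noted above, the resulting score is sensitive to exactly these choices.

```python
# Hypothetical helper: Mutual Information Gap (MIG) with histogram discretization.
import numpy as np
from sklearn.metrics import mutual_info_score

def mig(latents, factors, n_bins=20):
    """latents: (n_samples, n_latents) floats; factors: (n_samples, n_factors) ints."""
    n_latents, n_factors = latents.shape[1], factors.shape[1]
    mi = np.zeros((n_latents, n_factors))
    for j in range(n_latents):
        # Discretize each latent dimension so MI can be estimated from counts.
        edges = np.histogram_bin_edges(latents[:, j], bins=n_bins)
        binned = np.digitize(latents[:, j], edges[1:-1])
        for k in range(n_factors):
            mi[j, k] = mutual_info_score(binned, factors[:, k])
    gaps = []
    for k in range(n_factors):
        top = np.sort(mi[:, k])[::-1]
        entropy = mutual_info_score(factors[:, k], factors[:, k])  # H(v_k) in nats
        gaps.append((top[0] - top[1]) / max(entropy, 1e-12))
    return float(np.mean(gaps))
```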
Metrics for Representation Quality (Related)
Other areas of representation learning also offer relevant evaluation perspectives:
- Metric Learning: Evaluates whether the learned representation space has a meaningful distance structure. Triplet loss accuracy (checking whether the anchor-positive distance is smaller than the anchor-negative distance) assesses relative distances, while absolute distance accuracy checks whether distances fall below or above a threshold for similar or dissimilar pairs.93 A short sketch of the triplet-accuracy computation follows this list.
- Bisimulation Metrics: Used in reinforcement learning, these metrics quantify the behavioral similarity between states based on rewards and transitions. Learning representations where latent distance matches bisimulation distance encourages task-relevant abstraction and invariance to irrelevant state features.94
- Cross-Domain Feature Similarity (CDFS): An approach used in blind image quality assessment found that the similarity between feature representations extracted by networks trained for different tasks (e.g., object recognition vs. quality prediction) on the same image correlated well with human quality judgments.96 This suggests a potential evaluation strategy: comparing representations derived from different analytical perspectives on the same underlying data.
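For concreteness, triplet-loss accuracy reduces to the fraction of (anchor, positive, negative) triplets in which the anchor lies closer to the positive than to the negative. The sketch below assumes pre-computed embedding arrays and Euclidean distance, both of which are illustrative choices.

```python
# Hypothetical helper: triplet accuracy over pre-computed embeddings.
import numpy as np

def triplet_accuracy(anchors, positives, negatives):
    """Each argument is an (n_triplets, d) array of embeddings."""
    d_ap = np.linalg.norm(anchors - positives, axis=1)  # anchor-positive distances
    d_an = np.linalg.norm(anchors - negatives, axis=1)  # anchor-negative distances
    return float(np.mean(d_ap < d_an))
```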
Evaluating “objectivity” thus requires a nuanced approach. No single metric suffices; a combination targeting specific properties like invariance, disentanglement, generalizability, or geometric consistency, chosen based on the scientific context and the hypothesized nature of the objective components, is necessary. While standard ML metrics provide a baseline, many suffer from limitations like sensitivity to hyperparameters or assumptions about linearity or factor independence.35 Newer metrics attempt to address these issues 32, but robust evaluation, especially for complex scientific data, remains challenging. Critically, assessing objective components across scientific domains might demand novel metrics that explicitly check for consistency with known physical laws, conservation principles, symmetries, or geometric/topological constraints relevant to the specific scientific context, moving beyond purely data-driven performance evaluation.52
X. Synthesis: Comparing Mathematical and Machine Learning Approaches
The diverse methodologies explored—ranging from linear algebraic techniques to deep learning and geometric analysis—offer distinct strengths and weaknesses in the pursuit of objective principal components across scientific domains. A comparative analysis based on key criteria can illuminate their relative suitability.
Comparative Framework and Criteria
We compare the main categories of methods based on:
- Theoretical Foundations & Assumptions: The underlying mathematical principles (linear algebra, probability, information theory, geometry, topology) and critical assumptions made (linearity, distributional properties, factor independence, manifold smoothness, hierarchy, etc.).
- Type of Invariance Captured: What aspect of the data or its generative process is assumed or enforced to be invariant across domains (e.g., linear subspace, feature distribution, causal mechanism, geometric/topological structure, hierarchy).
- Handling of Heterogeneity: How explicitly the method addresses differences between domains (e.g., modeling joint/individual variation, robustness to distribution shift, handling multi-modal data).
- Interpretability: The extent to which the discovered components or representations lend themselves to meaningful scientific interpretation (e.g., identifiable factors, geometric properties, physical meaning).
- Computational Requirements & Scalability: Data size limitations, computational complexity, and feasibility for large-scale scientific datasets.
- Need for Supervision/Prior Knowledge: The level of supervision required (unsupervised, weakly supervised, fully supervised) and the ability to incorporate domain-specific prior knowledge (e.g., known transformations, hierarchies, physical laws).
Comparative Analysis
- PCA Generalizations (Common PCA, JIVE):
- Foundations: Linear algebra, variance decomposition. Assume shared linear structure. JIVE assumes orthogonality of joint/individual spaces.10
- Invariance: Shared linear subspace capturing common variance.9
- Heterogeneity: JIVE explicitly models joint vs. individual variation.10 Common PCA seeks consensus.9 Primarily handle feature heterogeneity on common subjects.
- Interpretability: Components are linear combinations of features; JIVE separation is interpretable but subspace meaning can be complex.10
- Scalability: Manageable for moderate dimensions; large-scale versions exist (e.g., for scRNA-seq JIVE 22). Based on SVD/eigen-decomposition.
- Supervision: Unsupervised.
- Objectivity: Strong for identifying shared linear processes. Limited by linearity for complex phenomena.
- Domain Adaptation / Invariant Representation Learning:
- Foundations: Statistics, optimization, adversarial learning. Assume relatedness between domains, often target task invariance.2 IRM assumes invariant causal mechanism.31
- Invariance: Distributional invariance of features (P(z)) 6, or invariance of optimal predictor given features (E[Y|Φ(X)]) for IRM.31
- Heterogeneity: Directly addresses distribution shift (domain gap).2 Can struggle with large shifts or label distribution differences.6
- Interpretability: Learned features can be opaque (deep networks). IRM aims for causal features, enhancing interpretability if successful.31
- Scalability: Deep learning methods scale well with data size but can be computationally intensive to train.
- Supervision: Typically requires labeled source, unlabeled/sparsely labeled target (unsupervised/semi-supervised DA).6 IRM requires multiple environments.
- Objectivity: Directly targets robustness to domain shift. IRM connects invariance to causality, a strong form of objectivity. Limited by the fact that marginal alignment alone is insufficient and by label-shift issues.6
- Disentanglement (VAE, GAN, etc.):
- Foundations: Probabilistic modeling (VAE), game theory (GAN), information theory. Often assumes factorized latent space prior.33 Assumes data generated from underlying factors.
- Invariance: Aims for representation where axes correspond to invariant underlying factors of variation.34
- Heterogeneity: Cross-domain methods exist (e.g., VASE 41), but add complexity. Can model domain-specific vs. shared factors.44
- Interpretability: High potential if successful, as latents map to meaningful factors.33
- Scalability: VAEs/GANs scale but training can be complex/unstable.
- Supervision: Fundamentally challenged in unsupervised setting; needs inductive biases or weak supervision (e.g., known transformations, partial labels).33
- Objectivity: Offers potential to find invariant generative factors. Limited by unsupervised learning impossibility and reconstruction trade-offs.33
- Manifold Learning (UMAP, Diffusion Maps):
- Foundations: Differential geometry, topology, graph theory. Assumes data lies on low-D manifold.53 Different methods preserve different geometric properties (geodesics, local linearity, topology, diffusion).4
- Invariance: Captures intrinsic geometric structure invariant to ambient embedding/coordinates.4
- Heterogeneity: Standard methods applied independently don’t guarantee alignment. Cross-domain versions needed.15 Robust to noise (esp. Diffusion Maps 24).
- Interpretability: Embeddings provide coordinates on the manifold; global structure can be visualized. Diffusion Maps offer multi-scale interpretation.24
- Scalability: Varies; UMAP is relatively scalable.4 Graph-based methods depend on graph size/sparsity.
- Supervision: Typically unsupervised.
- Objectivity: Strong potential for capturing intrinsic, non-linear geometric structure independent of measurement details. Choice of method imposes bias.
- Information Geometry (IG):
- Foundations: Differential geometry applied to statistical manifolds.19 Uses Fisher Information Metric.19
- Invariance: Focuses on intrinsic geometry of probability distributions, invariant to parameterization.58
- Heterogeneity: Compares entire distributions, naturally handling differences in statistical properties across domains/datasets.19
- Interpretability: Geodesic distances quantify statistical dissimilarity; curvature relates to model complexity/interactions.
- Scalability: Depends on complexity of distributions and FIM calculation. Often applied after dimensionality reduction.19
- Supervision: Unsupervised analysis of distributions.
- Objectivity: Provides intrinsic geometric view of statistical models/data distributions. Less focused on individual data point representations.
- Topological Data Analysis (TDA):
- Foundations: Algebraic topology (homology theory).60 Focuses on invariants under continuous deformation.59
- Invariance: Captures robust topological features (connectivity, loops, voids) independent of metric choice or smooth transformations.59
- Heterogeneity: Robustness to noise and metric makes it suitable for diverse data types if a meaningful distance/similarity can be defined.59
- Interpretability: Barcodes/diagrams summarize persistent features, though identifying the corresponding data structures can require effort.66 Betti numbers provide counts (a small persistence-diagram sketch follows this comparative list).
- Scalability: Computing persistent homology (PH) can be computationally expensive, especially for large datasets and high dimensions. Efficient algorithms exist but limitations remain.68
- Supervision: Unsupervised.
- Objectivity: Excellent potential for identifying fundamental, robust “shape” characteristics invariant across scales and perspectives. Can struggle with high ambient dimension noise.54
- Hyperbolic Geometry:
- Foundations: Non-Euclidean geometry (negative curvature).73 Assumes data has hierarchical/tree-like structure.25
- Invariance: Preserves hierarchical relationships with low distortion.25
- Heterogeneity: Primarily suited for data where hierarchy is a shared structure across domains.
- Interpretability: Embeddings naturally reflect hierarchy (distance from origin, inter-node distances).26 A minimal Poincaré-distance sketch follows this comparative list.
- Scalability: Requires specialized optimization and layers; potential for low-D embeddings is advantageous.25
- Supervision: Can be unsupervised (implicit hierarchy) but benefits significantly from explicit structure guidance (e.g., known hierarchy for HypStructure 25) or inference (HIE 26).
- Objectivity: Ideal for capturing objective hierarchical organization. Less applicable if data lacks such structure.
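To illustrate the TDA entries above, the sketch below computes persistence diagrams for a noisy circle and counts long-lived one-dimensional features (loops). The third-party ripser package, the noise level, and the persistence threshold are all assumptions; any persistent-homology library could be substituted.

```python
# Illustrative persistent-homology computation on a noisy circle
# (assumes the third-party `ripser` package is installed).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))

dgms = ripser(X, maxdim=1)["dgms"]       # persistence diagrams for H0 and H1
lifetimes = dgms[1][:, 1] - dgms[1][:, 0]
print("prominent loops:", int(np.sum(lifetimes > 0.5)))  # expect 1 for a circle
```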
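For the hyperbolic-geometry entries, the key quantity is the Poincaré-ball distance, which grows rapidly toward the boundary and thereby lets low-dimensional embeddings encode deep hierarchies. The points and closed-form distance below are a minimal illustration, not a full hyperbolic learning pipeline.

```python
# Geodesic distance in the Poincaré ball model of hyperbolic space.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """u, v: points strictly inside the unit ball (NumPy arrays)."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u**2)) * (1.0 - np.sum(v**2))
    return float(np.arccosh(1.0 + 2.0 * sq / max(denom, eps)))

root, leaf = np.array([0.0, 0.0]), np.array([0.95, 0.0])
print(poincare_distance(root, leaf))  # ~3.66, far larger than the Euclidean gap of 0.95
```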
Comparative Table
| Methodology Category | Key Assumptions | Type of Invariance Captured | Handling Heterogeneity | Interpretability | Scalability | Supervision Needs | Strengths for Objectivity | Weaknesses for Objectivity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PCA Generalizations (Common PCA, JIVE) | Linearity, Orthogonality (JIVE), Correct Ranks 10 | Shared Linear Subspace / Variance 9 | Explicit Joint/Individual (JIVE) 10 | Linear Components; JIVE separation clear 10 | Moderate; SVD-based 22 | Unsupervised | Identifies shared linear processes; Explicit separation 10 | Limited by linearity; Subspace meaning can be complex 10 |
| Domain Adaptation / Inv. Rep. Learning | Domain Relatedness, Task Invariance 2 | Distributional (P(z) or P(y\|z)), Causal (IRM) 6 | Addresses Distribution Shift 2; Sensitive to large shifts/label shift 6 | Features often opaque; IRM aims for causal 31 | Deep learning scales well; Training intensive | Labeled Source (DA); Multi-Env (IRM) 6 | Targets robustness to domain shift; IRM seeks causality 31 | Marginal alignment alone insufficient; Label-shift issues 6 |
| Disentanglement (VAE, GAN, etc.) | Factorized Latents, Generative Factors 33 | Underlying Generative Factors 34 | Cross-domain methods exist 41; Can model shared/specific factors 44 | High if successful (factors mapped) 34 | VAE/GAN scale; Training complex/unstable | Needs Inductive Bias / Weak Supervision 33 | Potential for invariant generative factors 34 | Unsupervised limits; Reconstruction trade-off 33 |
| Manifold Learning (UMAP, Diffusion Maps) | Data on Low-D Manifold 53 | Intrinsic Geometry / Topology 4 | Needs Cross-Domain Alignment 15; Noise robust (Diffusion Maps 24) | Manifold coordinates; Multi-scale (Diffusion Maps 24) | Varies; UMAP scalable 4 | Unsupervised | Captures intrinsic non-linear geometry 4 | Choice of method imposes bias; Alignment needed across domains |
| Information Geometry | Data modeled by distributions 19 | Intrinsic Geometry of Statistical Manifolds 58 | Compares distributions directly 19 | Geometric properties of distributions (distance, curvature) | Depends on distribution complexity 19 | Unsupervised | Intrinsic view of statistical models/populations 58 | Less focused on individual data point representation |
| Topological Data Analysis (TDA) | Data reflects underlying shape 61 | Topological Invariants (Betti numbers) 60 | Robust to noise/metric 59; High-D noise challenge 54 | Barcodes show persistent features 60 | Expensive for large/high-D data 68 | Unsupervised | Robust, scale-invariant shape features 59 | High-D noise sensitivity; Interpretation 54 |
| Hyperbolic Geometry | Hierarchical/Tree-like Structure 25 | Hierarchical Relationships 25 | Assumes shared hierarchy | Embeddings reflect hierarchy 26 | Specialized layers needed; Low-D potential 25 | Benefits from Structure Guidance 25 | Natural for objective hierarchies 73 | Assumes hierarchical structure; Less general otherwise |
This comparison reveals a spectrum of approaches. Linear methods offer interpretable separation of shared variance but are limited by their linearity. Domain adaptation and invariant representation learning directly tackle distributional shifts, with IRM providing a link to causality, but face theoretical limitations regarding conditional distributions and label shifts. Disentanglement aims for fundamental generative factors but struggles with unsupervised learning constraints and potential information loss. Geometric and topological methods (Manifold Learning, IG, TDA, Hyperbolic Geometry) appear theoretically well-aligned with capturing intrinsic, coordinate-independent structures, moving beyond linearity. However, each focuses on different aspects (local geometry, global topology, statistical geometry, hierarchy) and faces its own challenges (e.g., computational cost for TDA, alignment for manifold learning, structural assumptions for hyperbolic geometry).
No single method emerges as universally superior. The optimal choice hinges on the specific nature of the scientific data and, crucially, the hypothesized nature of the objective components being sought. If shared linear dynamics are expected, JIVE might be appropriate. If an intrinsic low-dimensional manifold is suspected, UMAP or Diffusion Maps could be powerful. If hierarchical organization dominates, hyperbolic methods are a natural fit. If fundamental shape characteristics are key, TDA offers robust tools. This suggests that hybrid approaches, combining the strengths of different paradigms—such as using spectral methods to define robust distances for TDA in high dimensions 54, or integrating geometric priors into deep learning architectures—may hold the most promise. Furthermore, the inherent difficulties in purely data-driven discovery, particularly for disentanglement 33, strongly indicate that incorporating prior scientific knowledge (e.g., known hierarchies for HypStructure 25, physical symmetries or conservation laws, known transformations for symmetry-based disentanglement 48) is likely essential to guide these powerful methods towards discovering components that are not only statistically invariant but also scientifically meaningful and truly objective.
XI. Conclusion and Future Research Directions
Summary of Findings
The challenge of discovering “objective principal components”—fundamental, invariant structures or principles—across diverse scientific domains necessitates moving beyond traditional linear methods like PCA. This report has surveyed a wide array of advanced mathematical and machine learning techniques capable of tackling this problem. Generalizations of PCA, such as Common PCA and JIVE, offer ways to extract shared linear variance across multiple datasets, with JIVE explicitly separating joint from individual variation. Machine learning approaches like domain adaptation, invariant representation learning (including IRM), and disentangled representation learning aim to learn feature representations that are robust to domain shifts or that isolate underlying generative factors, connecting invariance to concepts like causality and interpretability. Advanced dimensionality reduction and manifold learning techniques (Isomap, LLE, t-SNE, UMAP, Diffusion Maps) focus on uncovering the intrinsic non-linear geometry of data, assuming it lies on lower-dimensional manifolds. Geometric perspectives from Information Geometry (analyzing statistical manifolds) and Topological Data Analysis (extracting robust shape features via persistent homology) provide powerful, mathematically grounded frameworks for identifying intrinsic properties. Hyperbolic geometry offers a specialized tool for efficiently representing hierarchical structures often present in scientific data.
However, each approach carries its own set of assumptions and limitations. Linear methods struggle with non-linearity. Domain adaptation faces theoretical hurdles related to conditional distribution and label shifts. Unsupervised disentanglement is fundamentally ill-posed without inductive biases. Manifold learning requires careful cross-domain alignment. TDA can be computationally expensive and sensitive to high ambient noise. Hyperbolic methods assume underlying hierarchical structure. Furthermore, defining and evaluating “objectivity” itself remains a significant challenge, requiring careful selection of proxy properties and metrics, many of which have known limitations.
Key Takeaways
Several key themes emerge from this analysis:
- No Universal Solution: There is no single “best” method for discovering objective components. The suitability of an approach depends critically on the characteristics of the data (linearity, geometry, hierarchy, noise properties) and the specific nature of the hypothesized objective structure.
- Geometric and Topological Promise: Methods grounded in geometry and topology (Manifold Learning, IG, TDA, Hyperbolic Geometry) appear particularly promising as they are inherently designed to capture intrinsic, coordinate-independent properties, aligning well with the philosophical notion of objectivity.
- Importance of Prior Knowledge: Given the limitations of purely data-driven discovery (especially in unsupervised settings), the incorporation of prior scientific knowledge—whether physical laws, biological constraints, known symmetries, transformations, or hierarchies—is likely crucial for guiding algorithms towards scientifically meaningful and robust objective components.
- Hybrid Approaches: Combining the strengths of different methodologies (e.g., spectral methods with TDA, generative models with explicit structure regularization, manifold alignment within domain adaptation) may offer the most powerful and flexible path forward.
- Evaluation Bottleneck: Robustly evaluating the “objectivity” or invariance of discovered components remains a critical challenge, necessitating the development and careful application of appropriate metrics, potentially including science-specific consistency checks.
Future Research Directions
The quest for objective structures across scientific domains remains an active and challenging frontier. Promising avenues for future research include:
- Algorithm Development: Enhancing the robustness, scalability, and theoretical guarantees of geometric and topological methods (TDA, manifold learning, IG, hyperbolic methods), particularly for noisy, high-dimensional, large-scale scientific datasets.
- Principled Hybrid Frameworks: Developing rigorous frameworks for integrating different approaches, such as incorporating geometric priors into deep learning architectures, using TDA-derived features within domain adaptation models, or combining manifold learning with causal inference techniques.
- Advanced Evaluation Metrics: Designing more reliable and informative metrics for evaluating objectivity, invariance, and disentanglement, potentially incorporating checks for consistency with known scientific principles, symmetries, or causal knowledge, and moving beyond simple prediction accuracy on hold-out domains.
- Deeper Theoretical Connections: Exploring and leveraging deeper connections between fundamental concepts in mathematics and physics (e.g., gauge theory, category theory, information theory, advanced topology and differential geometry) and machine learning for invariant representation learning and structure discovery.
- Causal Representation Learning Across Domains: Developing methods that explicitly model and disentangle shared causal mechanisms underlying data generation across different scientific domains or experimental conditions, potentially leading to the discovery of invariant causal laws.
By pursuing these directions, the fields of mathematics, statistics, and machine learning can continue to develop powerful tools to aid scientists in their fundamental quest to uncover the objective structures and unifying principles that govern the natural world.
Works cited
- Principal Component Analysis for Equation Discovery – arXiv, accessed April 9, 2025, https://arxiv.org/html/2401.04797v1
- Domain adaptation in small-scale and heterogeneous biological datasets – PMC, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11661433/
- [1705.01314] Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation – ar5iv, accessed April 9, 2025, https://ar5iv.labs.arxiv.org/html/1705.01314
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction – arXiv, accessed April 9, 2025, https://arxiv.org/abs/1802.03426
- arxiv.org, accessed April 9, 2025, https://arxiv.org/pdf/2312.14758
- On Learning Invariant Representations for Domain Adaptation …, accessed April 9, 2025, https://blog.ml.cmu.edu/2019/09/13/on-learning-invariant-representations-for-domain-adaptation/
- What is transfer learning? – IBM, accessed April 9, 2025, https://www.ibm.com/think/topics/transfer-learning
- Trustworthy Transfer Learning: A Survey – arXiv, accessed April 9, 2025, https://arxiv.org/html/2412.14116v1
- A review on multi-view learning, accessed April 9, 2025, https://journal.hep.com.cn/fcs/CN/10.1007/s11704-024-40004-w
- Interpretive JIVE: Connections with CCA and an … – Frontiers, accessed April 9, 2025, https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2022.969510/full
- JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES – PMC, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC3671601/
- Introduction to Topological Data Analysis, accessed April 9, 2025, https://www-apr.lip6.fr/~tierny/stuff/teaching/tierny_topologicalDataAnalysis.pdf
- statistics of gravitational lenses in a clumpy Universe | Monthly …, accessed April 9, 2025, https://academic.oup.com/mnras/article/357/2/773/1374935
- Gravitational Lensing | HubbleSite, accessed April 9, 2025, https://hubblesite.org/contents/articles/gravitational-lensing
- Comparison of Domain Selection Methods for Multi-Source Manifold Feature Transfer Learning in Electroencephalogram Classification – MDPI, accessed April 9, 2025, https://www.mdpi.com/2076-3417/14/6/2326
- Domain Invariant Representation Learning with Domain Density Transformations – NIPS papers, accessed April 9, 2025, https://proceedings.neurips.cc/paper_files/paper/2021/file/2a2717956118b4d223ceca17ce3865e2-Paper.pdf
- [1801.01602] Principal component analysis for big data – arXiv, accessed April 9, 2025, https://arxiv.org/abs/1801.01602
- Revisiting PCA for time series reduction in temporal dimension – arXiv, accessed April 9, 2025, https://arxiv.org/pdf/2412.19423?
- Information Geometry and Manifold Learning: A Novel Framework …, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11763731/
- Learning consensus representations in multi-latent spaces for multi-view clustering | Request PDF – ResearchGate, accessed April 9, 2025, https://www.researchgate.net/publication/380984497_Learning_consensus_representations_in_multi-latent_spaces_for_multi-view_clustering
- [2502.01961] Hierarchical Consensus Network for Multiview Feature Learning – arXiv, accessed April 9, 2025, https://arxiv.org/abs/2502.01961
- Batch-effect correction in single-cell RNA sequencing data using JIVE – Oxford Academic, accessed April 9, 2025, https://academic.oup.com/bioinformaticsadvances/article/4/1/vbae134/7756895
- Cross-domain manifold structure preservation for transferable and cross-machine fault diagnosis – Extrica, accessed April 9, 2025, https://www.extrica.com/article/24067
- A short introduction to Diffusion Maps – Stephan Osterburg, accessed April 9, 2025, https://www.stephanosterburg.com/an_introductio_to_diffusion_maps
- Learning Structured Representations with Hyperbolic Embeddings – arXiv, accessed April 9, 2025, https://arxiv.org/html/2412.01023v1
- Hyperbolic Representation Learning: Revisiting and Advancing, accessed April 9, 2025, https://proceedings.mlr.press/v202/yang23u/yang23u.pdf
- Cross-Domain Classification Based on Frequency Component Adaptation for Remote Sensing Images – MDPI, accessed April 9, 2025, https://www.mdpi.com/2072-4292/16/12/2134
- Variational Disentanglement for Domain Generalization – OpenReview, accessed April 9, 2025, https://openreview.net/pdf/66656bd8c4ad40cf45266d540ca16035c7828d44.pdf
- [1901.09453] On Learning Invariant Representation for Domain Adaptation – arXiv, accessed April 9, 2025, https://arxiv.org/abs/1901.09453
- Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion – MDPI, accessed April 9, 2025, https://www.mdpi.com/1099-4300/27/1/56
- Invariant Representation Learning for Treatment Effect Estimation*, accessed April 9, 2025, https://proceedings.mlr.press/v161/shi21a/shi21a.pdf
- CRIC: A robust assessment for invariant representations – arXiv, accessed April 9, 2025, https://arxiv.org/html/2404.05058v1
- Disentanglement with Factor Quantized Variational Autoencoders – arXiv, accessed April 9, 2025, https://arxiv.org/html/2409.14851v1
- Disentangled Representation Learning – arXiv, accessed April 9, 2025, https://arxiv.org/html/2211.11695v4
- Measuring Disentanglement: A Review of Metrics | Request PDF – ResearchGate, accessed April 9, 2025, https://www.researchgate.net/publication/365397188_Measuring_Disentanglement_A_Review_of_Metrics
- A Review of Disentangled Representation Learning for Remote Sensing Data – SciOpen, accessed April 9, 2025, https://www.sciopen.com/article/10.26599/AIR.2022.9150012
- arxiv.org, accessed April 9, 2025, https://arxiv.org/abs/2410.03056
- High-Fidelity Synthesis with Disentangled Representation – ECVA | European Computer Vision Association, accessed April 9, 2025, https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123710154.pdf
- Learning Disentangled Representation by Exploiting Pretrained Generative Models: A Contrastive Learning View – ar5iv, accessed April 9, 2025, https://ar5iv.labs.arxiv.org/html/2102.10543
- [2208.04549] Disentangled Representation Learning Using (β-)VAE and GAN – arXiv, accessed April 9, 2025, https://arxiv.org/abs/2208.04549
- Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies – NIPS papers, accessed April 9, 2025, http://papers.neurips.cc/paper/8193-life-long-disentangled-representation-learning-with-cross-domain-latent-homologies.pdf
- [1808.06508] Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies – arXiv, accessed April 9, 2025, https://arxiv.org/abs/1808.06508
- Inductive-Biases for Contrastive Learning of Disentangled Representations | OpenReview, accessed April 9, 2025, https://openreview.net/forum?id=QymmlaKpp_8
- Cross-Domain Feature Augmentation for Domain Generalization – arXiv, accessed April 9, 2025, https://arxiv.org/html/2405.08586v1
- [PDF] A Framework for the Quantitative Evaluation of Disentangled Representations, accessed April 9, 2025, https://www.semanticscholar.org/paper/A-Framework-for-the-Quantitative-Evaluation-of-Eastwood-Williams/adf2ac6b99b7d48b6a9c908532ca249de2cec3ae
- Quantifying and Learning Disentangled Representations with Limited Supervision, accessed April 9, 2025, https://openreview.net/forum?id=YZ-NHPj6c6O
- Quantifying and Learning Linear Symmetry-Based Disentanglement, accessed April 9, 2025, https://proceedings.mlr.press/v162/tonnaer22a.html
- Quantifying and Learning Linear Symmetry-Based Disentanglement, accessed April 9, 2025, https://proceedings.mlr.press/v162/tonnaer22a/tonnaer22a.pdf
- Defining and Measuring Disentanglement for non-Independent …, accessed April 9, 2025, https://openreview.net/forum?id=3Mq1tY75nv
- Defining and Measuring Disentanglement for non-Independent Factors of Variation, accessed April 9, 2025, https://www.aimodels.fyi/papers/arxiv/defining-measuring-disentanglement-non-independent-factors-variation
- Synthetic Differential Geometry in AI: A New Approach to Machine Learning (Mastering … – Amazon.com, accessed April 9, 2025, https://www.amazon.com/Synthetic-Differential-Geometry-AI-Mastering/dp/B0DHYCGJT2
- Multiscale differential geometry learning of networks with applications to single-cell RNA sequencing data – PMC, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10965033/
- Manifold Structure of Artificial and Biological Neural Networks – Math (Princeton), accessed April 9, 2025, https://web.math.princeton.edu/~lz1619/Files/Liu_Honors_Thesis.pdf
- Persistent Homology for High-dimensional Data Based on Spectral Methods – arXiv, accessed April 9, 2025, https://arxiv.org/html/2311.03087v3
- The Shape of Attraction in UMAP: Exploring the Embedding Forces in Dimensionality Reduction – arXiv, accessed April 9, 2025, https://arxiv.org/html/2503.09101v1
- How UMAP Works — umap 0.5.8 documentation, accessed April 9, 2025, https://umap-learn.readthedocs.io/en/latest/how_umap_works.html
- Comparison of Systems Using Diffusion Maps – Iowa State University, accessed April 9, 2025, https://home.engineering.iastate.edu/~ugvaidya/papers/Comaprison_diffusion.pdf
- Information Geometry and Its Applications: an Overview – The Open University, accessed April 9, 2025, https://university.open.ac.uk/stem/mathematics-and-statistics/sites/www.open.ac.uk.stem.mathematics-and-statistics/files/files/1_IGAIA_an_overview.pdf
- Topological Data Analysis with Persistent Homology | by Alexander Del Toro Barba (PhD), accessed April 9, 2025, https://medium.com/@deltorobarba/quantum-topological-data-analysis-the-most-powerful-quantum-machine-learning-algorithm-part-1-c6d055f2a4de
- Topological data analysis – Wikipedia, accessed April 9, 2025, https://en.wikipedia.org/wiki/Topological_data_analysis
- An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists – Frontiers, accessed April 9, 2025, https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2021.667963/full
- Topological Data Analysis Approaches to Uncovering the Timing of Ring Structure Onset in Filamentous Networks – PubMed Central, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC7811524/
- Topological Data Analysis: Unveiling Hidden Patterns and Structures in Complex Datasets | by Siddhartha Pramanik | Medium, accessed April 9, 2025, https://medium.com/@siddharthapramanik771/topological-data-analysis-unveiling-hidden-patterns-and-structures-in-complex-datasets-1a6efa75ef5c
- Evaluating State Space Discovery by Persistent Cohomology in the Spatial Representation System – Frontiers, accessed April 9, 2025, https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2021.616748/full
- Topological Data Analysis Mastermath, accessed April 9, 2025, https://www.few.vu.nl/~botnan/lecture_notes.pdf
- Protein-Folding Analysis Using Features Obtained by Persistent Homology – PMC, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC7300307/
- Consistent manifold representation for topological data analysis, accessed April 9, 2025, https://www.aimsciences.org/article/doi/10.3934/fods.2019001
- Topological Data Analysis and Persistent Homology – – YouTube, accessed April 9, 2025, https://www.youtube.com/watch?v=9fL6K5SSYII
- Tracking the topology of neural manifolds across populations – PNAS, accessed April 9, 2025, https://www.pnas.org/doi/10.1073/pnas.2407997121
- Uncovering Hidden Dimensions in Brain Signals – Simons Foundation, accessed April 9, 2025, https://www.simonsfoundation.org/2019/11/11/uncovering-hidden-dimensions-in-brain-signals/
- CONSISTENT MANIFOLD REPRESENTATION FOR TOPOLOGICAL DATA ANALYSIS Tyrus Berry and Timothy Sauer 1. Introduction. Building a discr – Mathematical Sciences, accessed April 9, 2025, https://math.gmu.edu/~berry/Publications/CkNN.pdf
- [2412.01023] Learning Structured Representations with Hyperbolic Embeddings – arXiv, accessed April 9, 2025, https://arxiv.org/abs/2412.01023
- Learning Structured Representations with Hyperbolic Embeddings – NIPS papers, accessed April 9, 2025, https://proceedings.neurips.cc/paper_files/paper/2024/file/a5d2da376bab7624b3caeb9f78fcaa2f-Paper-Conference.pdf
- NeurIPS Poster Learning Structured Representations with Hyperbolic Embeddings, accessed April 9, 2025, https://neurips.cc/virtual/2024/poster/93170
- Hyperbolic Neural Networks – NIPS papers, accessed April 9, 2025, http://papers.neurips.cc/paper/7780-hyperbolic-neural-networks.pdf
- Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders – NIPS papers, accessed April 9, 2025, http://papers.neurips.cc/paper/9420-continuous-hierarchical-representations-with-poincare-variational-auto-encoders.pdf
- Hyperbolic Knowledge Transfer with Class Hierarchy for Few-Shot Learning – IJCAI, accessed April 9, 2025, https://www.ijcai.org/proceedings/2022/0517.pdf
- Hyperbolic Contrastive Learning for Visual Representations Beyond Objects – CVF Open Access, accessed April 9, 2025, https://openaccess.thecvf.com/content/CVPR2023/papers/Ge_Hyperbolic_Contrastive_Learning_for_Visual_Representations_Beyond_Objects_CVPR_2023_paper.pdf
- marlin-codes/Awesome-Hyperbolic-Representation-and-Deep-Learning – GitHub, accessed April 9, 2025, https://github.com/marlin-codes/Awesome-Hyperbolic-Representation-and-Deep-Learning
- HypLL: The Hyperbolic Learning Library – arXiv, accessed April 9, 2025, https://arxiv.org/html/2306.06154v3
- [2306.09118] Hyperbolic Representation Learning: Revisiting and Advancing – arXiv, accessed April 9, 2025, https://arxiv.org/abs/2306.09118
- Differential Geometry in Computer Vision and Machine Learning | Frontiers Research Topic, accessed April 9, 2025, https://www.frontiersin.org/research-topics/17080/differential-geometry-in-computer-vision-and-machine-learning/magazine
- algebraic topology – TDA and knot theory – Mathematics Stack Exchange, accessed April 9, 2025, https://math.stackexchange.com/questions/3267490/tda-and-knot-theory
- CROSS-DOMAIN LEARNING METHODS FOR HIGH-LEVEL VISUAL CONCEPT CLASSIFICATION – Electrical Engineering, accessed April 9, 2025, https://www.ee.columbia.edu/~wjiang/references/jiangicip08.pdf
- MTNet: A Neural Approach for Cross-Domain Recommendation with Unstructured Text – SIGKDD, accessed April 9, 2025, https://www.kdd.org/kdd2018/files/deep-learning-day/DLDay18_paper_5.pdf
- Learning List-Level Domain-Invariant Representations for Ranking | OpenReview, accessed April 9, 2025, https://openreview.net/forum?id=m21rQusNgb
- Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits – OpenReview, accessed April 9, 2025, https://openreview.net/forum?id=Z8dr422vtr
- Cross-Domain Analysis of ML and DL: Evaluating their Impact in Diverse Domains, accessed April 9, 2025, https://www.ijisae.org/index.php/IJISAE/article/view/2951
- Cross-Domain Evaluation of a Deep Learning-Based Type Inference System – arXiv, accessed April 9, 2025, https://arxiv.org/pdf/2208.09189
- Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration – European Computer Vision Association, accessed April 9, 2025, https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06534.pdf
- arxiv.org, accessed April 9, 2025, https://arxiv.org/abs/2012.09276
- NeurIPS Poster Enriching Disentanglement: From Logical Definitions to Quantitative Metrics, accessed April 9, 2025, https://neurips.cc/virtual/2024/poster/93305
- Evaluation of metric and representation learning approaches: Effects of representations driven by relative distance on the performance – PMC, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10566582/
- Learning Action-based Representations Using Invariance – arXiv, accessed April 9, 2025, https://arxiv.org/html/2403.16369v1
- Learning Invariant Representations for Reinforcement Learning without Reconstruction, accessed April 9, 2025, https://openreview.net/forum?id=-2FCwDKRREu
- Cross-Domain Feature Similarity Guided Blind Image Quality Assessment – PubMed Central, accessed April 9, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8795631/