Eigen: A Beginner’s Guide to the C++ Linear Algebra Library

Mastering Eigen: Fast Matrix Operations and Performance Tips

Eigen is a high-performance C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. It is widely used in scientific computing, robotics, computer vision, machine learning, and anywhere dense or sparse linear algebra is needed. This article walks through the fundamentals of Eigen, shows how to write fast matrix code, explains common performance pitfalls, and provides practical tips and examples to help you squeeze the most out of the library.


Why Eigen?

  • Header-only and lightweight: integrating Eigen requires only adding headers; no separate compilation step or linking.
  • Template-based and expressive: operations use intuitive operator syntax (e.g., A * x), but compile-time types allow heavy optimization.
  • High performance: Eigen implements expression templates, vectorization (SIMD), cache-aware algorithms, and multi-threading (via OpenMP or internal mechanisms) to achieve competitive speeds.
  • Flexible: supports dense and sparse matrices, fixed-size and dynamic-size matrices, and a rich set of decompositions and solvers.

Basics: Types, Construction, and Access

Eigen’s core types are Matrix and Array templates. The most common alias for a dynamic dense matrix is Eigen::MatrixXd; for a column vector, Eigen::VectorXd.

Example:

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  MatrixXd A(3, 3);
  VectorXd b(3);
  A << 1, 2, 3,
       4, 5, 6,
       7, 8, 10;
  b << 3, 6, 9;
  // Solve Ax = b with a rank-revealing QR decomposition.
  VectorXd x = A.colPivHouseholderQr().solve(b);
}

Key points:

  • Matrix — you usually use MatrixXd (dynamic) or Matrix3d (fixed 3×3 double).
  • Array provides element-wise operations; Matrix provides linear-algebra semantics.
  • Access with parentheses: A(i,j). Use .col(), .row(), .block(), .segment() for subviews; these return lightweight expressions (no copy until needed).
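For instance, a minimal sketch of element and subview access (the matrix name and sizes are illustrative):

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  MatrixXd A = MatrixXd::Random(6, 6);
  double v = A(1, 2);               // element access
  VectorXd c = A.col(0);            // copy of the first column
  A.row(5).setZero();               // modify a row in place through a view
  MatrixXd B = A.block(1, 1, 3, 3); // copy of a 3x3 sub-block
  A.block(0, 0, 2, 2) = MatrixXd::Identity(2, 2); // write through a view
}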

Expression Templates and Lazy Evaluation

Eigen uses expression templates: operations like A + B produce an expression object; evaluation is delayed until assigned or explicitly evaluated. This avoids temporaries and enables loop fusion.

Example of loop fusion:

C = A + B + D; // fused - no intermediate temporaries 

But some operations require a temporary (e.g., matrix products, or assignments where the destination aliases the source). Use .eval() to force evaluation into a temporary when needed.
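A classic aliasing case is assigning a matrix's transpose to itself; a minimal sketch:

MatrixXd A = MatrixXd::Random(3, 3);
// A = A.transpose();        // aliasing bug: reads and writes overlap
A = A.transpose().eval();    // .eval() materializes a temporary first
// (for this particular case, A.transposeInPlace() is the idiomatic fix)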


Fixed-size vs Dynamic-size: Choose Wisely

  • Fixed-size matrices (e.g., Matrix3d, Vector4f) allow the compiler to optimize aggressively and unroll loops. Use them when sizes are known at compile time.
  • Dynamic-size (MatrixXd) is flexible but involves heap allocation and runtime checks.

Rule of thumb:

  • For small matrices (<= ~16 elements), prefer fixed-size for speed.
  • For large matrices, dynamic is necessary; focus on blocking and memory layout.
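To make the contrast concrete, a small sketch (sizes are illustrative):

#include <Eigen/Dense>
using namespace Eigen;

int main() {
  Matrix3d R = Matrix3d::Identity();  // fixed-size: stack storage, unrollable loops
  Vector3d p(1.0, 2.0, 3.0);
  Vector3d q = R * p;                 // no heap allocation at all

  MatrixXd M = MatrixXd::Random(1000, 1000);  // dynamic: heap storage
  VectorXd y = M * VectorXd::Ones(1000);      // blocked, vectorized kernel
}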

Memory Layout and Alignment

Eigen stores matrices in column-major order by default (as in Fortran and MATLAB). You can request row-major storage with a template option.

  • Column-major is optimal for column-wise operations (e.g., solving Ax=b).
  • For interop with libraries or data that expect row-major layout (e.g., C-style 2D arrays or NumPy's default), consider Eigen::RowMajor.
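Requesting row-major storage is done through the Matrix template's options parameter; a minimal sketch:

#include <Eigen/Dense>

using RowMajorMatrixXd =
    Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

RowMajorMatrixXd A(4, 4);  // each row is now contiguous in memory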

Alignment and vectorization:

  • Eigen aligns data to leverage SIMD. Vectorization is on by default when the compiler targets a supported instruction set (SSE, AVX, NEON, ...); Eigen defines EIGEN_VECTORIZE when it is active, and you can opt out with EIGEN_DONT_VECTORIZE.
  • For dynamic allocations, Eigen aligns memory to 16-, 32-, or 64-byte boundaries, depending on the SIMD instruction set in use. For custom new/delete or embedded devices, ensure proper alignment.
  • Use EIGEN_DONT_ALIGN_STATICALLY if alignment causes issues (e.g., embedding Eigen types in packed structs), but note this may hurt performance.
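Before C++17, fixed-size vectorizable members inside heap-allocated classes needed Eigen's alignment macro; a sketch (the struct name is illustrative):

#include <Eigen/Dense>

struct Pose {
  Eigen::Matrix4d T;  // fixed-size, vectorizable member with alignment requirements
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW  // ensures aligned storage when Pose is new-ed
};

With C++17's aligned operator new, recent Eigen versions make this macro largely unnecessary, but it remains harmless.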

Vectorization and PacketMath

Eigen implements “packet” operations for vectorized math. To ensure vectorization:

  • Compile with optimization flags (e.g., -O3).
  • Enable architecture-specific flags: -march=native or -msse4.2 -mavx, depending on target.
  • Use fixed-size small matrices when possible—vectorized code benefits most there.

Detect vectorization with:

  • EIGEN_VECTORIZE defined at compile time.
  • Runtime perf testing (benchmark small matrix multiplies).
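Eigen can also report which SIMD instruction sets it compiled in; a quick check, as a sketch:

#include <Eigen/Core>
#include <iostream>

int main() {
  std::cout << "SIMD in use: " << Eigen::SimdInstructionSetsInUse() << "\n";
#ifdef EIGEN_VECTORIZE
  std::cout << "EIGEN_VECTORIZE is defined\n";
#endif
}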

Avoiding Unnecessary Copies

Common pitfalls:

  • Passing large Eigen objects by value causes copies; take const references (or templated expressions) instead. Returning by value is usually fine thanks to RVO and move semantics.
  • Avoid creating temporaries in loops:

Bad:

for (int i = 0; i < N; ++i) {
  y = A * x;  // the product may create a temporary, and y may be reallocated each iteration
}

Good:

  • Precompute static parts outside loops, reuse buffers, and use in-place operations.

Use .noalias() for assigning products to avoid creating temporaries when Eigen can’t prove non-aliasing:

C.noalias() = A * B; 
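Combining the two tips, the earlier loop can be rewritten to reuse one preallocated buffer (a sketch; A, x, and N follow the snippet above):

VectorXd y(A.rows());     // allocate once, outside the loop
for (int i = 0; i < N; ++i) {
  y.noalias() = A * x;    // product written straight into y, no temporary
  // ... use y ...
}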

Blocking and Cache-Friendly Algorithms

For large matrices, memory bandwidth and cache misses dominate runtime. Techniques:

  • Use blocking/tiled multiplication: Eigen internally applies blocking, but custom algorithms benefit from processing contiguous blocks.
  • Prefer column-major access patterns for column-major matrices to keep memory access sequential.
  • Use .block(i,j,rows,cols) to operate on sub-blocks without copying (returns expressions).

Example: Multiply large matrices with block loops to improve cache reuse.
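A hedged sketch of such a blocked product (the block size bs is a tuning parameter, not a prescribed value; Eigen's own operator* already blocks internally, so this pattern is for custom kernels):

#include <Eigen/Dense>
using namespace Eigen;

// Accumulate C += A * B in bs-by-bs tiles to improve cache reuse.
// For brevity, assumes square matrices whose size is a multiple of bs.
void blockedMultiply(const MatrixXd& A, const MatrixXd& B,
                     MatrixXd& C, int bs) {
  const int n = static_cast<int>(A.rows());
  for (int j = 0; j < n; j += bs)
    for (int k = 0; k < n; k += bs)
      for (int i = 0; i < n; i += bs)
        C.block(i, j, bs, bs).noalias() +=
            A.block(i, k, bs, bs) * B.block(k, j, bs, bs);
}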


Multi-threading: Parallelizing Operations

Eigen supports multi-threaded operations through:

  • OpenMP: compile with -fopenmp; Eigen then parallelizes its heaviest kernels (notably large dense matrix products) automatically.
  • An internal thread pool (Eigen::ThreadPool), used by the Tensor module in recent versions.
  • Explicit threading: split the work into disjoint row/column blocks across your own threads, making sure no two threads write to the same data.

Control threads with:

  • EIGEN_DONT_PARALLELIZE to disable.
  • Eigen::setNbThreads(n) to cap the number of threads Eigen uses (Eigen::nbThreads() queries it).

Note: Threading overhead matters; parallelize only when work per thread is substantial.
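A minimal sketch of thread control (compile with -fopenmp in OpenMP builds):

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::setNbThreads(4);  // cap Eigen's parallel kernels at 4 threads
  std::cout << "Eigen threads: " << Eigen::nbThreads() << "\n";
}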


Sparse Matrices and Solvers

Eigen’s SparseMatrix and related solvers (SimplicialLDLT, SparseLU, ConjugateGradient, BiCGSTAB) are useful for large, sparse systems.

Tips:

  • Construct sparse matrices using triplet lists (Eigen::Triplet) and then setFromTriplets().
  • Use appropriate solver based on matrix properties (SPD vs general).
  • Preconditioners (IncompleteCholesky, DiagonalPreconditioner) can dramatically speed iterative solvers.

Example:

#include <Eigen/Sparse>
using namespace Eigen;

typedef SparseMatrix<double> SpMat;

std::vector<Triplet<double>> triplets;
// fill triplets...
SpMat A(n, n);
A.setFromTriplets(triplets.begin(), triplets.end());

// Iterative solve with an incomplete-Cholesky preconditioner.
ConjugateGradient<SpMat, Lower | Upper, IncompleteCholesky<double>> cg;
cg.compute(A);
VectorXd x = cg.solve(b);

Numerical Stability and Decompositions

Choose decompositions based on matrix properties:

  • Use LU (FullPivLU, PartialPivLU) for general matrices.
  • Use Cholesky (LLT, LDLT) for symmetric positive-definite matrices—faster and more stable.
  • Use QR (HouseholderQR, ColPivHouseholderQR) for least squares and rank-revealing needs.
  • Eigen’s SelfAdjointEigenSolver for symmetric eigenproblems and EigenSolver for general eigenproblems.

Always check .info() on solvers for success and consider scaling/conditioning if results are unstable.
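For example, a sketch of an SPD solve with LDLT plus the .info() check (the system here is made SPD by construction):

#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;

int main() {
  MatrixXd M = MatrixXd::Random(50, 50);
  MatrixXd S = M * M.transpose() + 50.0 * MatrixXd::Identity(50, 50); // SPD
  VectorXd b = VectorXd::Ones(50);

  LDLT<MatrixXd> ldlt(S);
  if (ldlt.info() != Success) {
    std::cerr << "decomposition failed\n";
    return 1;
  }
  VectorXd x = ldlt.solve(b);
  std::cout << "residual: " << (S * x - b).norm() << "\n";
}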


Interoperability with BLAS/LAPACK

For very large dense problems, a tuned BLAS/LAPACK (OpenBLAS, Intel MKL) may outperform Eigen for some operations. Eigen can interoperate:

  • Define EIGEN_USE_BLAS or EIGEN_USE_MKL_ALL so Eigen dispatches its heaviest kernels to an external BLAS/LAPACK, or pass raw pointers to BLAS/LAPACK routines directly.
  • Alternatively, let MKL’s vectorized and threaded kernels handle the heavy linear algebra, though this requires care with data layout and possible copies.
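The usual bridge in both directions is Eigen::Map, which views existing memory without copying; a sketch (the raw buffer and its column-major layout are assumptions):

#include <Eigen/Dense>
using namespace Eigen;

void useExternalBuffer(double* data, int rows, int cols) {
  // Interpret the raw column-major buffer as an Eigen matrix: no copy is made.
  Map<MatrixXd> M(data, rows, cols);
  M *= 2.0;  // writes through to the original memory
  // M.data() can likewise be handed to a BLAS routine expecting a pointer.
}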

Practical Examples and Micro-optimizations

  1. Fast small matrix multiply:
  • Use fixed-size matrices (Matrix3d, Matrix4f) and let the compiler unroll loops.
  • Prefer stack allocation for tiny matrices to avoid heap overhead.
  2. In-place arithmetic and reduction of temporaries:
  • Use .transposeInPlace(), .conservativeResize(), and .swap() to avoid allocations.
  3. Avoid expressions that force copies:
  • Functions taking Eigen objects by value cause copies. Prefer const references or templates taking Eigen expressions:

    template<typename DerivedA, typename DerivedB>
    auto add(const Eigen::MatrixBase<DerivedA>& A,
             const Eigen::MatrixBase<DerivedB>& B) {
      return A + B;  // returns an expression; evaluated by the caller
    }
  4. Profiling (see the timing sketch after this list):
  • Use perf, VTune, or simple timing with std::chrono to identify hotspots.
  • Look at compiler optimization reports and assembly when needed.
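The timing sketch mentioned in item 4 (matrix size and iteration count are arbitrary):

#include <Eigen/Dense>
#include <chrono>
#include <iostream>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(512, 512);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(512, 512);
  Eigen::MatrixXd C(512, 512);

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < 10; ++i)
    C.noalias() = A * B;
  auto t1 = std::chrono::steady_clock::now();

  std::chrono::duration<double, std::milli> ms = t1 - t0;
  std::cout << "avg multiply: " << ms.count() / 10 << " ms\n";
  std::cout << C(0, 0) << "\n";  // prevent the loop from being optimized away
}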

Common Gotchas

  • Mixing row-major and column-major unexpectedly can cause slowdowns.
  • Forgetting .noalias() on large products can double memory traffic.
  • Using dynamic-size matrices in tight inner loops for small sizes.
  • Misaligned data on platforms requiring specific alignment for SIMD.

Checklist: Quick Performance Tips

  • Prefer fixed-size types for small matrices.
  • Compile with optimization flags and target CPU (e.g., -O3 -march=native).
  • Use .noalias() for heavy matrix products when safe.
  • Reuse buffers and avoid temporaries (.eval() when needed).
  • Respect memory layout (column-major by default).
  • Use appropriate decompositions for numerical stability.
  • Use sparse structures and preconditioners for large sparse systems.
  • Profile before optimizing; measure gains after each change.

Conclusion

Eigen combines expressive syntax with high performance, but achieving optimal speed requires attention to types, memory layout, vectorization, and avoiding temporaries. Start with clean, readable code, then profile and apply the targeted tips above—fixed-size matrices, .noalias(), blocking, and proper compiler flags—where they matter. With these techniques you can master Eigen for both correctness and speed.
