Implementation of Whittaker Smoother in C++ with Eigen

2024-07-05C++,Math,Signal Processing,Spectrum

One widely used method for smoothing data containing noise such as spectra is the Savitzky-Golay method. Another powerful smoothing method similar to Savitzky-Golay is the Whittaker Smoother. I will introduce an implementation of Whittaker Smoother using C++ and Eigen.

Here is the information about the Savitzky-Golay method.

The Whittaker Smoother

The Whittaker Smoother is a smoothing method introduced by Whittaker in 1922[1]. After it was introduced by Eilers in a paper in 2003 [2] and applied to baseline estimation methods as part of the AsLS method, it gained attention as a baseline estimation method for spectral spectra and chromatograms. Various baseline estimation methods utilizing Whittaker Smoother have been developed since then.

This time, based on Eilers’ paper, I implemented the Whittaker Smoother using C++.

Features

  1. It can be achieved with relatively short code.
  2. It can smooth a large number of observations relatively quickly if it supports sparse matrix calculations.
  3. It can handle missing data.
  4. It allows for control over the smoothness.

Principle

Let’s denote the series of observed measurement data as $\boldsymbol{y} = (y_{1}, \ldots, y_{m})$. Since these values are typically contaminated with noise, the line formed by the data is not a smooth curve.

Next, consider a smooth data series $\boldsymbol{z} = (z_{1}, \ldots, z_{m})$, and think about fitting $\boldsymbol{z}$ to $\boldsymbol{y}$. To fit $\boldsymbol{z}$ to $\boldsymbol{y},$ two factors need to be considered: the goodness of fit of $\boldsymbol{z}$ to $\boldsymbol{y}$ and the roughness of $\boldsymbol{z}$.

The former can be represented by the sum of squared differences between $\boldsymbol{y}$ and $\boldsymbol{z}$, while the latter can be represented by the sum of squared $s$-th differences of $\boldsymbol{z}$, $\sum_{i}(\Delta^{s}z_{i})^{2}$. As $\boldsymbol{z}$ becomes smoother, it diverges from $\boldsymbol{y}$, so these two aspects become conflicting. In other words, it becomes a problem of penalized least squares.

$$
\displaystyle
Q = \sum_{i}w_{i}(y_{i} – z_{i})^{2} + \lambda\sum_{i}(\Delta^{s}z_{i})^{2}
\tag{1}
$$

Where, $w_{i}$ represents weights with $0 \leq w_{i} \leq 1$. It assigns values close to 0 or 0 for missing data or outliers. Additionally, $\lambda$ is the penalty coefficient. Increasing $\lambda$ makes the roughness term dominant, resulting in a smoother curve.

Expressing (1) in matrix and vector form yields (2).

$$
\displaystyle
Q = (\boldsymbol{y}-\boldsymbol{z})^{\mathrm{T}}W(\boldsymbol{y}-\boldsymbol{z}) + \lambda \boldsymbol{z}^{\mathrm{T}}(D_{s}^{\mathrm{T}}D_{s})\boldsymbol{z}
\tag{2}
$$

Where, $W$ is a diagonal matrix with weights $\boldsymbol{w} = (w_{1}, \ldots, w_{m})$ along the diagonal, and $D_{s}$ is a matrix such that $D_{s}\boldsymbol{z} = \Delta^{s}\boldsymbol{z}$. For example, when $m = 5$ and $s = 1$,

$$
\displaystyle
D_{s} = \left(
\begin{array}{rrrrr}
-1 & 1 & 0 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 \\
0 & 0 & -1 & 1 & 0 \\
0 & 0 & 0 & -1 & 1
\end{array}
\right)
$$

when $m = 7$ and $s = 2$,

$$
\displaystyle
D_{s} = \left(
\begin{array}{rrrrrrr}
1 & -2 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & -2 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & -2 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & -2 & 1
\end{array}
\right)
$$

when $m = 9$ and $s = 3$,

$$
\displaystyle
D_{s} = \left(
\begin{array}{rrrrrrrrr}
-1 & 3 & -3 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 3 & -3 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 3 & -3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 3 & -3 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 3 & -3 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 3 & -3 & 1
\end{array}
\right)
$$

By taking the partial derivative of equation (2) with respect to $\boldsymbol{z}$ and setting it to $0$, we get:

$$
\frac{\partial Q}{\partial \boldsymbol{z}} = -2W(\boldsymbol{y} – \boldsymbol{z}) + 2 \lambda(D_{s}^{\mathrm{T}}D_{s})\boldsymbol{z} = 0
$$

$$
(W + \lambda D_{s}^{\mathrm{T}} D_{s}) \boldsymbol{z} = W \boldsymbol{y}
$$

Therefore,

$$
\boldsymbol{z} = (W + \lambda D_{s}^{\mathrm{T}} D_{s})^{-1} W \boldsymbol{y}
\tag{3}
$$

Whittaker’s original method corresponds to the case where $s = 3$, but it is also possible to consider the cases of $s = 1$ and $s = 2$. When $s = 1$, it is known as Bohlmann Smoothing, when $s = 2$, it is known as Hodrick-Prescott Smoothing, and when $s = 3$, it is sometimes referred to as Whittaker-Henderson Smoothing [3]. Here, we will collectively refer to these methods as the Whittaker Smoother.

According to Eilers, the method often works well when $s = 2$.

References

  1. Whittaker, E. T. On A New Method of Graduation. Proc. Edinburgh Math. Soc.1922, 41, 63-75.
  2. Eilers, P. H. C. A Perfect Smoother. Anal. Chem.2003, 75(14), 3631–3636.
  3. Orfanidis S. J. Whittaker-Henderson Smoothing, 2018. Applied Optimum Signal Processing. http://eceweb1.rutgers.edu/~orfanidi/aosp/ (accessed Mar 24, 2022)

Implementation using C++ and Eigen

We will implement the Whittaker Smoother in C++ using Eigen. Before that, Eigen does not have a function to create the matrix $D_{s}$. In Matlab, Python, and R, there is a function called diff() that serves a similar purpose, so we need to implement this as well.

I have created a function equivalent to Matlab’s diff() in the following entry, so please take a look:
Implementation of MATLAB-like diff() in C++ using Eigen

We will implement this assuming that diff.h, which was created here, is placed in the same directory. Please note that error checking is not included. The code is also available on GitHub, so feel free to check it out if you are interested.

Example of Implementation

  • whittaker.h

    #ifndef __EIGEN_WHITTAKER_H__
    #define __EIGEN_WHITTAKER_H__
    
    #include <cassert>
    #include "diff.h"
    
    // Perform the Whittaker Smoother
    //
    // y      : Actual observed data
    // w      : Weight vector corresponding to each point (0 <= w <= 1)
    // lambda : Penalty coefficient for smoothing
    // s      : Order of differences (s = 1, 2, 3)
    const Eigen::VectorXd WhittakerSmoother(
      const Eigen::VectorXd & y, const Eigen::VectorXd & w, const double lambda, const unsigned int s
    );
    
    #endif // __EIGEN_WHITTAKER_H__
  • whittaker.cpp

    #include "whittaker.h"
    
    const Eigen::VectorXd WhittakerSmoother(
      const Eigen::VectorXd & y, const Eigen::VectorXd & w, const double lambda, const unsigned int s
    )
    {
      size_t m = y.size();
    
      Eigen::VectorXd z;
      Eigen::SparseMatrix<double> I, W, D;
      Eigen::SimplicialCholesky< Eigen::SparseMatrix<double> > solver;
    
      I.resize(m, m);
      I.setIdentity();
      D = Diff(I, s);
      D = lambda * (D.transpose() * D);
      W = w.asDiagonal();
    
      // Since W + lambda * D' * D is a symmetric matrix,
      // we solve the system of equations using Cholesky decomposition.
      solver.compute(W + D);
    
      assert(solver.info() == Eigen::Success);
    
      // Solve the equation (W + lambda * D' * D) * z = W * y
      z = solver.solve(W * y);
    
      assert(solver.info() == Eigen::Success);
    
      return z;
    }

Example of Use

I will show an example of using the above code. It reads a tab-separated text file specified on the command line, performs smoothing using the Whittaker Smoother, and outputs the results along with the original data to the standard output.

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include "whittaker.h"

int main(int argc, char * argv[])
{
    double tx, ty, lambda;
    std::string header_x, header_y;
    std::vector<double> x, y, whittaker_y;

    if(argc < 2){
        return 1;
    }

    std::fstream fin;

    fin.open(argv[1]);

    if(!fin){
        std::cerr << "file not open." << std::endl;
        return 1;
    }

    fin >> header_x >> header_y;

    while(1){
        fin >> tx >> ty;
        if(fin.eof()){
          break;
        }
        x.push_back(tx);
        y.push_back(ty);
    }

    fin.close();

    Eigen::VectorXd vy = Eigen::Map<Eigen::VectorXd>(y.data(), (int)y.size());
    Eigen::VectorXd w  = Eigen::VectorXd::Ones((int)y.size());
    lambda = 1;

    Eigen::VectorXd vz1 = WhittakerSmoother(vy, w, lambda, 1);
    Eigen::VectorXd vz2 = WhittakerSmoother(vy, w, lambda, 2);
    Eigen::VectorXd vz3 = WhittakerSmoother(vy, w, lambda, 3);

    std::cout << header_x << "\t" << header_y;
    std::cout << "\ts=1\ts=2\ts=3" << std::endl;

    for(size_t i = 0; i < x.size(); i++){
        std::cout << x[i] << "\t" << y[i] << "\t";
        std::cout << vz1(i) << "\t" << vz2(i) << "\t" << vz3(i) << std::endl;
    }

    return 0;
}

Operating Environment

  • OS
    1. Windows 10
      1. Microsoft C/C++ Optimizing Compiler 19.28.29337 for x64
      2. MinGW-W64 GCC version 8.1.0
      3. Clang version 11.0.1
    2. Linux
      1. GCC version 9.3.0
  • Eigen version 3.3.9

Example of Smoothing

I used the above code to smooth a noise-including pseudo-spectrum and visualized the output data. The graph is created using JavaScript’s Chart.js with the chartjs-plugin-zoom. You can zoom in or out on the $x$-axis and $y$-axis by rotating the mouse wheel, and slide by dragging. Additionally, you can toggle the visibility of each curve by clicking on the legend labels.

The smoothed original data can be found here.








Conclusion

I have introduced the principle of the Whittaker Smoother. While it is not as well-known as the Savitzky-Golay method, it is a relatively powerful smoothing technique that can be implemented with relatively short code to achieve similar results.

C++,Eigen,Spectrum

Posted by izadori