Implementation of Savitzky-Golay Smoothing in C++ with Eigen

2024-07-05C++,Eigen,Math,Signal Processing,Spectrum

One of the widely used methods for smoothing data containing noise, such as spectra, is the Savitzky-Golay method. I will introduce the implementation of this Savitzky-Golay method using C++ and Eigen.

The Savitzky-Golay Method

This is a data smoothing method introduced by Savitzky and Golay in 1964 [1]. For each set of $2N+1$ adjacent data points, the method fits a polynomial of degree $m$ and replaces the central value of the $2N+1$ points with the value of the approximating polynomial. This process helps to reduce noise in the data.

Essentially, this method is a weighted moving average, and the paper [1] provides the coefficients required for smoothing. It is a convenient method due to its low computational complexity and ease of use (although there are some errors in the original coefficient tables as mentioned in [2]).

In analytical chemistry, this method is widely used for smoothing raw data containing noise, especially in spectral analysis, making it a technique familiar to many.

Additionally, there is a less well-known smoothing method called the Whittaker Smoother, which is compared to the Savitzky-Golay method. If you are interested, you can check out the following entry for more information.

Features

  1. Despite the polynomial approximation, the coefficients are determined solely by the number of data points used for the approximation and the degree of the polynomial used for the approximation.
  2. It can calculate not only smoothing but also smoothed derivatives (when approximating with a polynomial of degree 3 or higher, second derivatives and above can also be calculated).
  3. Despite being a smoothing method, it tends to preserve peak shapes without significant broadening.
  4. Essentially, it is a weighted moving average, so the processing is fast.

Principle

First, let’s consider approximating a set of $2N+1$ equidistant data points with an $m$-degree polynomial $f(x) = a_{0} + a_{1}x + \cdots + a_{m}x^{m}$.

Since the data is uniformly spaced by $\Delta{x}$, we can replace the values of $x$ with the range from $-N\Delta{x}$ to $N\Delta{x}$.

In this case, if we consider approximating the data points $(x, y)$ pairs $(-N\Delta{x}, y_{-N}), \cdots, (0, y_{0}), \cdots, (N\Delta{x}, y_{N})$ with an $m$-degree polynomial, then the solution to the least squares method

$$
\displaystyle
S = \sum_{i=-N}^{N}(y_{i} – f(i\Delta{x}))^{2}
$$

is given by solving the equation

$$
\displaystyle
\begin{pmatrix}
1 & -N & \cdots & (-N)^{m} \\
\vdots & \vdots & & \vdots \\
1 & 0 & \ddots & 0 \\
\vdots & \vdots & & \vdots \\
1 & N & \cdots & N^{m}
\end{pmatrix}
\begin{pmatrix}
a_{0} \\
\vdots \\
a_{m}\Delta{x}^{m}
\end{pmatrix} =
\begin{pmatrix}
y_{-N} \\
\vdots \\
y_{0} \\
\vdots \\
y_{N} \\
\end{pmatrix}
$$

to obtain the coefficients. If we denote the left matrix as $X$, then the coefficients of the approximating polynomial $(a_{0}, \cdots, a_{m}\Delta{x}^{m})$ can be obtained as:

$$
\displaystyle
\begin{pmatrix}
a_{0} \\
\vdots \\
a_{m}\Delta{x}^{m}
\end{pmatrix} =
(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}
\begin{pmatrix}
y_{-N} \\
\vdots \\
y_{0} \\
\vdots \\
y_{N} \\
\end{pmatrix}
$$

This is the essence of the Savitzky-Golay method: the actual replacement of values with the values of the approximating polynomial occurs at the center value of the $2N+1$ points, namely $y_{0}$, and the corresponding value of $f(x)$ is the value of $f$ evaluated at $x=0$, which is $f(0)=a_{0}$. Additionally, this $a_{0}$ can be calculated as the dot product of the first row of $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}$ and the vector $(y_{-N}, \cdots, y_{N})$. In fact, there is no need to solve this system of equations at all.

Moreover, since $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}$ is solely determined by the number of data points $N$ and the degree of the approximating polynomial $m$, it does not depend on the actual values of the data, and if the number of data points and the degree of the polynomial are the same, the same values can be used consistently.
Similarly, $a_{1}, a_{2}, \cdots$ can be calculated as well. Knowing that $f'(0) = a_{1}\Delta{x}$, $f"(0) = 2a_{2}\Delta{x}^{2}$, and so on, makes it easy to compute the derivatives of the data.

By the way, there are also methods available for cases where the spacing between data points is not uniform [3].

References

  1. Savitzky, A.; Golay, M. J. Smoothing And differentiation of Data by Simplified Least Squares Procedures. Anal. Chem.1964, 36(8), 1627-1639.
  2. Steinier, J.; Termonia, Y.; Deltour, J. Comments on Smoothing And Differentiation of Data by Simplified Least Square Procedure. Anal. Chem.1972, 44(11), 1906-1909.
  3. Gorry P. A. General Least-squares Smoothing And Differentiation of Nonuniformly Spaced Data by the Convolution Method. Anal. Chem.1991, 63(5), 534-536

Implementation using C++ and Eigen

Understanding the principle makes the implementation relatively straightforward. After preparing the matrix $X$, creating $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}$, and if you just need the coefficients for simple smoothing data, you can extract the first row to obtain the coefficients. If you want to find the smoothed data after $k$ times differentiation, you can extract the $(k+1)$-th row, multiply by $k!$, and then divide by $\Delta{x}^{k}$.

Example of Implemantation

Here is the function that calculates the coefficients. Please note that error checking is not included. The code is also available on GitHub. Feel free to check it out if you are interested.

#include <Eigen/Dense>

// Function to calculate the coefficients of the Savitzky-Golay method
//
// window   : Number of data points (2N+1 points)
// polyorder: Degree of polynomial
// derive   : Order of differentiation
// delta    : Data spacing
const Eigen::RowVectorXd CalcSavGolCoeff(
    const size_t window, const unsigned int polyorder, const unsigned derive, const double delta
)
{
    int n = (int)window / 2;
    double p = 1;

    // Generating the matrix X
    Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(window, -n, n);
    Eigen::MatrixXd x = Eigen::MatrixXd::Ones(window, polyorder + 1);

    for(unsigned int i = 1; i <= polyorder; i++){
      x.col(i) = (x.col(i - 1).array() * v.array()).matrix();
    }

    // Calculating (X^T * X)^-1 * X^T
    Eigen::MatrixXd coeff_mat = (x.transpose() * x).inverse() * x.transpose();

    // Calculation of factorial
    if(derive > 0){
      for(unsigned int i = 1; i <= derive; i++){
        p *= i;
      }
      p /= std::pow(delta, derive)
    }

    return coeff_mat.row(derive) * p;
}

Example of Use

Here is an example of calculating the coefficients for smoothing with a 5th-degree polynomial for 9 data points.

int main()
{
    double delta = 0.01;
    auto a0 = CalcSavGolCoeff(9, 5, 0, delta);
    auto a1 = CalcSavGolCoeff(9, 5, 1, delta);
    auto a2 = CalcSavGolCoeff(9, 5, 2, delta);

    std::cout << "CalcSavGolCoeff(9, 5, 0, 0.01) = " << std::endl;
    std::cout << a0 << std::endl << std::endl;

    std::cout << "CalcSavGolCoeff(9, 5, 1, 0.01) = " << std::endl;
    std::cout << a1 << std::endl << std::endl;

    std::cout << "CalcSavGolCoeff(9, 5, 2, 0.01) = " << std::endl;
    std::cout << a2 << std::endl << std::endl;

    return 0;
}

Results

CalcSavGolCoeff(9, 5, 0, 0.01) = 
 0.034965 -0.128205 0.0699301  0.314685  0.417249  0.314685 0.0699301 -0.128205  0.034965

CalcSavGolCoeff(9, 5, 1, 0.01) = 
-2.96037  16.0956 -26.4452 -33.5548        0  33.5548  26.4452 -16.0956  2.96037

CalcSavGolCoeff(9, 5, 2, 0.01) = 
-734.266     2162  879.953  -1229.6 -2156.18  -1229.6  879.953     2162 -734.266

Operating Environment

  • OS
    1. Windows 10
      1. Microsoft C/C++ Optimizing Compiler 19.28.29337 for x64
      2. MinGW-W64 GCC version 8.1.0
    2. Linux
      1. GCC version 9.3.0
  • Eigen version 3.3.9

Example of Smoothing

I used the calculated coefficients to smooth pseudo-spectra containing noise and plotted the results. The graph was created using JavaScript’s Chart.js with the chartjs-plugin-zoom. You can zoom in and out on both the $x$-axis and $y$-axis using the mouse wheel, and slide by dragging.

The original data before and after smoothing can be found here.










Conclusion

I have introduced the widely used Savitzky-Golay method for smoothing, including its principles. With low computational complexity and the ability to preserve peak shapes, the Savitzky-Golay method is considered a very user-friendly technique.

C++,Eigen,Spectrum

Posted by izadori