Dyno Plot smoothing for the mathematically inclined.
Thread Starter
Evolved Member
iTrader: (6)
Joined: May 2004
Posts: 2,502
Likes: 1
From: Tennessee, USA
When building datalogging and engine management software, I came across the odd problem of how to properly smooth/interpolate/visually enhance dyno plots. Most of the software that does this kind of thing is closed source, and its authors aren't particularly interested in sharing their algorithms.
Being a trained mathematician and computer programmer myself, I decided to just come up with my own solution. I'm no expert in numerical analysis, but I think my findings are pretty interesting by themselves. For a while, I thought I had a pretty good solution using what I affectionately call my "box" method. Recently I've come across a much better smoothing filter that originates in a 1964 Analytical Chemistry paper by Savitzky and Golay.
Brief background: I first became aware of smoothing issues when I took a look at the dyno generation spreadsheet that dustin@vishnu created last spring. I noticed that he was smoothing the plots with a running average (called a moving window in the math literature). This has the undesirable effect of pushing the peaks left or right, depending on which side of the target point you place your window. In other words, you might have a power peak at 6500 rpm, but after smoothing it could appear at 6700 -- that's bad juju.
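To make the peak shift concrete, here's a minimal Python sketch. The function name `trailing_average` and the synthetic bump are made up for illustration; this is not dustin's spreadsheet formula. A one-sided window of w samples moves a symmetric peak by (w - 1)/2 samples:

```python
def trailing_average(ys, window):
    """One-sided moving average: each output is the mean of the current
    sample and up to window-1 samples before it (shorter at the start)."""
    out = []
    for i in range(len(ys)):
        lo = max(0, i - window + 1)
        chunk = ys[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Synthetic, noise-free bump peaking at sample 20 (made-up data).
bump = [100 - (i - 20) ** 2 for i in range(41)]
smoothed = trailing_average(bump, window=9)

peak_raw = max(range(len(bump)), key=bump.__getitem__)
peak_smoothed = max(range(len(smoothed)), key=smoothed.__getitem__)
print(peak_raw, peak_smoothed)  # → 20 24
```

With a 9-sample window the peak moves 4 samples to the right; put the window on the other side and it moves left by the same amount.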
The "box" method I came up with works as follows. You pick four points in your plot, build the smallest box that contains them all, and take the center of the box as a new point in your plot. Iterate... This works fairly well, but can move your peaks up or down -- worse juju.
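The post doesn't spell out whether the four points are consecutive or how far the box advances each step, so this sketch assumes four consecutive points and a step of one. The name `box_smooth` is hypothetical:

```python
def box_smooth(points, k=4):
    """Hypothetical reading of the "box" method: for each run of k
    consecutive (x, y) points, build the axis-aligned bounding box and
    emit its center. Slides forward one point at a time."""
    out = []
    for i in range(len(points) - k + 1):
        xs = [x for x, _ in points[i:i + k]]
        ys = [y for _, y in points[i:i + k]]
        out.append(((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2))
    return out

# A small peak of height 4 at x = 2 (made-up data).
pts = [(0, 0), (1, 3), (2, 4), (3, 3), (4, 0)]
print(box_smooth(pts))  # → [(1.5, 2.0), (2.5, 2.0)]
```

Because each box center depends only on the extreme values in its group, a single low point drags the result down: the peak of 4 comes out at 2.0.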
Third, there is a method related to the running average where you place the window on _both_ sides of your target point. I call it Nearest Neighbor Averaging -- NNA.
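A minimal sketch of NNA as described: a centered window of `half_width` samples on each side of the target. The name and the endpoint handling are assumptions; near the ends this version truncates the window rather than padding with zeros:

```python
def nna(ys, half_width):
    """Centered moving average: mean of the target sample and up to
    half_width samples on each side. The window is truncated at the ends."""
    n = len(ys)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        chunk = ys[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

# Same kind of made-up bump, peaking at sample 20 with height 100.
bump = [100 - (i - 20) ** 2 for i in range(41)]
smoothed = nna(bump, 4)
peak = max(range(len(smoothed)), key=smoothed.__getitem__)
print(peak, round(smoothed[peak], 2))  # → 20 93.33
```

Centering keeps the peak at sample 20, but its height still drops from 100 to about 93.33, so the location is fixed while the height is not.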
Fourth is the Savitzky-Golay algorithm, which I believe is quite superior. It correctly identifies peaks and dips but filters out the noise. Its inner workings can be a bit complicated, but I can describe them if people are interested. It's also very fast.
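However complicated the derivation, the end result is a fixed set of weights c_n slid along the data. As a standard textbook case (a 5-point window with a quadratic fit, not necessarily the window Xede uses):

```latex
g_i = \sum_{n=-2}^{2} c_n f_{i+n}
    = \frac{-3 f_{i-2} + 12 f_{i-1} + 17 f_i + 12 f_{i+1} - 3 f_{i+2}}{35}
```

The weights satisfy sum(c_n) = 1 and sum(n*c_n) = sum(n^2*c_n) = 0, so any quadratic passes through unchanged. A plain 5-point average meets only the first two of those conditions.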
Anyway, I've built all 4 of these and added them to my Xede software. As a first example, let's see how they do at approximating a noisy line.
Here is the NNA method:

Here's the "box" method:

Here's the Running Average method:

And finally here's Savitzky-Golay:

Although the endpoints are messy, notice how much better the curve features can be identified.
d
EDIT: doh, I misspelled mathematically.
Last edited by donour; Jul 6, 2005 at 12:58 PM. Reason: grammar
EDIT: doh, I misspelled mathematically.
Keep up the good work man, IM me on AIM when you get a chance, I have an idea for a few of the things we were talking about.
BTW, we call it the SMART xede for a reason.
LOL Is that algorithm you found the "scatter averaging" method I had been researching?
FWIW, my spelling and grammar are pretty atrocious. Yet for some reason people seem to think I have a clue.. LOL
Last edited by MalibuJack; Jul 6, 2005 at 01:29 PM.
I also think that last algorithm is the same one used in graphics imaging for enhancing images and reducing abhorrent noise artifacts..
If you can e-mail me some info on that algorithm, I'd appreciate it. I'm actually writing a filter for Photoshop that is supposed to do the same thing..
Last edited by MalibuJack; Jul 6, 2005 at 01:37 PM.
Thread Starter
Originally Posted by MalibuJack
LOL Is that algorithm you found the "scatter averaging" method I had been researching?
Numerical Recipes gives the method various names: Savitzky-Golay, least-squares interpolation, and (EDIT: ) DISPO.
Using a running average, bias is introduced if the second derivative of your curve is nonzero. From Numerical Recipes: "The idea of Savitzky-Golay filtering is to find filter coefficients c_n that preserve higher moments." In other words, it does a moving window approximation, but instead of fitting a constant (averaging) it fits high-order polynomials (quartics, I believe).
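In symbols (my notation, following the usual textbook setup rather than anything specific to Xede): fit a degree-M polynomial by least squares to the 2m+1 samples around each point and keep its value at the center. That is a linear operation, so it collapses to fixed weights c_n, and those weights reproduce every polynomial up to degree M exactly, which is what "preserve higher moments" means:

```latex
\begin{aligned}
A_{nj} &= n^{j}, \qquad n = -m,\dots,m,\quad j = 0,\dots,M \\
c_n &= \sum_{j=0}^{M} \big[(A^{\mathsf T}A)^{-1}\big]_{0j}\, n^{j} \\
\sum_{n=-m}^{m} c_n\, n^{k} &= \delta_{k0}, \qquad k = 0,\dots,M
\end{aligned}
```

With M = 0 every weight is 1/(2m+1), i.e. the plain centered average, so "constant vs. polynomial" is exactly the choice of M.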
EDIT: Oops, forgot some things. SavGol (as I've taken to calling it) requires that the data points be evenly spaced. In addition, the Numerical Recipes code requires the data size to be a power of 2. So I have to do some preprocessing to get the data into a form the filter can handle. It's a real pain.
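A minimal sketch of that kind of preprocessing, assuming linear interpolation onto an even RPM grid is acceptable (the post doesn't say what Xede actually does; `resample_uniform` and `next_pow2` are hypothetical names):

```python
import bisect

def resample_uniform(xs, ys, n):
    """Linearly interpolate (xs, ys) onto n evenly spaced points spanning
    [xs[0], xs[-1]]. Assumes xs is strictly increasing (e.g. RPM on a pull)."""
    x0, x1 = xs[0], xs[-1]
    step = (x1 - x0) / (n - 1)
    grid = [x0 + k * step for k in range(n)]
    out = []
    for x in grid:
        j = bisect.bisect_right(xs, x) - 1
        j = min(max(j, 0), len(xs) - 2)  # clamp so xs[j + 1] exists
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        out.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return grid, out

def next_pow2(n):
    """Smallest power of two >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

rpm = [3000, 3012, 3031, 3040, 3058]  # made-up, unevenly spaced samples
hp = [150, 152, 155, 157, 160]
grid, hp_even = resample_uniform(rpm, hp, 8)
print(next_pow2(1000))  # → 1024
```

Linear interpolation is the simplest choice; it can slightly flatten a sharp peak if the raw samples are sparse around it, since it never exceeds the neighboring sample values.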
The nice thing, though, is that the SavGol method only calculates the coefficients. It doesn't apply the filter. This means you can generate a filter once and apply it to a bunch of plots very quickly. Note that almost every dyno run will be 1024 samples, since that is the closest power of 2. 512 samples is too few, and it would take over twenty seconds to collect 2048 samples!
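Here's a small sketch of that "compute once, apply many" pattern. It uses direct convolution rather than the FFT-based convolution in Numerical Recipes, so the run length doesn't matter; for a window of a few dozen points over ~1000 samples that's cheap. The function name and the endpoint policy are assumptions:

```python
def apply_filter(coeffs, ys):
    """Apply an odd-length, symmetric set of smoothing weights by direct
    convolution. The first and last len(coeffs)//2 samples are copied
    through unchanged (an assumed, deliberately simple endpoint policy)."""
    m = len(coeffs) // 2
    out = list(ys)
    for i in range(m, len(ys) - m):
        out[i] = sum(c * ys[i + n - m] for n, c in enumerate(coeffs))
    return out

# 5-point quadratic Savitzky-Golay weights: computed once, reused per run.
SG5 = [c / 35 for c in (-3, 12, 17, 12, -3)]

runs = [
    [0.5 * i * i for i in range(10)],    # made-up quadratic "run"
    [3.0 - 2.0 * i for i in range(10)],  # made-up linear "run"
]
smoothed_runs = [apply_filter(SG5, run) for run in runs]
```

Both made-up runs come back unchanged, since a quadratic fit reproduces quadratics and straight lines exactly.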
d
Last edited by donour; Jul 6, 2005 at 01:42 PM.
Okay EEK! 2000 lines of code! I think I get what you're describing for that one.. you're discarding values which are way out of the range, but still allowing the weighting to correctly bias the curve without artificially smoothing it. But what you're describing might work really well in digital imaging.
Thread Starter
Originally Posted by MalibuJack
Okay EEK! 2000 lines of code!
Let's put it this way. The Numerical Recipes code required:
1) their own vector management system
2) LU decomposition
3) FFTs
4) custom convolution function
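Most of that machinery is general-purpose. For the small windows a dyno plot needs, the smoothing weights come from a (degree+1)-by-(degree+1) linear system that can be solved directly. A minimal pure-Python sketch (`savgol_coeffs` is a hypothetical name; exact fractions are my choice so the checks are exact, not something Numerical Recipes does):

```python
from fractions import Fraction

def savgol_coeffs(half_width, degree):
    """Smoothing weights c_n, n = -half_width..half_width, for a least-squares
    polynomial fit of the given degree evaluated at the window center.
    Solves the small normal equations exactly with Gauss-Jordan elimination."""
    m, size = half_width, degree + 1
    ns = range(-m, m + 1)
    # Normal matrix (A^T A)_{jk} = sum over n of n^(j+k)
    ata = [[Fraction(sum(n ** (j + k) for n in ns)) for k in range(size)]
           for j in range(size)]
    # Solve (A^T A) b = e_0; A^T A is symmetric, so b is row 0 of its inverse.
    aug = [row + [Fraction(int(j == 0))] for j, row in enumerate(ata)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    b = [aug[j][size] for j in range(size)]
    # c_n = sum_j b_j * n^j  (the fitted polynomial's value at the center)
    return [sum(b[j] * n ** j for j in range(size)) for n in ns]

print([str(c) for c in savgol_coeffs(2, 2)])
# → ['-3/35', '12/35', '17/35', '12/35', '-3/35']
```

With degree 0 it collapses to a plain average. The degree must be smaller than the window length (2*half_width + 1), otherwise the system is singular.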
I think I get what you're describing for that one.. you're discarding values which are way out of the range, but still allowing the weighting to correctly bias the curve without artificially smoothing it.
Voila:
http://www.ma.utexas.edu/documentati...fpdf/f14-8.pdf
But what you're describing might work really well in digital imaging.
http://research.microsoft.com/users/...Gol/SavGol.htm
Thread Starter
Another example
My first example was only a noisy line (y=mx+b). While SavGol worked best, all of the methods provided fairly good approximations. Now, let's try it with something that has nonzero second derivatives -- sinusoidal curves.
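Why second derivatives matter shows up in a Taylor expansion of a centered (NNA-style) average over 2m+1 samples spaced h apart, ignoring noise:

```latex
\frac{1}{2m+1}\sum_{k=-m}^{m} f(x+kh)
  = f(x) + \frac{m(m+1)}{6}\,h^{2} f''(x) + O(h^{4})
```

The first-derivative term cancels because the window is symmetric, but sum(k^2) = m(m+1)(2m+1)/3 leaves a term proportional to f''. On a straight line f'' = 0 and the bias vanishes; at a power peak f'' < 0, so the average reads low, by an amount that grows roughly with the square of the window width.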
The following plots have three data sets. The first is the raw data. The second is the original curve used to generate the data. The third is the curve fit by the various methods.
First is NNA:

Next is the "box" method:

Thirdly we have the running average:

And finally Savitzky-Golay:

The first time I tested this code, I was amazed at how much better SavGol preserves the peak values and locations. Who's your daddy?
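A tiny noise-free illustration of the same effect, using a made-up cosine peak rather than the data in these plots: a plain 5-point centered average versus the textbook 5-point quadratic Savitzky-Golay weights, evaluated at the top of the peak (true value 1.0):

```python
import math

h = 0.2                                         # sample spacing in radians (made up)
ys = [math.cos(k * h) for k in range(-10, 11)]  # true peak of 1.0 at index 10
centre = 10
window = ys[centre - 2:centre + 3]

nna5 = sum(window) / 5                          # plain 5-point centered average
sg5 = sum(c * y for c, y in zip((-3, 12, 17, 12, -3), window)) / 35

print(f"NNA peak {nna5:.4f}, SavGol peak {sg5:.4f}")
# → NNA peak 0.9605, SavGol peak 0.9999
```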
d
Thanks for the link, reading the PDF helped a lot. It also very clearly shows how its bias towards smoothing (losing data) on short-duration, high peaks can be used very effectively as an imaging noise filter. This will help my work a lot.. though it may take some thought to work out how to preserve intentional contrast.
Thread Starter
Originally Posted by MalibuJack
Thanks for the link, reading the PDF helped a lot. It also very clearly shows how its bias towards smoothing (losing data) on short-duration, high peaks can be used very effectively as an imaging noise filter. This will help my work a lot.. though it may take some thought to work out how to preserve intentional contrast.
I _still_ haven't had a chance to look at those UTEC logs that you sent me. It's down in the todo list somewhere. I'm afraid it's going to get pushed off again, as I'm going to have new car parts and a SMART package to play with in a week or two.
d
Dude!
For the NNA, it looks like at the endpoints you are averaging in a bunch of fictitious 0.0's and that you also have some indexing problem that shifts everything off to the side.
See the attached for what a simple +/-14 (i.e. 29-point) NNA does.
Also, it looks like your Savitzky-Golay algo is somehow assuming that the endpoint values are 0.0, so it would do a bunch better by:
1) doing a simple pre-processing step that subtracts off a simple linear fit based only on some reasonable value for the endpoints (could even be just the 1st and last data points)
2) doing your Savitzky-Golay
3) adding back the same linear fit you subtracted off in step 1
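The three steps above, as a minimal Python sketch. The function names are hypothetical, and `smooth` can be any list-to-list smoother. The demo uses a deliberately zero-padded average to reproduce the endpoint droop, on a straight ramp where the detrended residual is all zeros:

```python
def detrend_smooth(ys, smooth):
    """1) subtract the line through the first and last samples,
    2) smooth the residual, 3) add the same line back. Needs len(ys) >= 2."""
    n = len(ys)
    slope = (ys[-1] - ys[0]) / (n - 1)
    line = [ys[0] + slope * i for i in range(n)]
    resid = [y - base for y, base in zip(ys, line)]
    return [s + base for s, base in zip(smooth(resid), line)]

def zero_padded_average(ys, half_width=3):
    """Centered average that treats samples past either end as 0.0,
    which drags the ends of the curve toward zero."""
    n = len(ys)
    width = 2 * half_width + 1
    return [sum(ys[j] for j in range(i - half_width, i + half_width + 1)
                if 0 <= j < n) / width
            for i in range(n)]

ramp = [10.0 + 2.0 * i for i in range(20)]  # made-up straight-line data
naive = zero_padded_average(ramp)
fixed = detrend_smooth(ramp, zero_padded_average)
print(round(naive[0], 2), fixed[0])  # → 7.43 10.0
```

The line only has to be reasonable near the ends; in the middle the smoother still does all the work.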
Thread Starter
Originally Posted by barney malone
Dude!
For the NNA, it looks like at the endpoints you are averaging in a bunch of fictitious 0.0's and that you also have some indexing problem that shifts everything off to the side.
See the attached for what a simple +/-14 (i.e. 29-point) NNA does.
Also, it looks like your Savitzky-Golay algo is somehow assuming that the endpoint values are 0.0, so it would do a bunch better by:
1) doing a simple pre-processing step that subtracts off a simple linear fit based only on some reasonable value for the endpoints (could even be just the 1st and last data points)
2) doing your Savitzky-Golay
3) adding back the same linear fit you subtracted off in step 1
d
donour: One of my points is that other methods I've tried shift the peaks around. My window size for that NNA is only 4 points. The reason you can find a "shift" that puts the interpolation back on the curve is that you know where the original location is. If you _know_ the underlying function, you can do a least-squares fit or something and get a much better approximation. I have been unable to properly formulate the power/torque curve of a modern automobile -- at least in any kind of closed form.
It is totally obvious from inspection that neither of your "NNA" plots is using 4 X_consecutive points or even +/-4 X_points. I would guess you are using more like +/-50 (or 101 total) points around each point and are shifting your results by that same 50 points.
Did you look at my plot? I did not use any a priori info. I just did a simple bonehead average of the 29 Y_values (14 on either side and from the "X_target" point itself) around the noisy data shown to get my "NNA" plot.
Maybe state again what the heck you are doing, because it does not seem to be:
donour: Thirdly there is a method related to the running average where you place the window on _both_ sides of your target point. I call it Nearest Neighbor Averaging -- NNA.
Thread Starter
Originally Posted by barney malone
Huh?
It is totally obvious from inspection that neither of your "NNA" plots is using 4 X_consecutive points or even +/-4 X_points. I would guess you are using more like +/-50 (or 101 total) points around each point and are shifting your results by that same 50 points.
Did you look at my plot? I did not use any a priori info. I just did a simple bonehead average of the 29 Y_values (14 on either side and from the "X_target" point itself) around the noisy data shown to get my "NNA" plot.
Maybe state again what the heck you are doing, because it does not seem to be:
Also, the examples I've posted so far have fairly mild noise. The curve shape is still very obvious. Try it with raw RPM sampled (EDIT: ) Dyno data.

d
ps - I appreciate your directed criticism though. I definitely don't think I have any perfect solutions.
Last edited by donour; Jul 7, 2005 at 12:53 PM.
Thread Starter
Here's a plot of raw dyno samples, which I've found to be quite a bit more difficult to fit correctly.

Here's a comparison of how NNA and savgol do. The first run (1) is NNA, the second (3) is savgol:

That was dustin's car, which should have a very stock Evo-like powerband. What I don't like about NNA is how resistant it is to sudden peaks or dips.
Perhaps there is a better window size to pick for this particular example. Still, a moving window average will tend to push your points "up" in the direction that you iterate. Here, you can see that the hp level appears artificially high. It even appears to keep going after power is pulled (a little before 7k rpm, if memory serves).
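For a one-sided (trailing) window of w samples spaced h apart, a first-order Taylor expansion shows the "direction you iterate" effect as a delay. This is a sketch that ignores noise:

```latex
\frac{1}{w}\sum_{k=0}^{w-1} f(x-kh)
  = f(x) - \frac{w-1}{2}\,h\,f'(x) + O(h^{2})
  \approx f\!\left(x - \tfrac{w-1}{2}\,h\right)
```

Features therefore arrive about (w - 1)/2 samples late, which is consistent with power appearing to continue past the point where it is pulled. Centering the window cancels this first-order term, leaving only the curvature-dependent bias.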
d

Last edited by donour; Jul 7, 2005 at 01:06 PM. Reason: grammar



