HydroComp Web Log

[efficiency in motion for over 20 years]

Neural nets and genetic algorithms
Miscellaneous Musings from the Technical Director

Monday, 9-JUL-2007 by Donald MacPherson - Technical Director

I received an email from one of our software customers mentioning a new technical paper about the use of neural network (NN) techniques for the prediction of residuary resistance. This is something that I have discussed in a past blog article (Top Ten Prognostications for 2007), and that we have investigated internally for almost 10 years. The focus of my prior blog commentary was on using NN and genetic algorithms (GA) for hydrodynamic optimization. In this new paper, however, the authors presented the use of NN and GA as an alternative to multiple regression of empirical test data. (There have been a few such papers in recent years.) The authors' contention was that their NN-based dimensional analysis resulted in improved quality of the prediction. Let me tell you why the authors are, at the same time, correct and yet also very, very wrong.

Whether the proposed NN approach actually produces improved quality is directly related to the definition of "quality". In that technical paper, the definition of quality was how well the prediction algorithm actually fared as compared to the original data set. This may seem like a reasonable definition - and it is, if the scope of interest were limited to the statistical analysis alone. For me, however, the term "quality" - as it pertains to resistance prediction - is how well a prediction algorithm fares against the broad landscape of real boats (perhaps I should say "seascape"?). But isn't this the same thing? Actually, no... And here is why.

1. Multiple regression and NN development of prediction coefficients are, for all intents and purposes, the same thing. Of course, the paths to the coefficients use different techniques and strategies. One is typically some polynomial regression and the other an equation that is developed from the NN architecture. In the paper's example, the NN solution produced a nearly exact recreation of the original data, whereas the widely-used published algorithm showed scatter against that data. You can typically accomplish the same thing by increasing the polynomial order.
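To make the point about polynomial order concrete, here is a minimal sketch using made-up numbers (the data, the straight-line "truth", and the function names are all assumptions for illustration, not taken from the paper). A polynomial of high enough order passes through every point with no scatter at all, yet it can be further from the underlying trend between the points than a simple straight-line fit.

```python
# Hypothetical data: a straight-line "truth" plus fixed scatter
# standing in for test noise. None of these numbers come from a real test.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]


def truth(x):
    return 2.0 * x + 1.0


ys = [truth(x) + n for x, n in zip(xs, noise)]


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a callable."""
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: y_mean + slope * (x - x_mean)


def interpolate(xs, ys):
    """Polynomial of order len(xs) - 1 through every point (Lagrange form)."""
    def poly(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            term = yj
            for m, xm in enumerate(xs):
                if m != j:
                    term *= (x - xm) / (xj - xm)
            total += term
        return total
    return poly


line = fit_line(xs, ys)
poly = interpolate(xs, ys)

# Scatter against the original data set (the paper's notion of quality).
line_scatter = max(abs(line(x) - y) for x, y in zip(xs, ys))
poly_scatter = max(abs(poly(x) - y) for x, y in zip(xs, ys))

# Error against the underlying trend at a point that was never tested.
x_new = 0.5
line_error = abs(line(x_new) - truth(x_new))
poly_error = abs(poly(x_new) - truth(x_new))

print("scatter vs. data:  line", round(line_scatter, 3), " poly", round(poly_scatter, 3))
print("error at x = 0.5:  line", round(line_error, 3), " poly", round(poly_error, 3))
```

In this contrived case the high-order polynomial "wins" on scatter against the data and loses on the untested point, which is the distinction between the two definitions of quality.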

2. Scatter is expected in empirical data. As I said, for me, the purpose of a prediction is to evaluate real boats, not to exactly match the original test data. All physical testing has "noise". Anyone who has conducted model testing can tell you that the force measurement response is not a smooth function. It is typically some oscillating shape that has to be evaluated over a period of stability, at which point some average force can be determined. In addition to this potential for interpretation in the recording of the force, the timing and order of test runs can also influence the force measurements. It is not uncommon to go back and re-test a speed because it does not conform to an expected result, and if the answers are different, which one is real? Therefore, is a prediction valid if the NN-based algorithm is forced to remove the scatter?
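As a rough illustration of why the recorded force involves interpretation, the sketch below builds a synthetic force trace and averages it over two different windows. Everything here is assumed for the example: the 50 N steady value, the start-up transient, the 1 Hz oscillation, and the window limits. It is not a description of any particular tank's procedure.

```python
import math

# Hypothetical run: the force settles toward a steady value while
# oscillating about it. All numbers are invented for illustration.
DT = 0.01             # sample interval in seconds (assumed)
STEADY_FORCE = 50.0   # steady force in newtons (assumed)


def force_at(t):
    settling = STEADY_FORCE * (1.0 - math.exp(-t))    # start-up transient
    oscillation = 2.0 * math.sin(2.0 * math.pi * t)   # 1 Hz oscillation
    return settling + oscillation


samples = [force_at(i * DT) for i in range(2000)]     # a 20 second record


def mean_force(samples, t_start, t_end):
    """Average the samples recorded in the window t_start <= t < t_end."""
    first = round(t_start / DT)
    last = round(t_end / DT)
    window = samples[first:last]
    return sum(window) / len(window)


early = mean_force(samples, 1.0, 3.0)      # window chosen before the run has settled
steady = mean_force(samples, 10.0, 20.0)   # window chosen over the stable period

print("mean over 1-3 s:  ", round(early, 2), "N")
print("mean over 10-20 s:", round(steady, 2), "N")
```

The same trace yields two noticeably different "measured" forces depending on where the averaging window is placed, so some scatter between nominally identical runs is to be expected.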

3. Raw data is needed for a good analysis, not faired data. Even with the quantitative scatter that is expected, there will be qualitative trends (i.e., curve shapes) that are hydrodynamically valid and must not be "faired out". In my opinion, it is a major shortcoming of many published test series that the raw data, even with conflicting results, is not made available. Rather, some faired curve is provided. The risk in this is that the faired curve may omit the legitimate humps and hollows of the drag curve. Which of the conflicting points is actually the "real" point, and which is a "test error"? You never have the opportunity to consider this for yourself if all you have is a faired curve. Which leads me to...

4. A reliable prediction method must be built on a valid scientific foundation. A purely statistical analysis of raw data is just that - an arbitrary fit through a set of points. However, you will have a more complete and reliable formulation if the test data is fit to a scientifically justifiable curve shape. Let me give you a simple example - a sine curve. If your test points happened to fall at multiples of Pi, you get a collection of test results that are all zero. Any statistical regression of this data would result in a straight line. However, if we knew that the form of the data was a sine curve, and we intentionally selected an underlying sine function for the statistical analysis, then we have a truly valid, useful, and reliable prediction model. The same holds true for the prediction of residuary resistance. For example, the Holtrop and Oortmerssen prediction methods are based on different implementations of the Havelock wave formula. (We use a variant of the Havelock relationship when fitting a curve to model test data.) By using this formula, or any other justifiable wave shape formula, you allow for intelligent smoothing and statistical analysis of the data without losing the humps and hollows of the curve shape that are important to a reliable prediction. And, I haven't yet mentioned the choice of the independent variables (i.e., the hull form coefficients) for the analysis...
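The sine-curve example can be sketched in a few lines. One assumption is added here that is not in the text above: the sample points are offset slightly (0.2 rad) from the exact multiples of Pi, because points lying exactly on the zeros would give a sine fit nothing from which to determine an amplitude. The model form y = A sin(x) and all names are likewise illustrative.

```python
import math

# Sample points close to, but not exactly at, multiples of pi.
OFFSET = 0.2  # assumed offset in radians, so the readings are small but nonzero
xs = [k * math.pi + OFFSET for k in range(7)]
ys = [math.sin(x) for x in xs]   # the "test results": all within about 0.2 of zero

# Purely statistical fit: least-squares straight line.
count = len(xs)
x_mean = sum(xs) / count
y_mean = sum(ys) / count
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))


def line(x):
    return y_mean + slope * (x - x_mean)


# Fit with a justified curve shape: y = A * sin(x), least squares for A.
amplitude = (sum(y * math.sin(x) for x, y in zip(xs, ys))
             / sum(math.sin(x) ** 2 for x in xs))


def sine_model(x):
    return amplitude * math.sin(x)


# Predict where no test point exists: the crest of the curve.
x_peak = math.pi / 2
print("straight line at pi/2:", round(line(x_peak), 3))
print("sine model at pi/2:   ", round(sine_model(x_peak), 3))
```

The straight line stays near zero everywhere, while the sine-based fit recovers the crest that none of the sample points revealed. This is the same role that a wave-shape formula plays in preserving humps and hollows.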

5. The hydrodynamic experience of the team conducting the research is critical. Successfully achieving the above-mentioned criteria requires a solid background in hydrodynamics, more so than even a background in numerical analysis. For example, I have had firsthand experience with commercial model test programs contracted with academic tanks that were conducted by students, including the test reporting. The analyses had fundamental errors and crude curve fitting. Now, I'll be the first to acknowledge that this level of error would be unlikely for a systematic series test program that is going to be published in a recognized technical journal, but it does underscore the influence that experience can have on model testing and analysis. An academic definition of "quality" can be quite different from a commercial engineering definition.

Let me close by saying that NN and GA techniques have their place as an alternative to multiple regression. But just as you can improperly employ regression analysis of data, so can you improperly employ NN and GA. Curve fitting and the broader discipline of dimensional analysis require an intelligent approach with a scientifically logical foundation. Developing a really useful, comprehensive, and reliable prediction method is much more than just pushing numbers through a regression utility or NN tool.

Copyright © 2007 HydroComp, Inc. Durham, NH USA. All rights reserved. www.hydrocompinc.com