Beware of tests that use simulated data to validate product performance, researchers warn

10.11.2010

If two detectors that seek different data characteristics are run against the same simulated data, the first might seem to work well because what it seeks happens to match what the simulation represents, DeVale says. The second might test poorly simply because it seeks characteristics the simulated data never contained.

The second detector might actually work better against real-world data, he says, but testers would be unable to tell that from experiments run on simulated data. That could lead vendors, for example, to ship a product that doesn’t work well in the real world while abandoning the one that would have done well, DeVale says. “It might shut down a valid research area or you might get false confidence that the detector really works,” he says. “The second detector could be better, but you’d never know it because the test was flawed.”
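To make the pitfall concrete, here is a minimal sketch in Python of how a benchmark that simulates only one kind of anomaly can rank two detectors in the wrong order. The detectors, thresholds, and data here are entirely illustrative assumptions; none of it comes from the research itself.

```python
import random

random.seed(0)

def make_data(n, anomaly_rate, spikes_only):
    """Generate (value, is_anomaly) pairs. The 'simulated' benchmark
    expresses anomalies only as large spikes; the 'real-like' data also
    contains subtle drift anomalies the simulation never represents."""
    data = []
    for _ in range(n):
        if random.random() < anomaly_rate:
            if spikes_only or random.random() < 0.5:
                data.append((random.uniform(8.0, 12.0), True))   # obvious spike
            else:
                data.append((random.uniform(2.5, 3.5), True))    # subtle drift
        else:
            data.append((random.gauss(0.0, 1.0), False))         # normal traffic
    return data

def spike_detector(x):
    """Seeks only the characteristic the simulation happens to represent."""
    return abs(x) > 5.0

def drift_detector(x):
    """Also seeks moderate deviations the simulation never generates."""
    return abs(x) > 2.2

def accuracy(detector, data):
    return sum(1 for x, label in data if detector(x) == label) / len(data)

simulated = make_data(10_000, 0.2, spikes_only=True)
real_like = make_data(10_000, 0.2, spikes_only=False)

for name, det in (("spike_detector", spike_detector),
                  ("drift_detector", drift_detector)):
    print(f"{name}: simulated={accuracy(det, simulated):.3f}, "
          f"real-like={accuracy(det, real_like):.3f}")
```

On such a run, the spike detector scores nearly perfectly against the simulation yet misses every drift anomaly in the real-like data, while the drift detector's steadier real-world performance would go unnoticed if only the simulated benchmark were consulted.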

Designing tests that match salient characteristics of test data to the anomaly detection products being tested means more work. “It adds complexity to it, but that’s better than being blind to it,” DeVale says.
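One way that extra work might look in practice, sketched below with purely hypothetical trait names and thresholds, is to audit the benchmark first: measure whether the anomalies it contains actually exhibit the characteristics each detector under test is designed to find.

```python
def trait_coverage(anomaly_values, traits):
    """For each named trait, the fraction of anomalous observations
    in the benchmark that exhibit it."""
    return {name: sum(1 for v in anomaly_values if pred(v)) / len(anomaly_values)
            for name, pred in traits.items()}

# Hypothetical traits, one per detector under test:
traits = {
    "large_spike": lambda v: abs(v) > 5.0,          # what a spike detector seeks
    "subtle_drift": lambda v: 2.0 < abs(v) <= 5.0,  # what a drift detector seeks
}

# Anomalies drawn from a simulated benchmark that only generates spikes:
simulated_anomalies = [9.1, 10.4, 8.7, 11.2]
print(trait_coverage(simulated_anomalies, traits))
# {'large_spike': 1.0, 'subtle_drift': 0.0} -- the benchmark is blind to drift,
# so it cannot fairly score a detector that targets drift.
```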

He presented research that used fairly simple and restricted data sets, one pertaining to the altitude and speed of landing airplanes and one to commands sent to orbiting deep-space probes. The simulation problem is much more complex when the data set is Internet traffic, he says: “Cyber data is harder to look at.”
