Statistical noise is unexplained variability within a data sample. The word "noise" has its roots in telecom signal processing; in that context, noise describes unexplained electrical or electromagnetic energy that can degrade the quality of signals and the corresponding data.
In both telecom and data science, the presence of noise can significantly affect sampling. Sampling is an analysis technique in which a representative subset of data points is selected, manipulated and analyzed to identify signals, the patterns in a larger data set. Signals are important because they are the patterns an analyst must examine in order to draw conclusions.
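To make the idea concrete, here is a minimal sketch in Python, assuming a synthetic NumPy data set in which the "signal" is an invented linear trend and the noise is random variation added on top; the sample size and all constants are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: an underlying linear pattern (the signal)
# plus random, unexplained variability (the noise).
x = np.linspace(0, 10, 100_000)
signal = 2.0 * x + 5.0
noise = rng.normal(0.0, 4.0, size=x.size)
population = signal + noise

# Sampling: select a representative subset of data points to analyze.
idx = rng.choice(x.size, size=500, replace=False)
x_sample, y_sample = x[idx], population[idx]

# Analyze the sample to recover the signal (a simple linear fit).
slope, intercept = np.polyfit(x_sample, y_sample, deg=1)
print(f"estimated signal: y = {slope:.2f}x + {intercept:.2f} (true: y = 2.00x + 5.00)")
```

Because the subset is drawn at random, the estimated slope and intercept land close to the true values, but the noise guarantees they are never exactly right.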
Noise can interfere with signals, however, misdirecting the analyst's attention. A popular solution is to use algorithms that help separate noise from signals, but even this approach can be problematic.
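One common family of such algorithms is smoothing filters. The sketch below, a simple moving average applied to an invented noisy sine wave (the window size is an arbitrary choice, not a recommendation), shows the basic idea of pulling a signal out of noise:

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0, 2 * np.pi, 400)
clean = np.sin(t)                                # the signal
noisy = clean + rng.normal(0, 0.3, size=t.size)  # signal plus noise

# Moving-average filter: each point is replaced by the mean of its
# neighbors, which suppresses random noise while keeping slow trends.
window = 15
kernel = np.ones(window) / window
smoothed = np.convolve(noisy, kernel, mode="same")

print("RMSE before smoothing:", np.sqrt(np.mean((noisy - clean) ** 2)))
print("RMSE after smoothing: ", np.sqrt(np.mean((smoothed - clean) ** 2)))
```

The averaging suppresses random fluctuations while leaving the slower-moving signal largely intact, so the error drops after smoothing; the trade-off is that a filter can also blur genuine, fast-changing detail, which is one way the approach becomes problematic.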
In machine learning (ML), for example, statistical noise can create problems when algorithms are not trained properly. This can be dangerous in ubiquitous computing, because if an algorithm classifies noise as a pattern, it will use that pattern to make generalizations and extrapolations.
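A minimal illustration of that failure mode, using an invented data set and an over-flexible polynomial model (the degrees, noise level and sample sizes are all made up for demonstration):

```python
import numpy as np

rng = np.random.default_rng(7)

def make_data(n=30):
    """Hypothetical data: a simple linear relationship plus random noise."""
    x = rng.uniform(0, 1, n)
    y = 3.0 * x + rng.normal(0, 0.5, n)
    return x, y

x_train, y_train = make_data()
x_test, y_test = make_data()  # fresh data the model has never seen

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_mse:.3f}, "
          f"test error {test_mse:.3f}")
```

The flexible model tends to score better on the data it was trained on precisely because it has memorized some of the noise, and that is what makes its generalizations to new data unreliable.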
The terms statistical noise and statistical bias are sometimes confused. While both concepts deal with overestimating or underestimating the importance of a variable, a bias can be reproduced reliably, while noise cannot.
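A small simulation makes the distinction visible (the true value, bias and noise level below are made-up numbers): a biased measurement process produces roughly the same systematic error every time the experiment is repeated, while the noise changes from run to run and largely averages out.

```python
import numpy as np

rng = np.random.default_rng(1)

true_value = 100.0
bias = 2.5       # systematic error: reproduced on every run
noise_sd = 5.0   # random error: different on every run

for run in range(3):
    measurements = true_value + bias + rng.normal(0, noise_sd, size=10_000)
    error = measurements.mean() - true_value
    print(f"run {run}: average error = {error:+.2f} "
          "(the bias persists; the noise averages out)")
```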