Statistical
noise is unexplained variability within a data sample. The word
"noise" has its roots in telecom signal processing; in that context,
noise describes unexplained electrical or electromagnetic energy that can
degrade the quality of signals and corresponding data.
In both telecom and data science,
the presence of noise can significantly affect sampling. Sampling is an
analysis technique in which a representative subset of data points is selected,
manipulated and analyzed to identify signals, the patterns present in a larger data set.
Signals are important because they are the patterns the analyst must examine
to draw conclusions.
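As a rough illustration, the sketch below uses a synthetic data set generated with NumPy (the constant "daily level" of 100 and the sample size of 200 are arbitrary choices for the example): the underlying signal is a fixed value, noise is random variability layered on top of it, and a sample is drawn to estimate the signal.

import numpy as np

rng = np.random.default_rng(seed=42)

true_signal = 100.0                       # underlying pattern (signal)
noise = rng.normal(0, 15, size=10_000)    # unexplained variability (noise)
population = true_signal + noise          # observed data points

# Select a representative subset and use it to estimate the signal.
sample = rng.choice(population, size=200, replace=False)
print(f"Sample estimate of signal: {sample.mean():.1f}")  # near 100, never exact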
Noise, however, can interfere with signals and misdirect the analyst's
attention. A popular solution is to use algorithms that help separate noise
from signals, but even this approach can be problematic.
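One common family of such algorithms is smoothing filters. The minimal sketch below, assuming NumPy and a synthetic sine-wave signal, applies a simple moving average to reduce noise; note that smoothing also blurs genuine features of the signal, which is one way this approach becomes problematic.

import numpy as np

rng = np.random.default_rng(seed=0)

x = np.linspace(0, 4 * np.pi, 400)
signal = np.sin(x)                               # true underlying pattern
observed = signal + rng.normal(0, 0.4, x.size)   # signal buried in noise

# Simple moving average: each point becomes the mean of a small window.
window = 15
kernel = np.ones(window) / window
smoothed = np.convolve(observed, kernel, mode="same")

print("Residual noise before smoothing:", round(np.std(observed - signal), 3))
print("Residual noise after smoothing: ", round(np.std(smoothed - signal), 3))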
In machine learning (ML), for
example, statistical noise can create problems when models are not trained
properly. This can be dangerous in ubiquitous computing because, if an
algorithm mistakes noise for a pattern, it will use that spurious pattern to
make generalizations and extrapolations.
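A hedged illustration of this failure mode, using only NumPy polynomial fitting on a synthetic data set (the degrees, seed and point count are arbitrary): a high-degree polynomial "learns" the noise in the training points and then extrapolates poorly, while a low-degree fit captures the real trend.

import numpy as np

rng = np.random.default_rng(seed=1)

x_train = np.linspace(-1, 1, 20)
y_train = 2 * x_train + rng.normal(0, 0.2, x_train.size)  # linear signal + noise

simple = np.polyfit(x_train, y_train, deg=1)   # captures the signal
overfit = np.polyfit(x_train, y_train, deg=9)  # also memorizes the noise

x_new = 1.5  # extrapolate slightly beyond the training range
print("Simple model prediction: ", round(np.polyval(simple, x_new), 2))   # near 3.0
print("Overfit model prediction:", round(np.polyval(overfit, x_new), 2))  # typically far off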
The terms statistical noise and
statistical bias are sometimes confused. While both concepts involve
overestimating or underestimating variability, a bias is systematic and can be
reproduced reliably, whereas noise is random and cannot.
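To make the distinction concrete, the sketch below (a synthetic NumPy example; the true value, offset and noise level are invented for illustration) simulates repeated measurement runs from an instrument that reads about 5 units too high. The bias reappears in every run, while the noise averages out differently each time.

import numpy as np

true_value = 50.0
bias = 5.0  # systematic error: reproduced reliably in every run

for run in range(3):
    rng = np.random.default_rng(seed=run)     # different random noise each run
    measurements = true_value + bias + rng.normal(0, 2, size=100)
    error = measurements.mean() - true_value
    print(f"Run {run + 1}: average error = {error:+.2f}")  # always near +5, never identical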