Saturday, 16 March 2013

Normalization Transformations

This is a short follow-up to my post about Data Transformations, adding two simple transformations for data normalization. To highlight the main point of these two transformations, I'll consider this data set of twelve numbers:

Normalization

Many models are much easier to design if the input data lies in the range 0 to 1. For brevity, I will assume that all data points are positive. Data normalization is performed by multiplying every data point by the multiplicative inverse of the maximum data point:
f(x) = x / max
Not surprisingly, the order remains the same, as do the relative distances, i.e. the ratios between data points. The absolute differences are scaled by the same factor, 1 / max, as the data points themselves. This follows from the fact that normalization is simply a linear transformation with the specific multiplier 1 / max.
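A minimal Python sketch of this normalization, assuming the data is a non-empty list of positive numbers. The function name normalize and the sample values are my own illustrative choices, not the data set from this post:

```python
from typing import List, Sequence


def normalize(data: Sequence[float]) -> List[float]:
    """Scale every data point so that the maximum becomes 1: f(x) = x / max.

    Assumes a non-empty sequence of positive data points.
    """
    maximum = max(data)
    if maximum <= 0:
        raise ValueError("normalize() assumes positive data points")
    # Dividing by max equals multiplying by its inverse 1 / max,
    # but needs one floating-point rounding step instead of two.
    return [x / maximum for x in data]


# Hypothetical sample values, not the data set from this post.
print(normalize([50, 75, 100]))  # [0.5, 0.75, 1.0]
```

The ratio between any two outputs equals the ratio between the corresponding inputs: 75 / 50 and 0.75 / 0.5 are both 1.5.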

Range Normalization

If I want my mathematical model to focus on the differences between data points, there is still room for improvement. In the data set, every data point is greater than or equal to 50, which means that 50% of the normalization range 0 to 1 is wasted on a region that carries no information about the differences between data points.
Conceptually, this is an offset transformation by the negative minimum, followed by a normalization with the new maximum, max - min. In practice, it is easier to calculate both the minimum and the maximum of the original data in one sweep and apply the combined range normalization transformation:
f(x) = (x - min) / (max - min)
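The one-sweep approach could look like the following Python sketch. min_max and range_normalize are hypothetical helper names, and the sample values are made up; the function raises an error when all data points are equal, since max - min would then be zero:

```python
from typing import Iterable, List, Sequence, Tuple


def min_max(data: Iterable[float]) -> Tuple[float, float]:
    """Find the minimum and the maximum in a single sweep over the data."""
    iterator = iter(data)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("min_max() needs at least one data point") from None
    lowest = highest = first
    for x in iterator:
        # lowest <= highest always holds, so x cannot update both.
        if x < lowest:
            lowest = x
        elif x > highest:
            highest = x
    return lowest, highest


def range_normalize(data: Sequence[float]) -> List[float]:
    """Map the data onto [0, 1] with f(x) = (x - min) / (max - min)."""
    lowest, highest = min_max(data)
    span = highest - lowest
    if span == 0:
        # All points are equal, so (max - min) is zero and f(x) is undefined.
        raise ValueError("range_normalize() needs at least two distinct values")
    return [(x - lowest) / span for x in data]


# Hypothetical sample values, not the data set from this post.
print(range_normalize([50, 75, 100]))  # [0.0, 0.5, 1.0]
```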
This transformation preserves the order of the data points, but, unlike plain normalization, it does not in general preserve their ratios, and the absolute differences are scaled by 1 / (max - min). Most importantly, the output is guaranteed to contain both 0 (the former minimum) and 1 (the former maximum), provided that max > min, so the data is spread as widely as possible over the range 0 to 1.
Interestingly, this normalization technique is also suitable for mapping data sets that contain negative data points into the range 0 to 1, because subtracting the minimum makes every data point non-negative.
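To illustrate the negative-data case together with the conceptual two-step view (offset by the negative minimum, then normalize with the new maximum), here is a Python sketch; the helper names and the sample values are hypothetical:

```python
from typing import List, Sequence


def offset(data: Sequence[float], amount: float) -> List[float]:
    """Offset transformation: add the same amount to every data point."""
    return [x + amount for x in data]


def normalize(data: Sequence[float]) -> List[float]:
    """Plain normalization: divide every data point by the maximum."""
    maximum = max(data)
    if maximum <= 0:
        raise ValueError("normalize() needs a positive maximum")
    return [x / maximum for x in data]


def range_normalize_two_step(data: Sequence[float]) -> List[float]:
    """Offset by the negative minimum, then normalize with the new maximum."""
    shifted = offset(data, -min(data))  # every point is now >= 0
    # If all points were equal, the new maximum is 0 and normalize() raises.
    return normalize(shifted)


# Hypothetical values containing negative data points.
print(range_normalize_two_step([-10, 0, 10]))  # [0.0, 0.5, 1.0]
```

The result matches the combined formula f(x) = (x - min) / (max - min): for -10, 0 and 10 it gives 0/20, 10/20 and 20/20.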
