Friday 22 March 2013

Non-linear Data Transformation

Linear Data Transformation

My last two posts on Data Transformations and Normalization Transformations were concerned with transformations that can be grouped together under the common title: linear data transformations. Such transformations have many use cases and are quite important. More often, though, I find myself searching for ways to skew the input data set to either
  1. favour the high end of the data or
  2. favour the low end of the data,
but without changing the order of the data points.
Here is an example with some explanation. I use the sample data from my Data Transformations post, which contains all integers from 0 to 10 inclusive, with a normalization applied. This results in the data set 0, 0.1, 0.2, ..., 0.9, 1.
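As a minimal sketch of how this sample data could be produced, assuming the normalization is a min-max scaling into the range 0 to 1 (the helper name `normalize` is hypothetical and only used for illustration):

```python
def normalize(values):
    """Min-max scale the values into the range 0 to 1 (assumed normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


raw = list(range(11))   # all integers from 0 to 10 inclusive
data = normalize(raw)   # eleven evenly spaced values from 0.0 to 1.0

print(data)
```

Every later example works on this evenly spaced data, so any change in spacing after a transformation is easy to spot.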

Intention

What do I mean by "favouring the high end data"? Of course, in absolute terms, the high end data is already favoured as much as possible. The maximum value of 1 could not possibly be increased while still retaining the normalization range. Instead, I really want to decrease the low end data, thus highlighting the high end data more dominantly in terms of relative distances. Of course, decreasing data points below 0 is not an option either, as that would lead out of the normalization range.
I needed some other, non-linear transformation to achieve this output data:
Conversely, what do I mean by "favouring the low end data"? Just like above, I mean to increase the relative distances of the data points, but this time in the low end of the data set. This is what the output data should look like:
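To make the two intentions concrete, here is a small sketch that uses squaring and the square root as stand-ins. These two functions are my own illustrative choice, not necessarily the transformation this post arrives at. Both keep 0 and 1 fixed and preserve the order, but they move the points in between in opposite directions.

```python
import math

# Normalized sample data: 0.0, 0.1, ..., 1.0
data = [k / 10 for k in range(11)]

# Squaring pushes the low end towards 0, which favours the high end.
favour_high = [x ** 2 for x in data]

# The square root pulls the low end away from 0, which favours the low end.
favour_low = [math.sqrt(x) for x in data]


def is_sorted(values):
    """Return True if the values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


for x, hi, lo in zip(data, favour_high, favour_low):
    print(f"{x:.1f} -> favour high: {hi:.3f}   favour low: {lo:.3f}")
```

In the printed table the middle column crowds towards 0 while the right column crowds towards 1, yet both columns stay sorted.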

Properties

As you can see from the two pictures above, the order remains untouched by the transformation. The absolute distances are of course changed and the relative distances are changed as well. Since the relative distances are not changed in a proportional way, the transformation is non-linear. When I set out looking for a data transformation that does this, I wanted to find a function that utilizes a parameter with which the intensity of skewing the data towards either the low end or the high end can be modified. Also it was important to me to specify the corner cases:
  1. There is a maximum intensity that skews the data infinitely towards the high end. Looking at the example picture above and taking the idea to an extreme, this means that at maximum intensity the transformation turns into a constant transformation with the constant 0.
  2. There is a minimum intensity that skews the data infinitely towards the low end. Analogous to the maximum intensity extreme, the minimum intensity extreme turns the transformation into a constant transformation with the constant 1.
  3. There is a well specified "normal" intensity which does not change the input data, thus turning the transformation into an identity transformation.
Here is a picture of what that should look like:
The range of the intensity parameter is 0 to 1 and the specific intensities for the transformations above are 0, 1/20, 1/4, 1/2, 3/4, 19/20 and 1.

Mathematica

The following function does almost all of this:
The parameter i is the intensity and x is the input data. Sadly, this function has two undefined points at:
  1. x = 0, i = 0 
  2. x = 1, i = 1.
These are the two points where either the low end data point has to jump up to 1 in order to satisfy the minimum intensity, or the high end data point has to jump down to 0 to satisfy the maximum intensity. Luckily, the conditions are easy to catch and you can just return the defined result. In Mathematica I use this definition:
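The formula and the Mathematica definition are not reproduced in this text, so the following is only a sketch in Python, not the original code. It assumes the function x^(i / (1 - i)), which is one candidate that matches the properties described above: identity at i = 1/2, constant 1 at i = 0, constant 0 in the limit towards i = 1, and trouble at exactly the two points listed. The name `skew` is hypothetical.

```python
def skew(x: float, i: float) -> float:
    """Skew a normalized value x in [0, 1] with intensity i in [0, 1].

    Assumed candidate: x ** (i / (1 - i)). This is an illustration that
    matches the stated properties, not a confirmed original formula.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= i <= 1.0):
        raise ValueError("x and i must both lie in the range 0 to 1")
    if i == 0.0:
        # Minimum intensity: constant 1, which also covers x = 0 (0 ** 0).
        return 1.0
    if i == 1.0:
        # Maximum intensity: constant 0, which also covers x = 1.
        # Catching the whole intensity avoids the division by zero below.
        return 0.0
    return x ** (i / (1.0 - i))


data = [k / 10 for k in range(11)]
for i in (0, 1 / 20, 1 / 4, 1 / 2, 3 / 4, 19 / 20, 1):
    print(f"i = {i:.2f}:", [round(skew(x, i), 3) for x in data])
```

Unlike the two isolated points described above, this sketch catches the intensities 0 and 1 as a whole, because in Python the exponent i / (1 - i) cannot be evaluated at all for i = 1.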
