Wednesday, 13 March 2013

Data Transformations

I started writing a post about my most recent work on mathematical models but couldn't quite get started, because wherever I stepped, I felt I had to explain something in more detail or define some term before I could make a succinct statement. There are some preliminaries to cover before my work makes any sense. Unfortunately, this also means that this post is rather dull. Feel free to skim it ... there should be no surprises here.

What do I mean by data transformation?

Whenever I try to develop a mathematical model for a certain phenomenon, I experiment with functions that manipulate the input data so that it maps properly onto my expected output. So by data transformation I mean a function that maps data from the input format to some output format and changes its values in a meaningful way. To make it blatantly clear, here is a trivial example:
f(x) = 2 * x
The important thing about these transformations is not whether they are mathematically simple but how they behave on sets of input data points. Does a given transformation
  • ... preserve the data point order?
  • ... reverse the data point order?
  • ... tamper with the order in more complex ways?
  • ... change the distances between data points?
  • ... change the relative distances between data points?
  • ... highlight specific parts of the data set?
There are many ways to transform data, and in this post I want to cover four very easy and fundamental ones:
  1. Constant transformation
  2. Identity transformation
  3. Offset transformation
  4. Linear transformation
To investigate these functions I use a simple data set: all integers from 0 to 10, inclusive.
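To make the questions above concrete, here is a minimal Python sketch that applies a transformation to the example data set and checks a few of the listed properties. The helper names (`transform`, `preserves_order`, `reverses_order`, `differences`, `relative_differences`) are my own, hypothetical ones. Since I never pinned down "relative differences" formally, the sketch assumes they are the neighbour differences divided by the mean of the values; that is only one possible reading.

```python
from statistics import mean
from typing import Callable, List

# The example data set: all integers from 0 to 10, inclusive.
DATA: List[float] = [float(x) for x in range(0, 11)]


def transform(f: Callable[[float], float], data: List[float]) -> List[float]:
    """Apply the transformation f to every data point."""
    return [f(x) for x in data]


def preserves_order(before: List[float], after: List[float]) -> bool:
    """True if every pair with before[i] < before[j] keeps after[i] < after[j]."""
    n = len(before)
    return all(after[i] < after[j]
               for i in range(n) for j in range(n) if before[i] < before[j])


def reverses_order(before: List[float], after: List[float]) -> bool:
    """True if every pair with before[i] < before[j] ends up with after[i] > after[j]."""
    n = len(before)
    return all(after[i] > after[j]
               for i in range(n) for j in range(n) if before[i] < before[j])


def differences(values: List[float]) -> List[float]:
    """Signed differences between neighbouring data points."""
    return [b - a for a, b in zip(values, values[1:])]


def relative_differences(values: List[float]) -> List[float]:
    """Neighbour differences divided by the mean of the values.

    ASSUMPTION: one possible reading of "relative difference";
    raises ZeroDivisionError if the mean is 0.
    """
    avg = mean(values)
    return [(b - a) / avg for a, b in zip(values, values[1:])]


doubled = transform(lambda x: 2 * x, DATA)  # the f(x) = 2 * x example
print(preserves_order(DATA, doubled))        # True
print(differences(doubled))                  # every entry is 2.0
```

The pairwise order checks are quadratic in the number of points, which is irrelevant for eleven values and keeps the definitions close to their plain-English meaning.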

Constant transformation

This one is so simple that I often overlook it.
f(x) = c
This is the great equalizer. It levels the playing field and makes all data points the same. Here is an example for c = 4 applied to the example data set.


On its own it is not an important transformation, and it has no surprising properties beyond being an equalizer. Still, it pays to keep it in mind, both as an extreme case of other, more complex transformations and as a building block in combined transformations.
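As a small sketch of the c = 4 example in Python (the function name `constant` is my own, hypothetical choice), every data point collapses to the same value and every neighbour difference becomes zero:

```python
# The example data set: all integers from 0 to 10, inclusive.
DATA = list(range(0, 11))


def constant(c):
    """Return the constant transformation f(x) = c."""
    return lambda x: c


equalized = [constant(4)(x) for x in DATA]
print(equalized)  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]

# All neighbour differences collapse to zero, so no order information survives.
print([b - a for a, b in zip(equalized, equalized[1:])])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```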

Identity transformation

Another very simple transformation is:
f(x) = x
Judging by how interesting it is, I'd rank it even below the constant transformation. But again, it should always be present in my mind, because it is an important special case of more complex transformations.
Let's state the obvious: since the output is identical to the input, it preserves both order and differences. Moving on.

Offset transformation

Still nothing too fancy here. This transformation just shifts the input data set by a constant value.
f(x) = x + c
It preserves the order and the absolute differences, but it changes the relative differences. Overall, it is rather boring.
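A minimal Python sketch of the offset transformation, assuming c = 3 (the post gives no value) and again reading "relative differences" as neighbour differences divided by the mean of the values, which is my assumption. The helper names are hypothetical. The shift leaves the plain differences untouched but moves the mean, so the relative differences change:

```python
from statistics import mean

# The example data set: all integers from 0 to 10, inclusive.
DATA = [float(x) for x in range(0, 11)]


def offset(c):
    """Return the offset transformation f(x) = x + c."""
    return lambda x: x + c


def differences(values):
    """Signed differences between neighbouring data points."""
    return [b - a for a, b in zip(values, values[1:])]


def relative_differences(values):
    """ASSUMPTION: neighbour differences divided by the mean of the values."""
    avg = mean(values)
    return [(b - a) / avg for a, b in zip(values, values[1:])]


shifted = [offset(3.0)(x) for x in DATA]  # c = 3 is an arbitrary choice

print(differences(DATA) == differences(shifted))  # True
print(relative_differences(DATA)[0], relative_differences(shifted)[0])  # 0.2 0.125
```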

Linear transformation

To anyone with even a tangential grasp of mathematics, this is no surprise: the identity transformation is just a special case of the linear transformation, with m = 1.
f(x) = m * x
Let's split hairs here: of course I could add a "+ c" at the end of that definition to get the more general form f(x) = m * x + c. But it matters more to me that each transformation stays conceptually concise.
Each of these transformations performs a specific task on the input data. Mathematically, of course, the constant, identity and offset transformations are just special cases of the general linear transformation, but then the linear transformation is itself just a special case of other, more complex transformations. In the end, I could talk only about the most abstract and complex transformations that subsume all other cases, at the risk of losing everybody not willing or able to follow me into abstract fairy land. I'll stick to treating them individually and remain content that I can combine them any way I want to reach more complex forms.
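To illustrate both points, here is a small Python sketch, assuming the general form f(x) = m * x + c. The names `linear`, `apply` and `compose` are my own, hypothetical helpers. Each simpler transformation is just one choice of (m, c), and composing two such functions gives another function of the same form:

```python
# The example data set: all integers from 0 to 10, inclusive.
DATA = list(range(0, 11))


def linear(m, c=0):
    """General form f(x) = m * x + c."""
    return lambda x: m * x + c


def apply(f, data):
    """Apply the transformation f to every data point."""
    return [f(x) for x in data]


def compose(g, f):
    """Return the combined transformation x -> g(f(x))."""
    return lambda x: g(f(x))


# Each simpler transformation is one choice of (m, c):
constant = linear(0, 4)  # f(x) = 4
identity = linear(1, 0)  # f(x) = x
offset = linear(1, 3)    # f(x) = x + 3
scaling = linear(2, 0)   # f(x) = 2 * x

# Combining them: scale first, then shift, i.e. x -> 2 * x + 3.
scale_then_shift = compose(offset, scaling)
print(apply(scale_then_shift, DATA)[:3])  # [3, 5, 7]
```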
Here is a simple example of a linear transformation with m = 0.4.
Obviously, it preserves the order and the relative differences, but it also multiplies the absolute differences by m.
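These claims can be checked numerically with a short Python sketch for m = 0.4. As before, reading "relative differences" as neighbour differences divided by the mean is my assumption, and the helper names are hypothetical. Because 0.4 is not exactly representable as a float, the comparisons use `math.isclose` instead of `==`:

```python
import math
from statistics import mean

# The example data set: all integers from 0 to 10, inclusive.
DATA = [float(x) for x in range(0, 11)]


def differences(values):
    """Signed differences between neighbouring data points."""
    return [b - a for a, b in zip(values, values[1:])]


def relative_differences(values):
    """ASSUMPTION: neighbour differences divided by the mean of the values."""
    avg = mean(values)
    return [(b - a) / avg for a, b in zip(values, values[1:])]


m = 0.4
scaled = [m * x for x in DATA]

# Order is preserved: the output is still strictly increasing.
print(all(a < b for a, b in zip(scaled, scaled[1:])))  # True

# Each difference is multiplied by m.
print(all(math.isclose(d_out, m * d_in)
          for d_in, d_out in zip(differences(DATA), differences(scaled))))  # True

# Relative differences are unchanged, since m cancels against the scaled mean.
print(all(math.isclose(r_out, r_in)
          for r_in, r_out in zip(relative_differences(DATA), relative_differences(scaled))))  # True
```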
It gets slightly more interesting with a negative value for m, which reverses the order. I set m = -0.7 in this example.
 
Not surprisingly, as with a positive m, this changes all absolute differences, and because m is negative, it also flips their sign. Relative differences are preserved, though.
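The same kind of sketch for m = -0.7, with the same assumed reading of "relative differences" and the same hypothetical helper names. The output is strictly decreasing, every difference becomes negative with its size scaled by |m|, and the relative differences survive because both the differences and the mean pick up the same factor m:

```python
import math
from statistics import mean

# The example data set: all integers from 0 to 10, inclusive.
DATA = [float(x) for x in range(0, 11)]


def differences(values):
    """Signed differences between neighbouring data points."""
    return [b - a for a, b in zip(values, values[1:])]


def relative_differences(values):
    """ASSUMPTION: neighbour differences divided by the mean of the values."""
    avg = mean(values)
    return [(b - a) / avg for a, b in zip(values, values[1:])]


m = -0.7
flipped = [m * x for x in DATA]

# A negative m reverses the order: the output is strictly decreasing.
print(all(a > b for a, b in zip(flipped, flipped[1:])))  # True

# Each difference flips its sign and is scaled by m.
print(all(d_out < 0 < d_in and math.isclose(d_out, m * d_in)
          for d_in, d_out in zip(differences(DATA), differences(flipped))))  # True

# Relative differences are preserved.
print(all(math.isclose(r_out, r_in)
          for r_in, r_out in zip(relative_differences(DATA), relative_differences(flipped))))  # True
```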
