Saturday 7 February 2015

A Well-Behaved Bell Curve For Mathematical Models

Often, when developing mathematical models, I need a bell-curve-like function, something like the Gaussian bell curve. But I want a simple one; one that conforms to certain requirements which make its use somewhat sensible. Now, mind you, I don't mean to apply this function to the domain of statistics - that domain certainly belongs to the Gaussian bell curve and there is nothing wrong with it. I just don't particularly like all of its properties in the context of a well-defined mathematical model.

This is what the target function looks like:

Properties

From very far away, this function looks somewhat similar to a Gaussian bell curve in that it features one maximum and falls off symmetrically to both sides. But it has a few properties that make it very useful in constructing mathematical models. Most importantly - and this is the main difference from the Gaussian curve - the function actually reaches 0 on both sides of the bell. More specifically, it has:
  • a value of 0 for x ≤ -0.5.
  • a value of 0 for x ≥ 0.5.
  • a maximum at x = 0 with a value of 1.
  • an inflection point at x = -0.25.
  • an inflection point at x = 0.25.

Solution


Disclaimer: I'm a proponent of τ (= 2 π), so I will use it in my definitions.

The basis for my suggestion is the cosine function, with some modifications to stretch it into the target area and a cut-off to the left and right so as to break the periodic behaviour. In Mathematica terms, it can be defined as follows:
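The Mathematica definition is shown as an image in the original post. As a reference, here is a Python sketch of the same idea - a cosine stretched into the unit interval and clipped outside of it (the function name bell is my own choice, not from the post):

```python
import math

TAU = 2 * math.pi  # τ = 2π, as used in the definitions above

def bell(x):
    """Cosine-based bell curve: 1 at x = 0, exactly 0 for |x| >= 0.5."""
    if abs(x) >= 0.5:
        return 0.0  # the cut-off breaks the periodic behaviour of the cosine
    return (math.cos(TAU * x) + 1) / 2
```

At x = ±0.25 the cosine term vanishes, so the curve passes through 0.5 at its inflection points, matching the properties listed above.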

Extensions

1. Amplitude

It can easily be scaled to reach any amplitude at the maximum by multiplying it with that maximum value like this:
In the simple version this parameter effectively defaults to 1. Here are two examples for a = 1.8 and a = -2.5 respectively:

2. Width

It is also possible to widen or narrow the curve using a width argument. The width is given between the two points where the bell curve reaches zero and defaults to 1 in the simple version:
Here are two examples for w = 4.8 and w = 0.6, respectively:

3. Offset

To move the function to the left or right, you can use an offset argument. The value of the offset specifies the position of the maximum and therefore defaults to 0 in the simple version.

Here are two examples for o = -2 and o = 0.2, respectively:

General Combination

Of course, the three extensions mentioned above are orthogonal and can be combined any way you like. This is the definition of the general function supporting all three ways to modify the bell curve:
Here are two examples. One has an amplitude of 0.1, a width of 6 and an offset of 1.5. The other one has an amplitude of -1.2, a width of 0.4 and an offset of -0.8.

Sunday 1 February 2015

Calculating The Scale Under Perspective Projection

In the recent past, I did a lot of 3D graphics programming again. On the one hand for my game project Galactic Fall, and on the other hand for the modeling software that goes along with it. One problem that I encountered was how to calculate the effective amount of scaling for distant objects. In 3D applications, the world is rendered to the screen using a perspective projection, given as a transformation matrix. Under such a perspective projection, objects in the distance get scaled down, so that an illusion of depth is created on screen.

What is the factor of scale for an object in the distance?

As I started thinking about this, one of the first things that became obvious was this: it's not really 3D objects that get scaled. Imagine a very large object, for instance an oil tanker, with your eyes (and the rest of you) placed at the bridge, looking down over the ship towards the bow. Clearly, it is not the whole tanker that gets scaled down. Rather, parts of the tanker that are farther away from the eye get scaled down more than those parts near to the eye. So the scaling factor is not a discrete property of the object but a continuous scalar property of space itself and can be calculated for every distinct point in space.

It took me quite some time to appreciate this! The scaling factor is a property of any point in space, yet scaling points per se doesn't make much sense, because they have no extent which could be scaled. The only thing that can appear smaller is something that has spatial extent, for example a line, a triangle or a cube. However, in general, each point of an object has a slightly different distance from the eye and thus a slightly different scaling factor. This means that, generally speaking, no single scaling factor can be given to represent the scale of an object under a perspective projection.

What is the factor of scale for any point in space?

Nevertheless, I can determine the specific scaling factor for every single point in space. Trying to deduce the correct formula, I will now turn to a certain type of object for my investigations. I need objects that have a spatial extent but a constant factor of scaling across all points of the object. Obviously, that means that all points have to have the same distance from the eye position and therefore have to lie on the surface of a sphere with the eye at its centre.

Target Function

Firstly, to simplify things a bit, I'll assume that all coordinates are given in eye space. That is, the eye/camera/viewer sits firmly at coordinates (0, 0, 0) and all other points in space are given relative to that origin. The target function looks something like this:
Scale(P) = ...
returning a single floating point value which expresses the apparent scaling at the point P.

Reference scale

Suppose this function returns 0.2. So we know an object at point P appears 5 times smaller than ... than what? What is the reference? What is the "normal" size of the object, when the scale is 1? Or, asked rather more purposefully: what is the distance of a point P from the eye, so that Scale(P) equals 1?

The good thing is that it doesn't matter - it is purely a matter of definition. However, as with all definitions, you should choose it to be meaningful in the context that you are going to use it in.

Easy answer

Well, if we can define it any way we like, why not use 1? And the beautiful answer is:
Scale(P) = 1 / |P|
Or in words: the scale is inversely proportional to the distance. Anything at a distance of 7 would thus appear to be a seventh of the size that it appears to have at distance 1.

Understanding the answer

To understand why this is the case, I will analyse the following picture. It is a further simplification of the problem, moving to two-dimensional space, but it can be transformed to three dimensions with relative ease. It's just a lot harder to draw.

In three-dimensional space, I would use the geodesic segment, which is a straight line projected onto the surface of a sphere. In two-dimensional space, this is called an arc and is a segment of a circle.

  • E is, of course the eye position, the origin of the coordinate system.
  • O is an object at a near distance.
  • P is the same object at a far distance.
  • A is the vector from E to the object O.
  • B is the vector from E to the object P.

    Preliminaries

    Firstly, what, in this picture, represents the scaled-down apparent size of the object P? It is the angle between the grey lines that limit the objects O and P. This angle is called the visual angle of the objects O and P respectively, and I will use θ for the visual angle of O and φ for the visual angle of P.

    Using the distance to an object and its visual angle, it is possible to calculate the physical size of the object. In this case, it is the length of the arc, which is defined as the product of those two values (see Wikipedia: Arc for reference).
    Size(O) = |A| × θ    (1)
    Size(P) = |B| × φ    (2)
    Now, I know, that the physical size of P is the same as the physical size of O, because the former is just a translated version of the latter. This gives me:
    Size(O) = Size(P)
    |A| × θ = |B| × φ            (using 1 and 2)
    (|A| × θ) / |B| = φ          (3)

    Solution

    So, what is the target function? In this case, I want to know the factor of scale for the point that is indicated by the vector B - so that will be the input. The output is the scale, and it is expressed by the relation between the two angles θ and φ and I know that the result should be smaller than 1, so the larger angle (θ) goes to the divisor.
    Scale(B) = φ / θ
    Scale(B) = ((|A| × θ) / |B|) / θ        (using 3)
    Scale(B) = |A| / |B|
    This is a good function to calculate the scale, as it is independent from the actual object that is meant to be scaled. It only relates the two distances to each other and is actually more powerful than what I limited myself to above. The magnitude of A in the dividend is the reference distance, which may be 1, as assumed above, or may be any other distance that seems opportune or serves some specific purpose.
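As a small illustration, the final formula is straightforward to turn into code. Here is a Python sketch (the function and parameter names are mine), with the magnitude of A passed in as the reference distance:

```python
import math

def scale(p, reference=1.0):
    """Apparent scale at eye-space point p (the eye sits at the origin).

    reference is the distance |A| at which the scale is defined to be 1.
    """
    distance = math.sqrt(sum(c * c for c in p))  # |B|
    return reference / distance  # Scale(B) = |A| / |B|
```

With the default reference distance of 1, a point at distance 5 yields a scale of 0.2, matching the example above.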

    Tuesday 21 May 2013

    O(n) Iterative Depth-First Post-Order Tree Traversal

    The title really is already a mouthful. But every part is important and no strict subset of descriptive words is quite enough to grasp the whole range of the problem. Especially when turning to Google looking for information about this "thing", I quickly realized that although the answer may be out there, I would not find it quickly. In the end, I'd rather spend my time dipping my hands into algorithms again, developing my own solution and writing up a blog entry.

    Properties

    Let's go through those points in a meaningful order:
    1. Tree: nodes, edges, root, free of cycles, yadayada.
      • I expect trees to have a root node (root).
      • I expect nodes to have a container full of sub nodes (subs).
      • I assume the order of sub nodes to be of no importance. (See at the end for details and remedies.)
      It is important to note that the algorithm places no further restrictions on the structure of the trees.
      • It works for unidirectional trees, although bi-directionality certainly doesn't hinder it.
      • It makes no assumptions about the arity of the trees. Every node may have a variable and potentially unlimited number of sub nodes.
      • It does not use any information in or for the nodes, like flags, tags or additional pointers to pre- or post-order nodes.
    2. Traversal: The algorithm visits each node in those trees once. Visiting as in processing, not as in touching, because nodes may need to be touched multiple times.
    3. Depth-First: Each visiting branch is examined to the deepest nodes before backtracking up the tree.
    4. Post-Order: The algorithm visits all of a node's sub nodes (and their sub nodes, ...) before visiting the node itself.
    5. Iterative: As opposed to recursive. The algorithm doesn't need recursion but a loop and data structures.
    6. O(n): The algorithm takes linear time to complete with regard to the number of nodes. This also requires the helper data structures to be sufficiently fast. Especially when used inside the main loop of the algorithm, the data structures need to provide access in O(1).

    Algorithm

    Here goes the algorithm. It is easy enough to understand once it is written in plain sight. Generally speaking, the algorithm touches every node twice:
    1. Once on the way down to examine it and push its sub nodes onto the stack.
    2. Once on the way up to visit it.
    The following implementation contains one small optimization in that it visits nodes without sub nodes immediately upon examination, thus touching them only once.

   traverse(tree, visit)
       todo : stack
       
       if tree.root ≠ null
           todo.push([tree.root, down])
           while todo.empty = false
               node, direction = todo.pop()
               if node.subs.empty = true or direction = up
                   visit(node)
               else
                   todo.push([node, up])
                   foreach sub in node.subs
                       todo.push([sub, down])
    That's it. If the programming language of your choice doesn't support pairs or makes them difficult to handle, here is another implementation that uses two stacks which, combined, can be interpreted as those up and down flags. But it is not as easy to understand.

   traverse(tree, visit)
       depth : stack
       todo : stack
       
       if tree.root ≠ null
           todo.push(tree.root)
           while todo.empty = false
               node = todo.top
               if node.subs.empty = true or (depth.empty = false and depth.top = node)
                   visit(node)
                   
                   if depth.empty = false and depth.top = node
                       depth.pop()
                   todo.pop()
               else
                   depth.push(node)
                   foreach sub in node.subs
                       todo.push(sub)
    Note

    Trees assign no importance at all to the order of sub nodes inside a node, but that is just the mathematical theory. In the practical world of programming languages, nodes have a container of sub nodes that is iterable, and iteration necessarily sequentializes the sub nodes, thereby giving order where none is required. The algorithm as written above pushes the sub nodes onto the todo stack in iteration order of the sub nodes container. When popping the elements from that stack, the sub nodes are visited in reverse order. If that is undesirable, the sub nodes should be pushed onto the stack in reverse iteration order. Most programming languages allow reverse iteration or use a container that provides reverse iteration, but if yours fails to offer that convenience, just use another stack, push the sub nodes onto it and pop them again. Now you have the sub nodes in reverse iteration order, ready to push onto the todo stack. It's not beautiful, but it doesn't change the complexity of the algorithm.

    Friday 22 March 2013

    Non-linear Data Transformation

    Linear Data Transformation

    My last two posts on Data Transformations and Normalization Transformations were concerned with transformations that can be grouped together under the common title: linear data transformations. Such transformations have many use cases and are quite important. More often, though, I find myself searching for ways to skew the input data set to either
    1. favour the high end of the data or
    2. favour the low end of the data,
    but without changing the order of the data points.
    Here is an example and some explanations, for which I use the sample data from my Data Transformations post, containing all integers from and including 0 to and including 10, but with a normalization applied. This results in the following data set:

    Intention

    What do I mean by "favouring the high end data"? Of course, in absolute terms, the high end data is already favoured as much as possible. The maximum value of 1 could not possibly be increased while still retaining the normalization range. Instead, I really want to decrease the low end data, thus highlighting the high end data more dominantly in terms of relative distances. Of course, decreasing data points below 0 is not an option either, as that would lead out of the normalization range.
    I needed some other, non-linear transformation to achieve this output data:
    Conversely, what do I mean by "favouring the low end data"? Just like above, I mean to increase the relative distances of data points, but this time in the low end spectrum of the data set. This is what the output data should look like:

    Properties

    As you can see from the two pictures above, the order remains untouched by the transformation. The absolute distances are of course changed, and the relative distances are changed as well. Since the relative distances are not changed in a proportional way, the transformation is non-linear. When I set out looking for a data transformation that does this, I wanted to find a function that utilizes a parameter with which the intensity of skewing the data towards either the low end or the high end can be modified. Also, it was important to me to specify the corner cases:
    1. There is a maximum intensity that skews the data infinitely towards the high end. Looking at the example picture above and taking the idea to an extreme, this means that at maximum intensity, the transformation turns into a constant transformation with the constant 0.
    2. There is a minimum intensity that skews the data infinitely towards the low end. Analogous to the maximum intensity extreme, the minimum intensity extreme turns the transformation into a constant transformation with the constant 1.
    3. There is a well-specified "normal" intensity which does not change the input data, thus turning the transformation into an identity transformation.
    Here is a picture of what that should look like:
    The range of the intensity parameter is 0 to 1 and the specific intensities for the transformations above are 0, 1/20, 1/4, 1/2, 3/4, 19/20 and 1.

    Mathematica

    The following function does almost all of this:
    The parameter i is the intensity and x is the input data. Sadly, this function has two undefined points at:
    1. x = 0, i = 0 
    2. x = 1, i = 1.
    These are the two points where either the low end data point has to jump up to 1 in order to satisfy the minimum intensity, or the high end data point has to jump down to 0 to satisfy the maximum intensity. Luckily, the conditions are easy to catch and you can just return the defined result. In Mathematica I use this definition:
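The Mathematica definition itself appears as an image in the original post, so the exact formula used there is not reproduced here. One family of functions that satisfies the listed corner cases (my own choice, not necessarily the author's) is x raised to the power i / (1 - i), with the two corner intensities handled explicitly; here is a Python sketch:

```python
def skew(x, i):
    """Non-linear skew of a normalized data point x in [0, 1].

    Intensity i in [0, 1]: i = 1/2 is the identity; i -> 1 skews towards
    the high end (constant 0 in the limit); i -> 0 skews towards the low
    end (constant 1 in the limit). This is an illustrative formula, not
    the post's original Mathematica definition.
    """
    if i == 0:
        return 1.0  # minimum intensity: the constant transformation 1
    if i == 1:
        return 0.0  # maximum intensity: the constant transformation 0
    return x ** (i / (1 - i))
```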

    Saturday 16 March 2013

    Normalization Transformations

    This is a small follow-up on my post about Data Transformations, adding two easy transformations regarding data normalization. To highlight the main point of these two transformations, I'll consider this data set, containing twelve numbers:

    Normalization

    Many models are much easier to design if the input data is provided in the range of 0 to 1. I will assume that all data points are positive for purposes of brevity. Data normalization is performed by multiplying every data point with the multiplicative inverse of the maximum data point.
    f(x) = x / max
    Not surprisingly, the order remains the same, as do the relative distances. The absolute differences are scaled by the same amount as the data points themselves. This follows from the fact that it is basically a linear data transformation with a very specific multiplier.

    Range Normalization

    If I wanted my mathematical model to place special focus on the differences of the data points then there is some more room for improvement. In the data set, every data point is greater than or equal to 50, which means that 50% of the normalization range 0 to 1 is wasted on data that is not actually worth considering for the differences of data points.
    Conceptually, I perform an offset transformation with the negative minimum and a normalization with the new maximum. Practically, it is easier to calculate minimum and maximum from the original data in one sweep and use the combined range normalization transformation:
    f(x) = (x - min) / (max - min)
    This transformation preserves the order of the data points but modifies all relative and absolute distances. Most importantly, the output data is guaranteed to contain the data points 0 and 1, which means that the data is spread as widely as possible in the range of 0 to 1.
    Interestingly, this normalization technique is also appropriate for normalizing data sets containing negative data points into the range of 0 to 1.
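Both transformations are one-liners in code. Here is a Python sketch of the two (the function names are mine):

```python
def normalize(data):
    """Scale positive data points into the range 0 to 1: f(x) = x / max."""
    top = max(data)
    return [x / top for x in data]

def range_normalize(data):
    """Spread data points across the full range 0 to 1:
    f(x) = (x - min) / (max - min)."""
    low, high = min(data), max(data)
    return [(x - low) / (high - low) for x in data]
```

As noted above, range_normalize also handles data sets containing negative data points.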

    Wednesday 13 March 2013

    Data Transformations

    I started writing a post about my most recent work on mathematical models but couldn't quite get started because I felt that wherever I stepped, I had to explain something in more detail or define some words that I wanted to use in order to make a succinct statement. There are some preliminaries to be covered before my work makes any sense. Unfortunately, this also means that this post is rather dull. Have a good flight over it ... there should be no surprises here.

    What do I mean by data transformation?

    Whenever I try to develop a mathematical model for a certain phenomenon, I experiment with functions that manipulate input data in such a way as to properly map onto my expected output result. Thus, by data transformation I am talking about a function that maps from the input data format to some output data format and changes the value in some meaningful way. To make it blatantly clear, here is a trivial example:
    f(x) = 2 * x
    The important thing about these transformations is not that they are mathematically simple but how they behave for sets of input data points. Does a given transformation
    • ... preserve the data point order?
    • ... reverse the data point order?
    • ... tamper with the order in more complex ways?
    • ... change the distances between data points?
    • ... change the relative distances between data points?
    • ... highlight specific parts of the data set?
    There are many ways to transform data and in this post I want to cover four very easy and fundamental ones:
    1. Constant transformation
    2. Identity transformation
    3. Offset transformation
    4. Linear transformation
    To investigate these functions I use this simple data set. It contains all integers from and including 0 to and including 10.

    Constant transformation

    This one is so simple that, often, I overlook it.
    f(x) = c
    This is the great equalizer. It levels the playing field and makes all data points the same. Here is an example for c = 4 applied to the example data set.


    In itself it is not an important transformation and has no other surprising properties, except being an equalizer. But it serves well to be aware of this transformation as an extreme case of other, more complex transformations, as well as for its part in combined transformations.

    Identity transformation

    Another very simple transformation is:
    f(x) = x
    Judging by its interestingness, I'd say it is even below the constant transformation. But again, it should be ever-present in my mind as a most interesting special case of more complex transformations.
    Let's make the obvious statement that, since the output is identical to the input, it preserves order and differences and move on.

    Offset transformation

    Still nothing too fancy here. This transformation just shifts the input data set by a constant value.
    f(x) = x + c
    It preserves the order and the absolute differences, although it changes the relative differences, and is, overall, rather boring.

    Linear transformation

    To anyone with at least a tangential grasp of mathematics, it is no surprise: the identity transformation is only a special case of the linear transformation.
    f(x) = m * x
    Let's split some hairs here: Of course I could add a "+ c" at the end of that definition and be mathematically more correct. But for me it is more important to be principally concise.
    Each of these transformations has a specific task to perform on the input data. Of course, mathematically, the constant, identity and offset transformations are just special cases of the general linear transformation, but then, the linear transformation is just a special case of other, more complex transformations. In the end, I could only talk about the most abstract and complex transformations that subsume all other cases, at the risk of having lost everybody not willing or able to follow me to abstract fairy land. I'll stick to calling them individually and remain content that I can combine them any way I want, to reach more complex forms.
    A simple example for a linear transformation with m = 0.4.
    Obviously, it preserves the order and the relative differences, but it modifies the absolute differences by multiplying them with m as well.
    It gets slightly more interesting with a negative value for m which reverses the order. I set m = -0.7 in this example.
     
    Not surprisingly, as with a positive m, this changes all absolute differences and because m is negative flips their sign. Relative differences are preserved, though.
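All four transformations are trivial to express in code. Here is a Python sketch applying them to the example data set (the function names are mine):

```python
def constant(c):
    """The great equalizer: f(x) = c."""
    return lambda x: c

def identity(x):
    """f(x) = x."""
    return x

def offset(c):
    """Shift by a constant: f(x) = x + c."""
    return lambda x: x + c

def linear(m):
    """Scale by a constant: f(x) = m * x."""
    return lambda x: m * x

data = list(range(11))  # all integers from 0 to 10, inclusive
equalized = [constant(4)(x) for x in data]  # c = 4 example above
shifted = [offset(3)(x) for x in data]      # preserves absolute differences
scaled = [linear(0.4)(x) for x in data]     # m = 0.4 example above
```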

    Friday 8 March 2013

    Getting 0 To ∞ On A Linear Slider

    Problem

    I use Mathematica for all my attempts at developing mathematical models for all kinds of problems. The beautiful thing about Mathematica is its ability to provide custom user interface boxes that allow fiddling around with the parameters in those models.


    This shows a generated user interface with a slider for changing the value of "m", which is the slope in the depicted function. But the slider has a maximum value (100 in this case), and therefore you could never conveniently explore the whole range of values for "m". Every concrete value for "m", no matter how high, would always be just at the beginning of the slider.

    Solution

    What I needed was a function that would map the interval 0 to ∞ onto a linear scale. The actual linear scale doesn't matter, but the most versatile one is, of course, a normalized scale from 0 to 1. One function that does this is:
    f(x) = x / (1 - x)
    and it looks like this.
     The function has these properties:
    1. at x = 0, it returns 0
    2. at x = 0.5, it returns 1
    3. for x approaching 1, it approaches infinity
    4. at x = 1 it is undefined

    Mathematica

    In Mathematica I use the following definition, which makes it easier to use in the context of parameter boxes as mentioned above.
    This avoids the error with an undefined result at x = 1 and, in this context, is the correct interpretation of the undefined value.
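The Mathematica definition above is shown as an image in the original post. A Python sketch of the same guarded function (the name is my own choice) might look like this:

```python
import math

def slider_to_infinity(x):
    """Map the normalized slider range [0, 1] onto [0, infinity].

    f(x) = x / (1 - x), with the undefined point at x = 1
    interpreted as infinity."""
    if x == 1:
        return math.inf  # the correct interpretation in the slider context
    return x / (1 - x)
```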