søndag den 9. august 2009

Arrays and higher-order functions in the type system

Lets look closer at how types are handled in Mascara in relation to Arrays and higher-order functions. This provides an interesting view into the type system.

Since this is not a general introduction to parameterized types, it is probably best understood if you already knows parameterized types, e.g. from Java or C# (where they are called "generics", but otherwise looks a lot the same.)

An Array can by default contain values of any type. However, an array can also be instantiated with a type parameter:

var x = new Array.<int>

This creates a new array which may only contain integer values.
If you initialize with an array literal like this:


var x = [1,2,3]

The compiler will determine the Array type from the initial values. In this case it will assume an array if integers, ie. Array.<int>, because all of the items are integers.


The Array class is declared like this:


dynamic class Array.<T> { ... methods ... }

The T is the type parameter which represent the type of the items. when a new array is constructed, the type parameter T is initialized with a concrete type.

Usually the compiler requires you to explicitly provide type arguments. However Array is special-cased (for backwards compatibility), so that when it is initialized without a type argument, it defaults to use the “star” type argument, which is the "anything goes" type.

Hence new Array gets translated into new Array.<*>.

[Aside: dynamic is a modifier which indicates than any property can be attached to the object at runtime without the compiler complaining. This is supported for backwards compatibility]

Now let’s look at the type signature for every:

function every(callbackfn:(function(item:T, ix:int):boolean)) : boolean {...}

This signature may seem daunting because of the nested function signature. every takes one argument, callbackfn which is in turn a function which takes two arguments.

The first argument to the callback function is a list item, hence its type is T, the type parameter for the Array. Hence if the list is a list of ints, the function has to take an int as the first argument.

The second argument to the callback is the index of the current item in the array. This is sometimes useful to have, but we may choose to ignore this argument as we have done in the examples above.

There is a certain amount of flexibility in what callback functions can be supplied. For example, as we have seen above, arguments can be ignored/left out. However we cannot supply a function with more required parameters than the signature expects.

The parameters may be more accepting than what is declared. For example we can provide a function which expects a double as the item type:

[1,2,3].every(function(x:double) x / 2 > 2)

This will work even though T is int, since int is a subset of double. (Technically parameters are said to be contravariant.) This is allows us to supply a function without type annotations on the parameters, which is pretty nice, especially for backwards compatibility.
The return value of the callback has to be a boolean as declared. However, in the above case the compiler can figure out on its own that the callback returns a boolean, because the result of a comparison is always a boolean.

Next, lets look at filter:

function filter(callbackfn:(function(item:T, ix:int):boolean)) : Array.<T> {...}

The parameters for the callback function are the same as above. The result type is itself a parameterized type - the same as the original array. Recall that T is defined as a type parameter for Array.

Hence if you filter an array of stings, the result is always a new array of strings (although the resulting list may be empty, the type is still Array.).

Now the most complex of the function signatures, map:

function map.<Q>(callbackfn:(function(item:T, ix:int):Q)) : Array.<Q> { ... }

Note that the function takes a type parameter, which must match the result type of the supplied function, and which also determines the type the resulting array.

Hence, an explicitly typed invication of map:

x.map.<string>(function(x)x.toString()) --> results in an array of type Array.<string>
x.map.<double>(function(x) x*2) --> results in an array of type Array.<double>

Obviously, if there is a mismatch between the type parameter and the return type of the function, you get a complaint from the compiler

x.map.<int>(function(x) x.toString()) --> compiler whines!

Now, here is the nice part: The type parameter to the function can be left out, since the compiler can infer the type from the type of the supplied function.

E.g.

x.map(function(x) x.toString()) --> results in an array of type Array.<string>, no complaints from the compiler

Of course the type argument makes it explicit what type you expect, and it may help catch type errors. On the other hand you save a bit of typing when calling the method by relying on the inference.

Again, this is a question of preference. I just like that you can write in a "typeless" manner, and then later turn it into more explicitly typed code.

As shown we can get pretty far in a type safe manner without specifying types explicitly. But consider this example:

[1,2,3].map(function(x) x*2)

This function will return an array of doubles, not an array of integers which you might expect. The reason is that the parameter to the callback is not specified, so the compiler has to be cautious. Multiplication is only guaranteed to return a double (since we consider int a subset of double).

An even worse example:

[1,2,3].map(function(x) x+2)

Here we get a compiler warning, because + can be used on anything, but has different meanings depending on the types of input.

Of course we can choose to ignore the warning. But it is better style to annotate the parameters:

[1,2,3].map(function(x:int) x+2)

This results in an array of integers.

Now you may wonder why the compiler can't infer the types in this case, since it is pretty obvious for us that the function will only be called with integers. However in the general case, the compiler cannot infer the types of function parameters, since there in no general solution for that. (Future versions of Mascara might attempt to infer function parameter types, but I doubt it will be possible in all cases.)

Therefore the general advice is to at least type-annotate function parameters. Typically the compiler will then be able to figure out the rest (e.g. local variables and return value).

The flamewars have raged for decades between proponents of static and dynamic typing. My experience is that static and explicit typing is a drag when experimenting and prototyping, however the bigger and more complex a project becomes, the more valuable static analysis becomes. I really likes that Mascara allows you to begin with dynamic and implicit typing, and then gradually add static guarantees as you find it appropriate.

Ingen kommentarer:

This blog has moved to http://blog.mascaraengine.com. See you!

This blog is (was) for news, announcements and questions regarding the
Mascara ECMAScript 6 -> JavaScript translator.