Mascara JavaScript compiler

Wednesday, 16 September 2009

Blog moved

This blog has moved to http://blog.mascaraengine.com/. See you!

Sunday, 9 August 2009

Arrays and higher-order functions in the type system

Let's look more closely at how types are handled in Mascara in relation to arrays and higher-order functions. This provides an interesting view into the type system.

Since this is not a general introduction to parameterized types, it is probably best understood if you already know parameterized types, e.g. from Java or C# (where they are called "generics", but otherwise look much the same).

An Array can by default contain values of any type. However, an array can also be instantiated with a type parameter:

var x = new Array.<int>

This creates a new array which may only contain integer values.
If you initialize with an array literal like this:


var x = [1,2,3]

The compiler will determine the Array type from the initial values. In this case it will assume an array of integers, i.e. Array.<int>, because all of the items are integers.


The Array class is declared like this:


dynamic class Array.<T> { ... methods ... }

The T is the type parameter, which represents the type of the items. When a new array is constructed, the type parameter T is initialized with a concrete type.

Usually the compiler requires you to explicitly provide type arguments. However, Array is special-cased (for backwards compatibility): when it is instantiated without a type argument, it defaults to the “star” type argument, which is the "anything goes" type.

Hence new Array gets translated into new Array.<*>.

[Aside: dynamic is a modifier which indicates that any property can be attached to the object at runtime without the compiler complaining. This is supported for backwards compatibility.]

Now let’s look at the type signature for every:

function every(callbackfn:(function(item:T, ix:int):boolean)) : boolean {...}

This signature may seem daunting because of the nested function signature. every takes one argument, callbackfn, which is in turn a function that takes two arguments.

The first argument to the callback function is a list item, hence its type is T, the type parameter for the Array. Hence if the list is a list of ints, the function has to take an int as the first argument.

The second argument to the callback is the index of the current item in the array. This is sometimes useful to have, but we may choose to ignore this argument as we have done in the examples above.

There is a certain amount of flexibility in what callback functions can be supplied. For example, as we have seen above, arguments can be ignored/left out. However we cannot supply a function with more required parameters than the signature expects.
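The same flexibility can be observed at runtime in plain JavaScript. The following is only a sketch (the names seen and isPositive are made up for illustration): a standard ES5 engine passes the item and the index (and also the array itself) to the callback, and the callback is free to declare only the parameters it needs.

```javascript
// every() calls the callback with (item, index, array).
var seen = [];
[10, 20, 30].every(function (item, ix) {
  seen.push(ix + ":" + item);   // uses both the item and the index
  return true;                  // returning true keeps the iteration going
});
// seen is now ["0:10", "1:20", "2:30"]

// A callback that declares only the item; the index is simply ignored.
function isPositive(item) {
  return item > 0;
}
var allPositive = [10, 20, 30].every(isPositive);   // true
```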

The parameters may be more accepting than what is declared. For example we can provide a function which expects a double as the item type:

[1,2,3].every(function(x:double) x / 2 > 2)

This will work even though T is int, since int is a subset of double. (Technically, parameters are said to be contravariant.) This also allows us to supply a function without type annotations on the parameters, which is pretty nice, especially for backwards compatibility.

The return value of the callback has to be a boolean as declared. However, in the above case the compiler can figure out on its own that the callback returns a boolean, because the result of a comparison is always a boolean.

Next, let's look at filter:

function filter(callbackfn:(function(item:T, ix:int):boolean)) : Array.<T> {...}

The parameters for the callback function are the same as above. The result type is itself a parameterized type - the same as the original array. Recall that T is defined as a type parameter for Array.

Hence if you filter an array of strings, the result is always a new array of strings (although the resulting list may be empty, the type is still Array.<string>).

Now the most complex of the function signatures, map:

function map.<Q>(callbackfn:(function(item:T, ix:int):Q)) : Array.<Q> { ... }

Note that the function takes a type parameter, which must match the result type of the supplied function, and which also determines the type of the resulting array.

Hence, an explicitly typed invocation of map:

x.map.<string>(function(x)x.toString()) --> results in an array of type Array.<string>
x.map.<double>(function(x) x*2) --> results in an array of type Array.<double>

Obviously, if there is a mismatch between the type parameter and the return type of the function, you get a complaint from the compiler:

x.map.<int>(function(x) x.toString()) --> compiler whines!

Now, here is the nice part: The type parameter to the function can be left out, since the compiler can infer the type from the type of the supplied function.

E.g.

x.map(function(x) x.toString()) --> results in an array of type Array.<string>, no complaints from the compiler

Of course the type argument makes it explicit what type you expect, and it may help catch type errors. On the other hand you save a bit of typing when calling the method by relying on the inference.

Again, this is a question of preference. I just like that you can write in a "typeless" manner, and then later turn it into more explicitly typed code.

As shown we can get pretty far in a type safe manner without specifying types explicitly. But consider this example:

[1,2,3].map(function(x) x*2)

This expression will return an array of doubles, not the array of integers you might expect. The reason is that the type of the callback's parameter is not specified, so the compiler has to be cautious: multiplication is only guaranteed to return a double (since we consider int a subset of double).

An even worse example:

[1,2,3].map(function(x) x+2)

Here we get a compiler warning, because + can be used on anything, but has different meanings depending on the types of input.

Of course we can choose to ignore the warning. But it is better style to annotate the parameters:

[1,2,3].map(function(x:int) x+2)

This results in an array of integers.
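The ambiguity of + that triggers the warning can be seen in plain JavaScript. This is just an illustrative sketch (addTwo is a hypothetical helper, not part of Mascara): the same untyped callback adds numbers but concatenates strings, while * always produces a number.

```javascript
// Hypothetical helper: one untyped callback, applied to different inputs.
function addTwo(x) {
  return x + 2;
}

var fromNumbers = [1, 2, 3].map(addTwo);         // [3, 4, 5]         (numeric addition)
var fromStrings = ["1", "2", "3"].map(addTwo);   // ["12", "22", "32"] (string concatenation)

// Multiplication converts its operands to numbers, so it has no such ambiguity.
var doubled = ["1", "2", "3"].map(function (x) { return x * 2; });   // [2, 4, 6]
```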

Now you may wonder why the compiler can't infer the types in this case, since it is pretty obvious to us that the function will only be called with integers. However, in the general case the compiler cannot infer the types of function parameters, since there is no general solution for that. (Future versions of Mascara might attempt to infer function parameter types, but I doubt it will be possible in all cases.)

Therefore the general advice is to at least type-annotate function parameters. Typically the compiler will then be able to figure out the rest (e.g. local variables and return value).

The flamewars have raged for decades between proponents of static and dynamic typing. My experience is that static and explicit typing is a drag when experimenting and prototyping, but the bigger and more complex a project becomes, the more valuable static analysis becomes. I really like that Mascara allows you to begin with dynamic and implicit typing, and then gradually add static guarantees as you find it appropriate.

Friday, 7 August 2009

Higher-order array methods

A most welcome addition in ES5 is the set of new higher-order array methods: map, filter, some, every, forEach, reduce and reduceRight. They have long been available in various JavaScript frameworks, but it is nice to have them as part of the core language. In relation to Mascara the really cool thing is how they integrate with the type system - but I'll get back to that.

A higher-order function (or method) is defined as a function which takes another function as one of the arguments. This is especially useful when working with arrays, since the supplied function can be executed on some or all items in the array.

map

map applies a function to every item in an array, and returns a new array with the results. Example:

var numbers = [1,2,3,4]
function double(x) {
    return x * 2;
}
numbers.map(double) --returns--> [2,4,6,8]

If you only use the supplied function in this context, you can also provide an anonymous function inline:

numbers.map(function(x) { return x * 2; })

In these cases the new shorthand-function syntax comes in handy:

If a function body is a single expression, the braces and the "return" keyword can be left out, like:

numbers.map(function(x) x * 2)

This shorthand syntax is also known as an "expression closure" or "lambda".

filter

Filter filters a list by only returning the items for which a condition is true. The condition is checked by supplying a function which returns true or false for a given item:

numbers.filter(function(x) x>2) --> [3,4]

While map and filter are quite useful tools, I want to point out that array comprehensions are a very cool "syntactic sugar" for mapping and filtering. This filter/map method chaining:

var x = [1,2,3,4,5].filter(function(x)x<7).map(function(x)x*2)

is equivalent to this array comprehension:

var x = [x*2 for each (x in [1,2,3,4,5]) if (x<7)]

Personally I find the array comprehension syntax more elegant, although that may be a matter of taste. One technical advantage of array comprehensions as implemented in Mascara is that they support not just Array-instances, but also collections like HTML node lists:

[e.value
for each (e in document.getElementsByTagName("INPUT"))
if (e.type=="text")]
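For comparison, here is a sketch of how the plain ES5 methods can be applied to an array-like collection: they are defined generically, so they can be borrowed with call. The inputs object below is a hypothetical stand-in for a node list (so the example can run outside a browser); it is not a real DOM collection.

```javascript
// Stand-in for document.getElementsByTagName("INPUT"): an array-like
// object with a length and indexed items, but no map or filter methods.
var inputs = {
  length: 3,
  0: { type: "text", value: "Alice" },
  1: { type: "checkbox", value: "on" },
  2: { type: "text", value: "Bob" }
};

// Borrow the generic array method with call(); the result is a real Array.
var textInputs = Array.prototype.filter.call(inputs, function (e) {
  return e.type == "text";
});
var values = textInputs.map(function (e) { return e.value; });
// values is ["Alice", "Bob"]
```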

But I digress.

some and every

some and every are related to filter, since they also take functions that return a boolean for an item.

some returns true if the condition is true for at least one item in the list, while every returns true if the condition is true for all items.

Example:

numbers.some(function(x) x>2) --> true (because some of the values are greater than 2)

numbers.every(function(x) x>2) --> false (because not all values are greater than 2)

forEach simply executes a function for each item in the list. It doesn’t return anything, so this is done for side-effects. E.g.

x.forEach(function(x) { alert(x); });

Note that I am using a real function body here, not a shortcut. Since the function passed to forEach should not return anything, it is not appropriate to use an expression.

The forEach method is pretty similar to using the for-each loop:

for each (x in numbers) { alert(x); }

Lastly we have reduce and its mirror-twin reduceRight. They are a bit tricky to explain. reduce takes a function and an initial value as arguments. It executes the provided function with the initial value and the first item as arguments. The result of this is used as input to execute the function on the next item, and so on. The result of executing the function on the last item is the result of the reduce.

For example, the product of all items in a list can be calculated like this:

x.reduce(function(product, item) product * item, 1) ----> 24

And the sum:

x.reduce(function(sum : int, item : int) sum + item, 0) ----> 10

(Aside: In the last example I provide type annotations to indicate that the parameters are integers. This is because + works on both numbers and strings, so I help the compiler by indicating that they will be numbers, and hence the result will always be numeric. This is not necessary with *, since * only works on numbers anyway.)

reduceRight is the same except that it traverses the list backwards, i.e. it first executes the function on the last item, then the next-to-last and so on.
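The order of the calls is easier to see with strings than with numbers. The following plain-JavaScript sketch (the names letters and steps are made up for illustration) records the arguments of each call, and shows that reduceRight visits the same items in the opposite order:

```javascript
var letters = ["a", "b", "c"];
var steps = [];   // records the arguments of each call to the callback

// reduce: starts with the initial value ">" and the first item.
var left = letters.reduce(function (acc, item) {
  steps.push("(" + acc + "," + item + ")");
  return acc + item;   // becomes acc in the next call
}, ">");
// steps is ["(>,a)", "(>a,b)", "(>ab,c)"] and left is ">abc"

// reduceRight: starts with the initial value ">" and the last item.
var right = letters.reduceRight(function (acc, item) {
  return acc + item;
}, ">");
// right is ">cba"
```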

To be honest I think reduce and reduceRight are somewhat confusing - especially with more complex operations than just summing. The same thing can always be achieved with an ordinary for-each loop, and - to me at least - in a more obvious way:

var sum = 0;
for each (var item in x) sum += item;

However, your mileage may vary. If you have a background in functional programming, the reduces may seem more natural, since you avoid a mutable variable.

The library that implements the methods is only included in the generated code if some of these methods are actually used. Generally Mascara attempts to generate code that is as lean as possible, hence boilerplate is only included when it is necessary.

Mascara 1.1 and ECMAScript 5

Mascara version 1.1 has been released. New in this release is support for a number of features from ECMAScript 5.

The ECMAScript 5 spec (PDF link) is the latest spec released by the ECMA group. It is in "candidate draft" stage, which means that it is more or less final, but it awaits experience from implementations.

It may seem curious that ECMA releases an ECMAScript 5 version when a final ECMAScript 4 version was never released. (The previous release was ECMAScript 3 from 1999!) The reason is this: The ECMAScript 4 version was a large, monolithic specification where a great number of new features were introduced all at once. Some parts of the working group felt that this was far too ambitious, and wanted smaller, incremental improvements to the language. It was therefore decided to evolve the language in smaller steps over the course of several spec releases.

Perhaps to avoid confusion, the working group has decided to simply skip the version number 4, and jump straight to ECMAScript 5 - although ECMAScript 5 is a small subset of what was planned for ECMAScript 4. The next version after that, ECMAScript 6 (code named "Harmony"), is under active development, and it is still not settled what it will encompass. There may very well be an ECMAScript 7 or 8 beyond that.

Since Mascara already implements a large part of what was proposed for ECMAScript 4, it is far more advanced than ECMAScript 5, and perhaps even more advanced than ECMAScript 6 will turn out. We will see.

Here is an overview of ECMAScript 5 and how it relates to Mascara.

New features
There are a number of new features, like getters/setters and the higher-order array methods like map, reduce, forEach etc. These features are available in Mascara 1.1 (and will be detailed in a later post).
They are also the foundation for features like array comprehensions and for-each loops which may be introduced in ES6 (and which already are available in Mascara).
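As a small illustration of getters and setters, here is a sketch in standard ES5 object-literal syntax. The temperature object is hypothetical, and whether Mascara accepts exactly this form is an assumption, not something confirmed here.

```javascript
// Hypothetical object: fahrenheit is computed from celsius on every read,
// and updates celsius on every write.
var temperature = {
  celsius: 20,
  get fahrenheit() {
    return this.celsius * 9 / 5 + 32;
  },
  set fahrenheit(f) {
    this.celsius = (f - 32) * 5 / 9;
  }
};

var before = temperature.fahrenheit;   // 68, computed by the getter
temperature.fahrenheit = 212;          // runs the setter
var after = temperature.celsius;       // 100
```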

Attribute-control and object-lockdown
There are a number of core methods in ECMAScript 5 which allow the programmer to "seal" and "freeze" objects, and to manipulate the mutability and enumerability of object properties. John Resig has written a good overview of this. This level of control cannot be implemented or emulated in Mascara, since it requires direct support in the language engine. However, Mascara solves many of the use cases for this using const variables and non-dynamic objects, and in general the type system ensures at compile time that you don't modify values that are supposed to be immutable.
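To make the three kinds of control concrete, here is a sketch using the standard ES5 methods Object.freeze, Object.seal and Object.defineProperty in an engine that supports them. The function lockdownDemo and the objects in it are made up for illustration; strict mode is used so that illegal writes throw instead of failing silently.

```javascript
function lockdownDemo() {
  "use strict";   // illegal writes throw a TypeError instead of being ignored
  var result = {};

  // freeze: no property can be changed, added or removed.
  var frozen = Object.freeze({ x: 1 });
  try { frozen.x = 2; } catch (e) { result.freezeError = e.name; }
  result.frozenX = frozen.x;

  // seal: existing properties stay writable, but none can be added.
  var sealed = Object.seal({ x: 1 });
  sealed.x = 2;
  try { sealed.y = 3; } catch (e) { result.sealError = e.name; }
  result.sealedX = sealed.x;

  // defineProperty: mutability and enumerability are set per property.
  var config = { name: "demo" };
  Object.defineProperty(config, "secret", {
    value: 42, writable: false, enumerable: false
  });
  result.keys = Object.keys(config);   // "secret" is hidden from enumeration
  result.secret = config.secret;       // but it can still be read

  return result;
}

var lockdown = lockdownDemo();
```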

ECMA is definitely doing the right thing by strengthening the foundation of the language with this new level of control. However, it will probably take many years before this is broadly supported, so until then static guarantees (like those Mascara can provide) are a lot better than nothing.

Strict mode
ECMAScript 5 introduces a strict mode, where some of the more reckless styles of programming JavaScript are disallowed. For example, the notoriously dangerous with-statement is disallowed, it is illegal to assign to a variable which has not been declared, and so on. This should help catch a bunch of bugs.
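Both restrictions can be observed in an engine that implements ES5 strict mode. This is only a sketch (assignUndeclared and undeclaredCounter are hypothetical names); strict mode is switched on per function with the "use strict" directive.

```javascript
// Assigning to a variable that was never declared throws in strict mode.
function assignUndeclared() {
  "use strict";
  try {
    undeclaredCounter = 1;   // no var declaration anywhere
  } catch (e) {
    return e.name;
  }
  return "no error";
}
var assignResult = assignUndeclared();   // "ReferenceError"

// with is rejected when the code is parsed, so the snippet is compiled
// at runtime here in order to catch the error.
var withResult;
try {
  new Function('"use strict"; with ({}) {}');
  withResult = "no error";
} catch (e) {
  withResult = e.name;   // "SyntaxError"
}
```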

Compared to this, Mascara is only halfway strict. Some parts of strict mode, like the requirement that variables must be declared, are also enforced in Mascara. On the other hand, with is still allowed, although the compiler will give you a warning. Generally Mascara takes a passive-aggressive approach to strictness: the compiler will complain (using warnings) about a lot of potentially unsafe code (e.g. using with, type errors and so on), but will still allow it. A compilation will only fail if it is totally impossible for the compiler to generate meaningful output. The flip side is that users of Mascara should pay attention to warnings, since unsafe code which would be flagged as a compiler error in stricter languages (like Java or C#) will only lead to a warning in Mascara.

I am contemplating introducing a way to customize how pedantic the Mascara compiler is. I think a "strict" mode, where all type mismatches lead to errors and not just warnings, would be useful. Suggestions from readers are welcome!

In general ECMAScript 5 is a well thought-out and important spec, but it is also somewhat unexciting for most developers. Many of the features are foundational work, and they only become interesting when you start to build more advanced language features on top of them. However, the future looks bright!

Tuesday, 28 July 2009

What is an execution backend?

The latest Mascara release has improved support for changing execution backends, and now includes Mozilla Rhino as the default backend.

The execution backend is not a necessary part of the compilation/translation process and you can run Mascara fine without one.

However an execution backend is useful if you want to compile and execute the generated code in one step. This is especially useful if you have a set of unit-tests you want to run on the compiled code immediately after compilation.

Previously cscript.exe (Microsoft's command-line version of JScript) was the default backend, but it is only available on Windows. Rhino is written in Java and will run on any system with Java, and since it is distributed as part of Mascara, you don't have to do anything extra to get it to work.

I still recommend cscript.exe over Rhino, though. The reason is that cscript.exe reports the line number and character position of runtime errors. This allows Mascara to trace back and find the corresponding position in the original source code, which makes debugging a lot easier. Rhino sadly does not always report the line number of runtime errors.

The execution backend is invoked on the command line like this:

translate.py inputfile.esx --execute --backend rhino

The --execute parameter indicates that the translated code should be executed by the backend. The --backend parameter indicates the name of the backend (rhino or cscript). If the --backend parameter is left out, it defaults to the one specified in config.py (which is rhino by default).

Saturday, 28 March 2009

Mascara 1.0 Released

The Mascara ECMAScript 4 translator version 1.0 has been released. Download here.
You can try it out online here.

Thanks to everyone who supported along the way!

To sum up, Mascara is a tool which extends JavaScript with a number of powerful features like classes, namespaces, type verification and so on, and then translates this advanced code into "object-code" which will run in any browser. This allows forward-thinking developers to take advantage of future improvements to JavaScript, today.

The background is this: JavaScript was initially designed for small scripting tasks, and does not support features like classes, packages and types that are useful for structuring larger programs. However, today JavaScript is increasingly used for larger and more complex applications, and these shortcomings become a problem. Larger JavaScript applications easily become slow and error-prone to develop and costly to maintain.

Luckily, the ECMA organization, which maintains the JavaScript standard, has developed a number of improvements to JavaScript to alleviate these problems. This new standard is known as ECMAScript 4 or ECMAScript Harmony. However, due to browser wars and politics, it may take years before it is implemented in all new browsers, and even then we will have to worry about backwards compatibility with legacy browsers.

Mascara solves this problem, and allows us to use ECMAScript 4 today.

Some of the features in Mascara are:
Of course this is just the beginning. The ECMAScript standard is not yet finalized, and some features may change. Mascara is also working on better packaging and deployment, and we are working on editor-integration.

Mascara can be tried out and explored without installing anything, by using the online tool.
Have fun!

Saturday, 7 March 2009

Mascara 1.0 Release Candidate

The Mascara 1.0 release candidate has been released. If no bugs show up in this release, this is the version which will be officially released as Mascara 1.0.

This blog is (was) for news, announcements and questions regarding the
Mascara ECMAScript 6 -> JavaScript translator.