Saturday, August 1, 2015

Creating powerful languages with Xtext

[This is a reprint of an article I wrote in 2013 for AltDevBlogADay, which ceased to exist, which is why I felt like putting it on my blog. No doubt, I would write this article completely differently today, but I think it is actually kind of nice to preserve my perspective I had back then by not altering the article.]

Today, I would like to introduce the Xtext framework to you. Xtext is an open source framework which allows you to define your own textual programming languages before you know it. Xtext was not developed specifically with game developers as a target group in mind. It is rather independent in terms of its application context, and I’m convinced that every tool department that ever needed to define their own programming language can immensely benefit from Xtext. So, although convenient documentation, examples, forums, and so forth already exist, I would like to provide an article that uses a small, game-related example to introduce the technology.

Xtext ships as a plugin for the Eclipse IDE. Hence, we will need to cover some basics of Eclipse, too. I tried to do this in a pragmatic way and within a reasonable scope.

So, what exactly is Xtext?

Xtext is a free to use, open source language development framework, available under the Eclipse Public License. It comes with some powerful language design tools, which provide you with
  • an easy way to design new programming languages, be it large general purpose or little domain-specific languages,
  • powerful default configuration to automatically build rich tooling for your new language,
  • a highly modular architecture and API which enable you to customize and enhance almost every aspect of your language and its tools,
  • an open and active community.

Our goal

The hardest part of writing such an article is to construct a decent example. So I didn’t. Instead, I use a script format example suggested by Steve Ince in his book Writing for Video Games (thanks Steve!). We iterated a bit over the language design to come up with a more concise version, but anyway, the screencast shows the language and the editor we want to create throughout this article.



Let’s not put the specific language design to discussion here. Instead, we will learn how to build such a language + rich editor using Xtext, such that you are able to create your own language as you desire.

The cool thing is: In order to get this language plus the tooling I show in the video, all we have to do are these steps:
  1.  Create a new Xtext project
  2.  Define the grammar of our language in the grammar definition file
  3.  Trigger Xtext to generate everything for us

Getting started

Xtext ships as an Eclipse plugin. Hence, you can
  • either add it to your existing Eclipse application using the update mechanism of Eclipse,
  • or, if you don’t already use Eclipse, download a complete distribution of it, containing Xtext and all the necessary dependencies, at once
You can find the download as well as a short installation guide here. Note that you need to choose an Eclipse version that matches your installed Java Runtime Environment. That means that you can only use the 64-bit version of Eclipse if you are running a 64-bit JRE on your system. Let me know in the comments section if you encounter problems running Xtext.

In this tutorial, I work with the Eclipse Kepler release and Xtext version 2.4.2. The file I downloaded from the linked site is “eclipse-dsl-kepler-R-win32.zip”. 

Download the example

Refer to my blog to download the language as a standalone product with some enhancements. You can also download the Xtext projects (sources + runtime project) that are shown in this article.

Intended benefits

What are the benefits of creating such a modeling language in the first place? First, the language features domain abstractions that allow users to express information in a natural way. In combination with the rich tooling, users are strongly supported in creating syntactically and semantically correct contents. In our case, it helps game writers stick to the dialog format and create dialog scripts using a simple language.


Since the language conforms to an underlying model, every dialog script automatically has an object graph we can access programmatically. This means that we can create generators that translate dialog scripts into other formats, like
  • XML, C++ or C#, to feed the scripts into a dialog engine,
  • Excel and screenplay formats for localization purposes,
  • or statistical reports, e.g. how many lines each character has.
Alternatively, we could create an interpreter, a program that is able to execute a dialog script directly, by working on the object graph of a script. A use case for this would be to allow writers to play through their dialogs already during creation.
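To make the idea of working on the object graph a bit more tangible, here is a minimal Java sketch of the last item in the list above, a report of how many lines each character has. Note that DialogScript, DialogLine and LineCountReport are hypothetical, hand-written stand-ins that I made up for illustration; the classes Xtext derives from a grammar will look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one spoken line in the object graph (not Xtext-generated).
final class DialogLine {
    final String character;
    final String text;

    DialogLine(String character, String text) {
        this.character = character;
        this.text = text;
    }
}

// Hypothetical stand-in for the root of a dialog script's object graph.
final class DialogScript {
    final List<DialogLine> lines = new ArrayList<>();

    DialogScript say(String character, String text) {
        lines.add(new DialogLine(character, text));
        return this;
    }
}

final class LineCountReport {
    // Walks the object graph once and counts the lines per character,
    // keeping the order in which the characters first speak.
    static Map<String, Integer> linesPerCharacter(DialogScript script) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (DialogLine line : script.lines) {
            counts.merge(line.character, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        DialogScript script = new DialogScript()
                .say("Guard", "Halt!")
                .say("Hero", "I mean no harm.")
                .say("Guard", "State your business.");
        System.out.println(linesPerCharacter(script)); // prints {Guard=2, Hero=1}
    }
}
```

A generator for XML or C# would traverse the same graph in the same way and simply emit text instead of counting.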

Moreover, the language comes with syntactic validation, so that writers know if their dialogs are syntactically correct. But since we also have the underlying data model, we can additionally provide semantic validation. This turns the language into a powerful tool, as we will cover in an upcoming article.
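As a rough illustration of what a semantic check could look like, the following Java sketch reports every line whose speaker was never declared as a character. Both the rule itself and the names SpokenLine and DialogValidator are assumptions of mine for this example; this is not the validation API that Xtext provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model element: who says what.
final class SpokenLine {
    final String speaker;
    final String text;

    SpokenLine(String speaker, String text) {
        this.speaker = speaker;
        this.text = text;
    }
}

final class DialogValidator {
    // Returns one message per line whose speaker is not a declared character.
    // The script is syntactically fine in that case, but semantically wrong.
    static List<String> validate(Set<String> declaredCharacters, List<SpokenLine> lines) {
        List<String> issues = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            SpokenLine line = lines.get(i);
            if (!declaredCharacters.contains(line.speaker)) {
                issues.add("line " + (i + 1) + ": unknown character '" + line.speaker + "'");
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("Guard"));
        List<SpokenLine> lines = Arrays.asList(
                new SpokenLine("Guard", "Halt!"),
                new SpokenLine("Hero", "I mean no harm."));
        System.out.println(validate(declared, lines)); // prints [line 2: unknown character 'Hero']
    }
}
```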

Put another way, dialog scripts become equivalent to source code in what they express: they provide detailed and concise information, yet they can still be written by non-programmers, thanks to the abstractions used and the limited scope of a DSL. In combination with the tooling that comes with an Xtext language, the users (in our case game writers) are supported in many ways to efficiently create their content.

But before I start to talk about shortened turnaround times, let’s move on with the article and discover the benefits step by step.

Setting up an Xtext project

Note that I work on a PC, but if you’re on a Unix-based system, you should be able to follow the described steps accordingly.

Start your Eclipse application by launching the eclipse.exe/*.app. By default, Eclipse asks you for a workspace location. The workspace is the root directory where this instance of Eclipse will store your projects, as well as the related metadata Eclipse requires. Check the box at the bottom if you want to set this workspace as your default.



Eclipse starts by showing us a “Welcome page”, which we can just ignore. Instead, select File --> New --> Project from the main menu. This leads us to a wizard selection dialog. Type Xtext in the filter field at the top. As you will see in the filtered list of available project types, Xtext already ships with many example projects, which you might want to explore later for yourself. For now, select Xtext Project and press Next >.



We are directed to the New Xtext Project wizard which already features some defaults. Let’s go through the fields step-by-step. 
  • Project name: Xtext projects need to feature a project name that starts with a lower case letter since it will derive Java packages from that name. Let’s call our project adbad.dialogScript.sample. 
  • Use default location: Leave it checked so that the project will be stored in our workspace
  • Language --> Name: reuse the project name and append a valid Java identifier as your language name, e.g. adbad.dialogScript.sample.DialogScriptDSL 
  • Language --> Extensions: Here we can define several file extensions. For now, one is sufficient. Let’s use dialog.
  • Layout --> Create SDK Feature Project: Uncheck that. It wouldn’t do any harm to have it created, but we don’t need it for this article
  • Working sets --> Add project to working sets: We can group projects in Eclipse into so called working sets. We don’t need to do that.
This is what it should look like if you want to follow the suggestions:



Hit Finish and Xtext starts working. We will be redirected to Eclipse. If you haven’t closed the welcome page yet, do it now. Xtext has created three projects for us:
  • adbad.dialogScript.sample: The actual language project. All sources regarding the language itself belong here. Note that this runtime project is independent from the user interface and Eclipse. That means we can use our language definition outside of Eclipse. 
  • adbad.dialogScript.sample.tests: A convenient project for testing our language (we won’t cover testing in this article, hence we won’t have to look at this project any further)
  • adbad.dialogScript.sample.ui: All sources related to the user interface of our language, like the rich editor, its features like content assist, highlighting, etc., and the outline view, will be stored here. The ui project is a so called Eclipse plugin, meaning that it depends on Eclipse as a platform.



All the code we are going to create should be stored within the src folders, which are available in every project. The src-gen and xtend-gen folders keep all generated code separate from the manually written code. Xtext takes care of that, so it’s just important not to place any manually written code in one of these folders.

Exploring the project in the package explorer shows us that there already exist some files in our language project:

  • DialogScriptDSL.xtext: This file is already opened in the editor area. It’s our central resource to define our language’s grammar. It features a grammar definition language, which has been created—you guessed it—with Xtext itself. The language resembles EBNF and it allows us to define context free grammars.
  • GenerateDialogScriptDSL.mwe2: This file is to configure Xtext, i.e. to tell Xtext what features we want to use in our language and what the framework should generate for our language. We can stick with the default settings for now. 

The Grammar Definition File (DialogScriptDSL.xtext)

Let’s see what we have so far in our grammar file to get a first idea of how to define grammars in Xtext.

HINT – Showing line numbers in Eclipse editors: In order to see line numbers in your editors

  • Select Window --> Preferences from the main menu
  • Type text editors in the filter field at the top left corner of the new window
  • Select the item Text Editors (highlighted) from below the filter field to open its configuration page
  • Check the box show line numbers (fourth from above) there and press OK

The grammar definition file starts with the “grammar” keyword, followed by the language name we defined earlier in the New Xtext Project wizard. Grammars can make use of other grammars. With the statement in line 2, we tell Xtext to use the org.eclipse.xtext.common.Terminals grammar, which comes with Xtext. It provides us with some handy language features, like:
  • Single- and multi-line comments you might know from programming languages like Java or C#. They allow users to annotate programs with additional, free text information that is ignored by the parser.
  • An ID rule that allows us to define identifiers in our language. Identifiers have specific semantics in programming languages, and so they do in Xtext. Things like classes, variables, and methods, or in our case characters and conditions, can be named using identifiers. This makes them identifiable and thus referable from other locations.
  • INT and STRING rules to use integers and strings in our language. Both rules also feature some specific semantics, as they are mapped to specific data types in our underlying meta model… wait, meta-what?
Okay, I mentioned earlier that our language needs to comply with an underlying model. In my experience, the whole model / meta-model / meta-meta-model terminology tends to be more confusing than helpful. Still, I want you to understand what’s going on in the grammar file, and I’m referring to line 4 now. So let’s try this.

Whenever we define a textual modeling language with Xtext, we define its so-called concrete syntax, i.e. how the editor displays programs written in our language to us, the users. In order to work with our language programmatically, we also need an object graph that represents our language (an abstract syntax tree, AST). The parser creates this representation for us when it parses a dialog script file. The abstract syntax of our language determines what such trees can look like. But where does the abstract syntax, the structure of our language, come from? Xtext offers us two possibilities:

  • Either, we can define a concrete syntax for an existing abstract syntax by importing an abstract syntax model
  • or, we let Xtext derive the abstract syntax from our grammar definition file automatically.

The latter is done by the statement in line 4. We need to provide a name (dialogScriptDSL) for the abstract syntax model as well as a namespace URI ("http://www.dialogScript.adbad/sample/DialogScriptDSL") that makes it referable.
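If the distinction between concrete and abstract syntax still feels abstract, the following toy parser in Java might help. It is only a sketch under loud assumptions: the one-line syntax `Name: "text"` is invented for this example (it is not the dialog language from the screencast), and the hand-written ToyDialogParser and Utterance classes have nothing to do with the parser and model classes that Xtext generates. The point is merely that text goes in (concrete syntax) and objects come out (abstract syntax).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Abstract syntax: what a parsed line looks like as an object.
final class Utterance {
    final String speaker;
    final String text;

    Utterance(String speaker, String text) {
        this.speaker = speaker;
        this.text = text;
    }
}

final class ToyDialogParser {
    // Concrete syntax (invented for this sketch): <identifier> : "<text>"
    private static final Pattern LINE =
            Pattern.compile("\\s*([A-Za-z_]\\w*)\\s*:\\s*\"([^\"]*)\"\\s*");

    // Turns the source text into a list of Utterance objects, skipping blank lines.
    static List<Utterance> parse(String source) {
        List<Utterance> tree = new ArrayList<>();
        String[] rows = source.split("\n");
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].trim().isEmpty()) {
                continue;
            }
            Matcher m = LINE.matcher(rows[i]);
            if (!m.matches()) {
                throw new IllegalArgumentException("syntax error in line " + (i + 1));
            }
            tree.add(new Utterance(m.group(1), m.group(2)));
        }
        return tree;
    }

    public static void main(String[] args) {
        List<Utterance> tree = parse("Guard: \"Halt!\"\nHero: \"I mean no harm.\"");
        for (Utterance u : tree) {
            System.out.println(u.speaker + " says " + u.text);
        }
    }
}
```

With Xtext, we never write such a parser by hand; both the parser and the classes of the object graph are derived from the grammar file.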

The remaining part of the file is to define the actual grammar of our language. Lines 6-10 show two grammar rules that already define a simple language. Now, instead of diving into the details of defining context-free grammars with Xtext, let’s approach it pragmatically:

Defining a language can be seen as a top-down process, where you divide your language step-by-step into its components until you have defined all its tokens. The rules in the grammar language allow us to do exactly that.

Again, we are not able to discuss the grammar file for the used example in detail in this article (instead, I made a screencast), but the important part is to understand that this file is the central resource where we define our language’s concrete syntax.



You can copy the rules from the provided grammar file to your own grammar file. I inserted some comments that help you understand the grammar.

Creating the language infrastructure and editor

In order to use our language, we need to make Xtext generate the infrastructure for it. We can do this by opening the context menu somewhere in the Xtext grammar editor and selecting Run As --> Generate Xtext Artifacts. This invokes the GenerateDialogScriptDSL.mwe2 workflow and generates the infrastructure for our language according to the information provided in the workflow file. This might take some seconds, and you might be asked in the Console View at the bottom to download the ANTLR parser generator, which is necessary, so enter ‘y’ in order to proceed.
 

Running the editor

After that, we can launch a new Eclipse instance from within our current environment. That instance will contain our language, plus the editor as an Eclipse plugin, so we can try it out immediately. We need to select the run configuration first (we need to do this only once). To do so, select Run Configuration … from the Run toolbar menu (click the small black arrow pointing down):

 
The Run Configurations dialog appears. There, you can see an Eclipse Application entry on the top left side to create a new configuration. Grouped below, we find the configuration we want to start. It is called Launch Eclipse Runtime. Select it and hit the Run button at the bottom. (Note that I renamed it to DialogScriptRunner):


This starts a new Eclipse instance just from within our development environment. Note that, from now on, you can always directly select the DialogScriptRunner configuration to directly start Eclipse!



In order to test the language and the editor, create a new project, e.g. by pressing CTRL+N and then selecting the wizard of your choice (I often use the General Project Wizard without any bells and whistles).



Give the project a name, like sample, and feel free to add sub-folders using the project’s context menu in the Package Explorer. You can now add a new dialog script, again using the project’s context menu:


It is important to explicitly state the file extension of our language when naming the dialog script file. Since we defined dialog as our file extension back when we created the Xtext project, we can just name our first file sample.dialog. Now, you should be asked whether or not you want to add the Xtext nature to your project, and since we want to have the full Xtext support in our sample project, we sure do.

The editor that Xtext has created for us is used by default now whenever we open *.dialog files, and we can already test our language and editor. Actually, the language already features everything we saw in the first screencast. It is always great to see how many tooling features are provided by default. You can try out the features that I show in the first screencast now for yourself.

Note that we just edited the grammar file so far, and we receive a fully-fledged editor in combination with our language.

Summing up

Game developers use a multitude of development tools for all kinds of purposes. Due to the individual requirements, it is rare that the same ecosystem of tools is used twice. Instead, developers often need to introduce new tools to address the particularities of a game project, and in most cases there is little to no budget for that. Especially when it comes to non-technical domains, like game design or writing, makeshift solutions like screenwriting or office software are often the status quo to describe how a game should ‘work’.

Technologies like Xtext support tool smiths in creating their own programming languages with a corresponding development environment. We explored a very basic example that shows how Xtext, just from a single language grammar file, provides us with a complete language runtime as well as a rich editor. The additionally provided application gives you an impression of how the development environment can be enhanced, and I’m looking forward to providing you with some subsequent articles on how to do that.

With great powers ..

[This is a reprint of an article I wrote in 2011 for AltDevBlogADay, which ceased to exist, which is why I felt like putting it on my blog. No doubt, I would write this article completely differently today, but I think it is actually kind of nice to preserve my perspective I had back then by not altering the article.]

Once, James Gosling (inventor of Java) was asked what he'd change if he could do Java over again. He replied: "I'd leave out classes". I read about this in a somewhat controversial article by Allen Holub: Why extends is evil.

But to be clear: I don't want to start the same discussion here that the article triggered ("he's so wrong, 'extends' rulez!" vs. "he's absolutely right, worship 'implements'!"), and I'm pretty sure that wasn't Holub's intention either. Anyway, Gosling explained right after his statement that he actually considers implementation inheritance to be the problem, not classes in general.

Now, when I recap my own programming education, I remember that object-orientation was always taught as something that is strongly connected to the mechanism of inheritance (which is not necessarily wrong, but only part of the truth).

And, talking to my students nowadays highlights the same issues I had back then: It is hard for novices to differentiate between implementation inheritance (as a reuse mechanism) and interface inheritance (as a software design mechanism), especially when you learn OO with Java or C++, where implementation inheritance always comes with interface inheritance automatically (reusing a class' implementation by extending it implicitly means that you inherit its interface).

So, soon you get statements like: "Why should I use explicit interfaces anyway?"… or, "I don't get the idea of interfaces, I use inheritance instead". What's more, other important aspects of object-orientation, like polymorphism, are also intertwined with inheritance in statically-typed languages (nothing to blame them for, it's just how it works).

My point here is that this leads, often in the minds of programming novices and often enough in the minds of veterans, to a simplified relationship: "inheritance is object-orientation"… which we could display in UML like this:

Beware! Not true!

In this post, I would like to introduce and discuss the fragile base class problem (FBCP). I think it is a very good showcase of why the introduction of an explicit interface concept in Java or C# has its reasons, but, first and foremost, I hope that it will illustrate how tricky your code can get when you use implementation inheritance (strong coupling). I also hope that this is not only interesting for the novices among us ;).

Note that the examples are dead simple and not good quality code, but intended to highlight the basics of the FBCP. If you are interested in getting a deeper insight, I recommend the paper A Study of The Fragile Base Class Problem.

Let us imagine the following classes, where the Collection class is part of a framework (base system) and the CountingCollection class is part of an extension somewhere else (sub system):

// Version 1
import java.util.ArrayList;

// Base system (framework)
public class Collection {

 ArrayList<Object> data = new ArrayList<>();

 public void addItem (Object item) {
  data.add(item);
 }

 public void addRange (ArrayList<Object> items) {
  for(Object o : items) {
   // self-call: dispatches to an overriding addItem in subclasses
   this.addItem(o);
  }
 }
}
 
// Sub system (extension)
public class CountingCollection extends Collection {
 int n = 0;

 @Override
 public void addItem (Object item) {
  n++;
  super.addItem(item);
 }

 public int getSize() {
  return n;
 }
}

The Collection class represents a collection of items and you can add either a single item or a range of them. The extension, CountingCollection, adds a counter variable to keep track of the number of added items. Everything works as intended.

Now, after a revision of the base system, the base class got changed.


// Version 2
import java.util.ArrayList;


public class Collection {
 ArrayList<Object> data = new ArrayList<>();

 public void addItem (Object item) {
  data.add(item);
 }

 public void addRange (ArrayList<Object> items) {
  // revised: adds all items at once, no self-call to addItem anymore
  data.addAll(items);
 }
}

This change is, considering the base system, valid, since it does not change the externally observable behavior of objects of type Collection. However, it breaks the sub system. This is because the subclass relies on the self-call this.addItem(o) inside addRange in the first version of the base system, which means that it relies on the internal behavior (the implementation) of Collection: with the self-call gone, items added through addRange are no longer counted. Here we face the FBCP.
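To see the effect in one runnable file, here is a small Java sketch that puts both versions of the base class side by side. The class names CollectionV1, CollectionV2, CountingOnV1 and CountingOnV2 are mine and exist only so that both versions fit into a single compilation unit; in reality there is just one Collection class that changes between releases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Base class as in version 1: addRange is implemented via a self-call to addItem.
class CollectionV1 {
    protected final List<Object> data = new ArrayList<>();

    public void addItem(Object item) { data.add(item); }

    public void addRange(List<Object> items) {
        for (Object o : items) {
            this.addItem(o); // dynamically dispatched to the subclass
        }
    }
}

// Base class as in version 2: same observable behavior, but no self-call.
class CollectionV2 {
    protected final List<Object> data = new ArrayList<>();

    public void addItem(Object item) { data.add(item); }

    public void addRange(List<Object> items) { data.addAll(items); }
}

// The same extension, once compiled against each version of the base class.
class CountingOnV1 extends CollectionV1 {
    private int n = 0;

    @Override
    public void addItem(Object item) { n++; super.addItem(item); }

    public int getSize() { return n; }
}

class CountingOnV2 extends CollectionV2 {
    private int n = 0;

    @Override
    public void addItem(Object item) { n++; super.addItem(item); }

    public int getSize() { return n; }
}

class FragileBaseClassDemo {
    public static void main(String[] args) {
        List<Object> items = Arrays.<Object>asList("a", "b", "c");

        CountingOnV1 before = new CountingOnV1();
        before.addRange(items);
        System.out.println("V1 count: " + before.getSize()); // prints V1 count: 3

        CountingOnV2 after = new CountingOnV2();
        after.addRange(items);
        System.out.println("V2 count: " + after.getSize()); // prints V2 count: 0
    }
}
```

The extension code is identical in both cases; only the base class implementation differs, and yet the counter silently stops working.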

Having a more general look at this circumstance, it means that "any open system applying code inheritance and self-recursion in an ad-hoc manner is vulnerable to this problem."

The fact that the immediate cause and the observable effect of the FBCP can spread between different systems makes it hard to track down, though the goal should be to avoid its occurrence in the first place. But how?

Well, in their above mentioned study, the authors introduce a flexibility property that must not be harmed by the programmers in order to avoid the FBCP. In short, the property describes that a modification M to a class C (the actual extension) must remain a valid refinement of C when applied to a refined version of C (C' in the figure below; mod reads "modifies").


Flexibility Property to avoid the FBCP

This is a bit theoretical, but in essence it means that it is the duty of the programmer to ensure that everything's coded fine; in the end, we can all easily google for things like the Open-Closed Principle, can't we?
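One commonly suggested way to live up to that duty, which I add here as my own sketch and not as something taken from the study, is to depend on an interface and delegate instead of extending the implementation. The names ItemCollection, SimpleCollection and CountingWrapper are hypothetical. Because the wrapper never relies on self-calls inside the wrapped class, a revision like the one from version 1 to version 2 cannot break its counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// The contract both the base system and the extension agree on.
interface ItemCollection {
    void addItem(Object item);
    void addRange(List<Object> items);
}

// Base system: free to change its internals at any time.
class SimpleCollection implements ItemCollection {
    private final List<Object> data = new ArrayList<>();

    @Override
    public void addItem(Object item) { data.add(item); }

    @Override
    public void addRange(List<Object> items) { data.addAll(items); }
}

// Extension: interface inheritance plus delegation, no implementation inheritance.
class CountingWrapper implements ItemCollection {
    private final ItemCollection delegate;
    private int n = 0;

    CountingWrapper(ItemCollection delegate) { this.delegate = delegate; }

    @Override
    public void addItem(Object item) {
        n++;
        delegate.addItem(item);
    }

    @Override
    public void addRange(List<Object> items) {
        n += items.size(); // counted here, independent of how the delegate adds them
        delegate.addRange(items);
    }

    public int getSize() { return n; }
}

class CountingWrapperDemo {
    public static void main(String[] args) {
        CountingWrapper counting = new CountingWrapper(new SimpleCollection());
        counting.addItem("x");
        counting.addRange(Arrays.<Object>asList("a", "b", "c"));
        System.out.println(counting.getSize()); // prints 4
    }
}
```

The price is that every method of the interface has to be forwarded by hand, which is exactly the kind of boilerplate that makes implementation inheritance look so tempting.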

Let's take a more cynical or maybe naive perspective while looking at the upcoming example. It is also borrowed from the mentioned study, and it is only one of five examples that show orthogonal aspects of the FBCP, making it more than a trivial thing.


public class BaseClass {

 int x = 0;

 public void aMethod() {
  x = x+1;
 }

 public void anotherMethod() {
  x = x+1;
 }
}
 
public class SubClass extends BaseClass {

 public void anotherMethod() {
  this.aMethod();
 }
}
 
//New base class
public class BaseClass {
 
 int x = 0;

 public void aMethod() {
  this.anotherMethod();
 }

 public void anotherMethod() {
  x = x+1;
 }
}

This example highlights the aspect of "Unanticipated Mutual Recursion", and it could make us ask "Why do modern languages even allow that these problems can arise?", or in other words "Why don't we have languages that eliminate such issues by definition?"
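For those who want to watch it fail, the following Java sketch combines the revised base class with the unchanged subclass. The MutualRecursionDemo class and its overflows helper are additions of mine for demonstration purposes. Calling aMethod on a SubClass object now bounces between the two methods until the stack is exhausted.

```java
// Revised base class: aMethod is now implemented via anotherMethod.
class BaseClass {
    int x = 0;

    public void aMethod() {
        this.anotherMethod();
    }

    public void anotherMethod() {
        x = x + 1;
    }
}

// Unchanged subclass: anotherMethod is implemented via aMethod.
class SubClass extends BaseClass {
    @Override
    public void anotherMethod() {
        this.aMethod();
    }
}

class MutualRecursionDemo {
    // Returns true if calling aMethod never terminates normally
    // because of unbounded recursion.
    static boolean overflows(BaseClass object) {
        try {
            object.aMethod();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(new BaseClass())); // prints false
        System.out.println(overflows(new SubClass()));  // prints true
    }
}
```

Each class looks perfectly reasonable on its own; the cycle only exists once both are combined.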

Well, on the one hand, there are code validation and checking tools that already support us programmers in writing good quality code. But I don't think that, especially considering the last example, tools are able to detect fragile base classes automatically.

On the other hand, the questions address something that accompanies the history of programming from the very beginning. Take pointers, for example. In the hands of an expert, they are a powerful weapon, but amateurs can do horrible things with them (while having good intentions!). And every one of us knows a guy who still swears that Algol 60 is the best language ever.

So, maybe there will be a new language in the near future that explicitly separates implementation inheritance and interface inheritance (and maybe no one will consider it useful), but until then, we, as lecturers and senior programmers, need to make sure that the upcoming generation of programmers is aware of the dangers in implementation inheritance and that they understand object-orientation more like this:


Object-Orientation how it should be considered
In the end, it is just like Stan Lee once said: "With great power there must also come — great responsibility!"