How to Write Your Atom Language Package

This post is about handcrafting an Atom package that highlights the syntax of your language. We’ll walk through the whole process from scratch: creating a package template, writing syntax rules, and adding snippets for your language. Along the way, we discuss the workflow of developing Atom plugins and the mindset shift that comes with development in the web world.

The audience is anyone who wants to write Atom packages. This post is mainly about writing syntax rules; it’s easy enough that it doesn’t require much knowledge about Atom. If you have no idea about selectors (as in jQuery selectors), you will still have no difficulty understanding this post, but you may struggle with the coming posts about writing Atom packages that provide interactive features.

A Brief Introduction to Pollen Grammar

We’ll work on one simple language called Pollen (https://docs.racket-lang.org/pollen). Pollen is a DSL for document processing. The main elements of Pollen are:

Pollen files normally have pm, pp, or p as their extension.
The first line always starts with #lang pollen.
Pollen uses tags to annotate text. Tags all start with the command char ◊.
A tag may or may not enclose a piece of text. A tag uses braces to enclose text.

Here is an example:

◊ul{
  ◊li{First line always starts with ◊v{#lang pollen}.}
  ◊li{Pollen uses ◊em{tags} to annotate text. Tags all start with ◊em{command char} ◊v|{◊}|.}
  ◊li{A tag may or may not enclose a piece of text. Tag uses braces to enclose text.}
}

Note that:

Tags can be nested.
Both {} and |{}| are used to enclose text.

Tools and Workflow

Atom provides a command-line tool, apm, for developers. Before going into details, you should learn what apm can do by running

apm --help
apm help <command>

In this post, we’ll use apm init to generate a language package for Pollen and then install that package locally (for development).

apm init -l pollen
cd language-pollen
apm link

Now everything is in place.

The workflow is to have two windows open: one for editing the source code, the other for visually testing it on a Pollen file. You reload only the testing window (run Window: Reload in the Command Palette) when you want your changes to take effect.

Remember that grammar changes do not show up until you reload that window.

Identify Pollen Source Code

The Atom manual never mentions this, but Atom uses exactly the format that TextMate uses for defining language grammars. So the official reference for handcrafting a language package is the Language Grammars chapter of the TextMate manual (http://manual.macromates.com/en/language_grammars).

There are two ways to help Atom identify a language for a newly opened file: file extension and first line patterns.

Let’s edit language-pollen.cson in the grammars folder.

name: "Pollen"
scopeName: "source.pollen"
fileTypes: [
  "pm",
  "pp",
  "ptree",
  "p"
]

scopeName is documented in the Atom Flight Manual. All it does is define the CSS class names that sit at the top of the editor’s DOM and wrap all of its elements. See also below how we use scopeName for mixing languages.

This configuration lets Atom identify Pollen source by file extension (listed in fileTypes). scopeName specifies a naming scope to use later. If this is not concrete enough, think of it as a C++ namespace or a Java package. scopeName will be useful when we want our other packages to run only under a certain language.
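The second identification route mentioned above, first-line patterns, could be sketched as follows. firstLineMatch is the grammar key for this; the regular expression itself is only an assumption, based on Pollen files starting with #lang pollen.

```cson
name: "Pollen"
scopeName: "source.pollen"
fileTypes: [
  "pm"
  "pp"
  "ptree"
  "p"
]
# Assumed pattern: matches a first line such as "#lang pollen".
firstLineMatch: "^#lang\\s+pollen"
```

With both keys present, a file with an unlisted extension can still be recognized as long as its first line matches.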

Now reload our Pollen file and see whether Atom identifies it as Pollen source (the bottom right corner is now supposed to show “Pollen”).

Grammars

Now it’s time to skim the reference.

From a quick skim, we learn that there are two kinds of language rules (patterns that match language components), match rules and begin-end rules, plus a set of naming conventions. Even though you have the official reference, I’d like to show you a few examples in the following subsections to speed up the learning process.

highlight invalid command char

In a Pollen program, it’s quite common to have a dangling command char. Which of the two rules is suitable for highlighting such command chars? The match rule! Which name should we give it? If you scroll down the reference, you’ll see that among the 11 root groups of names, invalid (specifically invalid.illegal) is what we want.

{
  match: "◊\\s+"
  name: "invalid.illegal"
}

highlight command char

{
  match: "◊[^\\s\\{\\}\\(\\)\\[\\]#\\|,\"]+"
  captures: {
    0: { name: 'entity.name.function.pollen' }
  }
}

Is this good enough? Yes, for now. When we come to writing an autocomplete plugin, we’ll revisit the syntax theme, specifically all the names. Names are not just for styling; in the JavaScript world, a name is also used for identification.
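Where do these rule objects go? In TextMate-style grammars they are listed in the top-level patterns array of the grammar file. Here is a sketch of how language-pollen.cson might look with the two rules so far; the layout is an assumption and the published package may organize it differently.

```cson
name: "Pollen"
scopeName: "source.pollen"
patterns: [
  {
    # A command char followed by whitespace is dangling.
    match: "◊\\s+"
    name: "invalid.illegal"
  }
  {
    # A command char followed by a tag name.
    match: "◊[^\\s\\{\\}\\(\\)\\[\\]#\\|,\"]+"
    captures:
      0: { name: "entity.name.function.pollen" }
  }
]
```

The two regular expressions cannot match at the same position (one requires whitespace after ◊, the other forbids it), so their order in the array does not matter here.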

highlight multi-line comments

Here are two kinds of multiline comments:

◊;{ multiline comment }
◊;|{ multiline comment }|

Which of the two rules are we supposed to use? The begin-end rule! Because we want to give all the lines between the braces the same name. And which name to use? comment, of course! Here it is:

{
  begin: "◊;\\{"
  end: "\\}"
  name: "comment.multiline.pollen"
}

If you test this a bit, you will see unbalanced brace matching in cases like the following.

◊;{How to highlight ◊em{this} one?}

The first } would close the comment. This is where nested patterns come in.

handle nested braces

To handle nested braces, we can create a pattern that points to itself.

repository:
  braces:
    begin: "\\{"
    end: "\\}"
    patterns: [
      { include: "#braces" }
      { include: "$self" }
    ]

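And then update the rule as follows. The comment now starts at ◊; and lets the brace patterns consume its body, so it ends at the first whitespace outside any braces; special_brace stands for the |{}| form.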

{
  begin: "◊;"
  end: "\\s"
  patterns: [
    { include: "#special_brace" }
    { include: "#braces" }
  ]
  name: "comment.multiline.pollen"
}
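The rule above includes #special_brace without showing it. A minimal sketch of such a repository entry, assuming it only needs to delimit the |{ ... }| form, might look like this. The definition is hypothetical and the published package may define it differently.

```cson
repository:
  # Hypothetical definition of the entry referenced by "#special_brace".
  special_brace:
    # Opens at "|{" and closes at the first "}|".
    begin: "\\|\\{"
    end: "\\}\\|"
```

It has no nested patterns on purpose: the sketch treats everything up to the first }| as the body, so braces inside do not have to balance.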

Final touch

Your package is now fully working. There is some optional work left if you want to polish it.

In the settings directory, you can create a language-xxx.cson file to initialize some options. Pollen is a language for writers, so softWrap makes sense:

'.source.pollen':
  'editor':
    'softWrap': true

In the snippets folder, you can create a language-xxx.cson file that contains code snippets specific to your language. Pollen’s command char is not easy to type, so it goes into the following snippet:

'.source.pollen':
  'Command Char':
    'prefix': '@'
    'body': '◊$1{}'
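
A variant with a placeholder and a second tab stop could look like this. The snippet name, prefix, and placeholder text are made up for illustration; only the ${1:...} and $2 tab-stop syntax comes from Atom’s snippet format.

```cson
'.source.pollen':
  # Hypothetical snippet: name, prefix and placeholder are illustrative.
  'Tag with body':
    'prefix': 'tag'
    'body': '◊${1:name}{$2}'
```

Typing tag and pressing Tab would insert ◊name{} with name selected; pressing Tab again moves the cursor between the braces.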

Discussion

$self

You need to include $self here to support nested tags: $self refers to the whole grammar, so every top-level rule also applies inside the braces.

repository:
  braces:
    begin: "\\{"
    end: "\\}"
    patterns: [
      { include: "#braces" }
      { include: "$self" }
    ]

Mixing language highlights

Pollen actually supports Racket code if parens are used. For example:

◊(require "config/ocaml-internals.rkt")

Here is a way to highlight the tag and then let the Racket package highlight what’s inside the parens. It assumes that a Racket grammar whose scopeName is source.racket is installed.

{
  begin: "◊\\("
  end: "\\)"
  patterns: [
    { include: "source.racket" }
  ]
}

A mindset shift

If you haven’t written editor plugins using web tech before, be prepared to shift your mindset. Consider these examples.

To highlight syntax in CodeMirror, you write the tokenizer and parser. You manipulate the parsing state to implement new features, such as tracking braces or validating the syntax.

To highlight syntax in Emacs, you write a major mode in Emacs’s framework. You provide your features in the major mode.

Working with Atom is different. The package for highlighting syntax is separate from the packages for interactive features. Therefore, you need at least two Atom packages for the language you care about: one for grammar highlighting (also called a syntax theme), the other for the actual functions.

Problems?

This design of a language syntax theme is good enough for most languages. However, fixed parsing rules are a problem for Pollen.

Pollen is a flexible DSL. The command char can be changed to any character. In that case, it doesn’t make sense to have fixed regexp rules specifying tokens, as the tokens would have to be constructed dynamically.

Where to go from here

Writing a syntax theme doesn’t require much knowledge about how things work in Atom. The language-pollen package is on GitHub (https://github.com/lijunsong/language-pollen). It also handles the special brace case where |{}| is used. Go roll your own language packages in Atom!