This post is about handcrafting an Atom package that highlights the syntax of your language. We’ll walk through the whole process from scratch: creating a package template, writing syntax rules, and adding snippets for your language. Along the way, we’ll discuss the workflow of developing Atom packages and the mindset shift that comes with development in the web world.
The audience is anyone who wants to write Atom packages. This post is mainly about writing syntax rules, which is easy enough that it doesn’t require much knowledge of Atom. If you know nothing about selectors (as in jQuery selectors), you will still have no difficulty understanding this post, but you may struggle with upcoming posts about writing Atom packages that provide interactive features.
A Brief Introduction to Pollen Grammar
We’ll work with a simple language called Pollen.
- Pollen files normally use pm, pp, or p as the file extension.
- The first line always starts with #lang pollen.
- Pollen uses tags to annotate text. All tags start with the command char ◊.
- A tag may or may not enclose a piece of text. A tag uses braces to enclose text.
Here is an example:
◊ul{
◊li{First line always starts with ◊v{#lang pollen}.}
◊li{Pollen uses ◊em{tags} to annotate text. Tags all start with ◊em{command char} ◊v|{◊}|.}
◊li{A tag may or may not enclose a piece of text. Tag uses braces to enclose text.}
}
Note that:
- Tags can be nested
- {} and |{}| are both used to enclose text
Tools and Workflow
Atom provides a command-line tool, apm, for developers. Before going into details, you can learn what apm can do by running:
apm --help
apm help <command>
In this post, we’ll use apm init to generate a language package for Pollen and then install that package locally (for development).
apm init -l pollen
cd language-pollen
apm link
Now everything is in place.
The workflow is to keep two windows open: one for editing the package source, the other for visually testing it on a Pollen file. You reload only the testing window (run Window: Reload from the Command Palette) whenever you want your changes to take effect.
Identify Pollen Source Code
The Atom manual never mentions this, but Atom uses exactly the same format as TextMate for defining language grammars. So the official reference for handcrafting a language package is the Language Grammars chapter of the TextMate manual.
There are two ways to help Atom identify a language for a newly opened file: file extension and first line patterns.
Let’s edit language-pollen.cson in the grammars folder.
name: "Pollen"
scopeName: "source.pollen"
fileTypes: [
  "pm",
  "pp",
  "ptree",
  "p"
]
This configuration lets Atom identify Pollen source by file extension (listed in fileTypes). scopeName specifies the root scope name to use later. If this is not concrete enough, think of it as a C++ namespace or a Java package. scopeName will be useful when we want our other packages to run only for a certain language.
Now reload the window with our Pollen file and check whether Atom identifies it as Pollen source (the bottom right corner should now show “Pollen”).
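The grammar above covers only the first of the two identification methods. For the second, TextMate-style grammars accept a firstLineMatch key: a regular expression tested against the first line of a file. Here is a minimal sketch, assuming every Pollen source begins with #lang pollen as described earlier. The exact regex is my own assumption, not something taken from the language-pollen package.

```cson
name: "Pollen"
scopeName: "source.pollen"
fileTypes: ["pm", "pp", "ptree", "p"]
# Assumption: recognise a file by its "#lang pollen" first line,
# whatever its file extension is.
firstLineMatch: '^#lang\\s+pollen'
```

With this in place, a file without one of the listed extensions should still be picked up as Pollen as long as its first line matches.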
Grammars
Now it’s time to skim the reference.
From a quick skim, we learn that there are two kinds of language rules (patterns that match language components), namely match rules and begin-end rules, plus a set of naming conventions. Even though you have the official reference, I’d like to show you a few examples in the following subsections to speed up the learning process.
highlight invalid command char
In a Pollen program, it’s quite common to have a dangling command char, that is, a ◊ followed by whitespace. Which of the two rules suits such command chars? The match rule! Which name should we give it? If you scroll down the reference, you’ll see that among the 11 root names, invalid.illegal is what we want.
{
  match: "◊\\s+"
  name: "invalid.illegal"
}
highlight command char
{
  match: "◊[^\\s\\{\\}\\(\\)\\[\\]#\\|,\"]+"
  captures: {
    0: { name: 'entity.name.function.pollen' }
  }
}
Is this good enough? Yes, for now. When we come to writing an autocomplete plugin, we’ll revisit the syntax theme, specifically the names. Names are not just for styling; in the JavaScript world, a name is also used for identification.
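Both rules belong in the top-level patterns array of the grammar file. Here is a sketch of how the file might look at this point. The overall layout is illustrative and the comments are mine; the two rules are the ones shown above.

```cson
name: "Pollen"
scopeName: "source.pollen"
fileTypes: ["pm", "pp", "ptree", "p"]
patterns: [
  {
    # A command char followed by whitespace is dangling.
    match: "◊\\s+"
    name: "invalid.illegal"
  }
  {
    # A command char followed by a tag name.
    match: "◊[^\\s\\{\\}\\(\\)\\[\\]#\\|,\"]+"
    captures: {
      0: { name: "entity.name.function.pollen" }
    }
  }
]
```

The two regexes cannot match at the same position (one requires whitespace right after ◊, the other forbids it), so their order in the array does not matter here.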
highlight multi-line comments
Here are two kinds of multiline comments:
◊;{ multiline comment }
◊;|{ multiline comment }|
Which of the two rules should we use? The begin-end rule, because we want to give everything between the braces the same name. And which name? comment, of course! Here we are:
{
  begin: "◊;\\{"
  end: "\\}"
  name: "comment.multiline.pollen"
}
If you test this a bit, you will see unbalanced brace matching in cases like the following:
◊;{How to highlight ◊em{this} one?}
The first } closes the comment. This is where patterns come in.
handle nested braces
To handle nested braces, we can create a pattern in the repository that includes itself.
repository:
  braces:
    begin: "\\{"
    end: "\\}"
    patterns: [
      { include: "#braces" }
      { include: "$self" }
    ]
And then update the comment rule to:
{
  begin: "◊;"
  end: "\\s"
  patterns: [
    { include: "#special_brace" }
    { include: "#braces" }
  ]
  name: "comment.multiline.pollen"
}
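The updated rule refers to #special_brace, which has not been defined so far. Here is a minimal sketch of what that repository entry could look like, assuming it is meant to cover the ◊;|{ ... }| comment form shown earlier. The body of special_brace is my guess, not copied from the language-pollen package.

```cson
repository:
  # Assumption: inside |{ ... }| plain braces need not balance,
  # so no nested patterns are included and only }| ends the block.
  special_brace:
    begin: "\\|\\{"
    end: "\\}\\|"
  braces:
    begin: "\\{"
    end: "\\}"
    patterns: [
      { include: "#braces" }
      { include: "$self" }
    ]
```

Since |{ starts one character before its {, the special_brace rule gets the first chance to match that form, and plain {...} still falls through to braces.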
Final touch
Your package is now fully working. There is some optional work you can do if you want.
In the settings directory, you can create a language-xxx.cson file to initialize some options. Pollen is a language for writers, so softWrap makes sense:
'.source.pollen':
  'editor':
    'softWrap': true
In the snippets folder, you can create a language-xxx.cson file to hold code snippets specific to your language. Pollen’s command char is not easy to type, so it goes into the following snippet:
'.source.pollen':
  'Command Char':
    'prefix': '@'
    'body': '◊$1{}'
Discussion
$self
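You need to include $self here to support nested tags. $self refers to the grammar as a whole, so everything inside the braces is matched against the same top-level rules again.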
repository:
  braces:
    begin: "\\{"
    end: "\\}"
    patterns: [
      { include: "#braces" }
      { include: "$self" }
    ]
Mixing language highlights
Pollen actually supports Racket code when parens are used. For example:
◊(require "config/ocaml-internals.rkt")
Here is a way to match the tag and then let the Racket package highlight what’s inside the parens.
{
  begin: "◊\\("
  end: "\\)"
  patterns: [
    { include: "source.racket" }
  ]
}
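The rule above hands the inside of the parens to the Racket grammar but gives the ◊ itself no name, so it is not styled. One possible refinement is sketched below. It assumes the command char should reuse the entity.name.function.pollen scope from earlier, and that a grammar with scope source.racket is installed. The capture group and its name are my additions.

```cson
{
  begin: "(◊)\\("
  beginCaptures: {
    # Assumption: style the command char like a tag name.
    1: { name: "entity.name.function.pollen" }
  }
  end: "\\)"
  patterns: [
    { include: "source.racket" }
  ]
}
```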
A mindset shift
If you haven’t written editor plugins using web tech, be prepared to shift your mindset. Consider these examples:
To highlight syntax in CodeMirror, you write the tokenizer and parser. You manipulate the parsing state to implement new features such as tracking braces or validating syntax.
To highlight syntax in Emacs, you write a major mode in Emacs’s framework. You provide your features in the major mode.
Working with Atom is different. The package for highlighting syntax is separate from packages that provide an interactive interface. Therefore, you need at least two Atom packages for the language you care about: one for grammar highlighting (also called a syntax theme), the other for actual functionality.
Problems?
This design of a language syntax theme is good enough for most languages. However, fixed parsing rules are a problem for Pollen.
Pollen is a flexible DSL. The command char can be any character. In that case, it doesn’t make sense to have fixed regexp rules specifying tokens, as the tokens are supposed to be constructed dynamically.
Where to go from here
Writing a syntax theme doesn’t require much knowledge of how things work in Atom. The language-pollen package is on