| xah lee ( @ 2008-11-20 13:58:00 |
| Entry tags: | emacs, lisp, major mode, syntax highlighting |
How To Write A Emacs Major Mode For Syntax Coloring
• How To Write A Emacs Major Mode For Syntax Coloring http://xahlee.org/emacs/elisp_syntax_co
plain text version follows. (lisp code formatting may be screwed up)
-------------------
Xah Lee, 2008-11
This page gives a practical example of writing an emacs major mode to do syntax coloring of your own language. You should have at least a few months of experience coding emacs lisp.
The Problem
Your company uses its own in-house language. You want to write a major mode for that language, so that the keywords of the language will be highlighted.
Solution
Suppose your language source code looks like this:
Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]
You want the words “Sin”, “Cos”, “Sum”, colored as functions, and “Pi” and “Infinity” colored as constants.
Here's how you define the mode:
(setq myKeywords
 '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
   ("Pi\\|Infinity" . font-lock-constant-face)
  )
)

(define-derived-mode math-lang-mode fundamental-mode
  "math-lang"
  (setq font-lock-defaults '(myKeywords)))

The string “"Sin\\|Cos\\|Sum"” is a regex, and “font-lock-function-name-face” is a pre-defined variable that holds the default face used for function names. The string “"math-lang"” is the name shown in the mode line; “define-derived-mode” expects it as its third argument, before the body.
The line “define-derived-mode” defines your mode, named math-lang-mode, based on the fundamental-mode (which is the most basic mode). The line (setq font-lock-defaults '(myKeywords)) tells emacs that when your mode is active, the syntax coloring should be set according to your keywords.
That's all there is to it. Now, when you invoke “math-lang-mode”, emacs will syntax color the buffer's text. (You must have font-lock-mode on. If not, do “Alt+x font-lock-mode”.) Here's what it looks like:
Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

O My GOD, Emacs is beautiful!
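If you want to check the coloring without looking at the screen, one way is to fontify a temporary buffer and read back the “face” text property. The following is only a sketch. It repeats the definitions above so that it runs on its own, passing a mode name string as the third argument of “define-derived-mode”. The helper “math-lang-demo-faces” is a made-up name, and “font-lock-ensure” assumes Emacs 25 or later.

```elisp
;; Definitions repeated from the article so this sketch is self-contained.
(setq myKeywords
      '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
        ("Pi\\|Infinity" . font-lock-constant-face)))

(define-derived-mode math-lang-mode fundamental-mode "math-lang"
  (setq font-lock-defaults '(myKeywords)))

;; Hypothetical helper, not part of the article.
(defun math-lang-demo-faces ()
  "Return the faces found on \"Sin\", \"[\" and \"Pi\" in a sample buffer."
  (with-temp-buffer
    (insert "Sin[x]^2 + Pi")
    (math-lang-mode)
    ;; Temporary buffers are not fontified automatically, so force it.
    (font-lock-ensure)
    (list (get-text-property 1 'face)      ; the "S" of "Sin"
          (get-text-property 4 'face)      ; the "[" after "Sin"
          (get-text-property 12 'face))))  ; the "P" of "Pi"

(math-lang-demo-faces)
;; → (font-lock-function-name-face nil font-lock-constant-face)
```

The “nil” in the middle means the bracket got no face at all, which is what you want for text that is not in any of your keyword lists.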
(info "(elisp)Font Lock Mode")
(info "(elisp)Major Modes")
(info "(elisp)Faces for Font Lock")

More Complex Example
Typically, a language may have hundreds of keywords. Elisp provides a way to generate a regex for your keywords.
Suppose you are writing a mode for the Linden Scripting Language, which has close to 600 keywords. Here's an example of how to code it.
;; define several classes of keywords
(defvar mylsl-keywords
  '("break" "default" "do" "else" "for" "if" "return" "state" "while")
  "LSL keywords.")

(defvar mylsl-types
  '("float" "integer" "key" "list" "rotation" "string" "vector")
  "LSL types.")

(defvar mylsl-constants
  '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK")
  "LSL constants.")

(defvar mylsl-events
  '("at_rot_target" "at_target" "attach")
  "LSL events.")

(defvar mylsl-functions
  '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList")
  "LSL functions.")

In the above, first we define several lists. Each one is a class of keywords in the language. Note that the keyword lists in the above are truncated. Each list can have hundreds of keywords.
;; create the regex string for each class of keywords
(defvar mylsl-keywords-regexp (regexp-opt mylsl-keywords 'words))
(defvar mylsl-type-regexp (regexp-opt mylsl-types 'words))
(defvar mylsl-constant-regexp (regexp-opt mylsl-constants 'words))
(defvar mylsl-event-regexp (regexp-opt mylsl-events 'words))
(defvar mylsl-functions-regexp (regexp-opt mylsl-functions 'words))

In the above, we generate the regex for each keyword class, using the built-in function “regexp-opt”. We gave regexp-opt an optional second argument, “'words”. This creates a regex that matches whole words only, so that a keyword contained inside a longer word will not be highlighted. (For example, “for” is usually a looping keyword, but if you have a user created function named “inform”, you don't want part of the word colored as “for”.)
(info "(elisp)Regexp Functions")
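To see what the “'words” argument buys you, here is a small sketch that matches the same keyword list against “inform”, with and without it. The names “my-demo-words” and “my-demo-match-position” are made up for this illustration. The match runs inside a temporary buffer so that the default syntax table decides what counts as a word.

```elisp
(defvar my-demo-words '("for" "if" "while")
  "Hypothetical keyword list, for illustration only.")

(defun my-demo-match-position (regexp text)
  "Return the index where REGEXP first matches TEXT, or nil if it does not."
  (with-temp-buffer
    (string-match-p regexp text)))

(list
 ;; Without 'words, the \"for\" inside \"inform\" is found at index 2.
 (my-demo-match-position (regexp-opt my-demo-words) "inform")
 ;; With 'words, only whole words match, so \"inform\" gives no match.
 (my-demo-match-position (regexp-opt my-demo-words 'words) "inform")
 ;; A real, standalone \"for\" is still found.
 (my-demo-match-position (regexp-opt my-demo-words 'words) "x for y"))
;; → (2 nil 2)
```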
;; create the list for font-lock.
;; each class of keyword is given a particular face
(setq mylsl-font-lock-keywords
  `(
    (,mylsl-type-regexp . font-lock-type-face)
    (,mylsl-constant-regexp . font-lock-constant-face)
    (,mylsl-event-regexp . font-lock-builtin-face)
    (,mylsl-functions-regexp . font-lock-function-name-face)
    (,mylsl-keywords-regexp . font-lock-keyword-face)
    ;; note: order above matters. “mylsl-keywords-regexp” goes last because
    ;; otherwise the keyword “state” in the event “state_entry” would be highlighted.
))

In the above, we create a list in preparation to feed it to “font-lock-defaults”.
Note that font-lock applies the entries of this list in order, on a first-come-first-served basis: once a piece of text has got its coloring, a later entry won't change it. So, the order of your list is important. Make sure the entries that match the shortest text go last. (This won't fix all cases where a keyword matches part of other keywords. If your language has a lot of such keywords, you need to use other forms to solve this problem. See (info "(elisp)Search-based Fontification").)
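Here is a sketch of the ordering effect, using just one event and one keyword. All the “my-demo-...” names are made up for this illustration, and “font-lock-ensure” assumes Emacs 25 or later. It fontifies the text “state_entry” with the two possible orders and reports the face on the “s” and on the “_”.

```elisp
(defvar my-demo-event-regexp (regexp-opt '("state_entry") 'words))
(defvar my-demo-keyword-regexp (regexp-opt '("state") 'words))

;; Longer text first, as recommended above.
(defvar my-demo-events-first
  `((,my-demo-event-regexp . font-lock-builtin-face)
    (,my-demo-keyword-regexp . font-lock-keyword-face)))

;; The wrong order: the short keyword gets there first.
(defvar my-demo-keywords-first
  `((,my-demo-keyword-regexp . font-lock-keyword-face)
    (,my-demo-event-regexp . font-lock-builtin-face)))

(defun my-demo-faces (keywords-symbol)
  "Fontify \"state_entry\" with the list named by KEYWORDS-SYMBOL.
Return the faces on the \"s\" (position 1) and on the \"_\" (position 6)."
  (with-temp-buffer
    (insert "state_entry")
    (setq font-lock-defaults (list keywords-symbol))
    (font-lock-ensure)
    (list (get-text-property 1 'face)
          (get-text-property 6 'face))))

(my-demo-faces 'my-demo-events-first)
;; → (font-lock-builtin-face font-lock-builtin-face)
(my-demo-faces 'my-demo-keywords-first)
;; → (font-lock-keyword-face nil)
```

In the second call, “state” is colored as a keyword first. After that, the entry for “state_entry” is skipped, because part of its match already has a face, so the rest of the word stays uncolored.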
The “`( ,a ,b ...)” is the lisp backquote syntax. It quotes a list the way “'” does, except that elements preceded by a “,” are evaluated and replaced by their values.
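A small sketch of the difference between the backquote and the plain quote. The function name “my-demo-backquote” is made up for this illustration.

```elisp
(defun my-demo-backquote (re)
  "Build the same font-lock entry for RE in three ways and return them."
  (list
   ;; Backquote: ,re is replaced by the value of re.
   `(,re . font-lock-type-face)
   ;; The same pair, built by hand.
   (cons re 'font-lock-type-face)
   ;; Plain quote: nothing is evaluated, so the pair holds the symbol re.
   '(re . font-lock-type-face)))

(my-demo-backquote "foo")
;; → (("foo" . font-lock-type-face)
;;    ("foo" . font-lock-type-face)
;;    (re . font-lock-type-face))
```

The third form is the mistake to watch for: font-lock needs the regex string in that position, not the name of the variable that holds it.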
Finally, we define our mode like this:
;; define the mode
(define-derived-mode mylsl-mode fundamental-mode
  "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)..."
  ;; ...

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mylsl-font-lock-keywords)))

  ;; ...
)

In the above, we based our mode on fundamental-mode, which is the most basic mode. If you are actually writing a mode for LSL, it makes sense to base it on c-mode, since the syntax is similar. Basing your mode on a similar language's mode will save you time in coding many features, such as handling comments and indentation.
Also, the above code only covers syntax coloring. A full-featured major mode will also have commands to handle comments, indentation, keyword completion, function documentation lookup, function templates, graphical menus, and other features.