JavaScript Syntax Highlighting

I occasionally post samples of source code in a handful of languages on this site. To present them in the most readable form, I started looking for client-side syntax highlighting scripts, always with an eye towards matching the behavior of Emacs and its major modes. I tried the most popular choices in this area, but was disappointed with each of them for various reasons:

  • google-code-prettify did not highlight C correctly: it sometimes marked types (char, for example) as keywords. It's also not actually hosted on the Google CDN, which I would think is one of the major reasons to use a Google project for something like this.
  • highlight.js relies on a mysterious process of automatically "detecting" the language used in a particular code block, which failed in my test case when it mismarked C as Perl. Despite that, it comes with some nice color themes and, unlike google-code-prettify, is hosted externally, although I would be more comfortable if it were hosted on a major CDN like Google's.
  • SyntaxHighlighter has extensive language support and uses an autoloader to load only the scripts needed for the languages on the current page, a significant point in its favor. However, the HTML it produces is heavy with non-semantic <div>s (one for each line!) and wraps every highlighted token in its own <code> tags. I prefer client-side scripts not to throw semantics out the window when they can avoid it; one <span> with appropriate class names for each token should be sufficient.

But just as I was about to give up and put syntax highlighting on the shelf for a while, I found another script: Rainbow, by Craig Campbell. This library makes heavy use of regular expressions, which I'm fond of, and seems to have been designed from the beginning to be easy to extend. It's also hosted on GitHub, and I've already taken advantage of that to make some small contributions to the project. Rainbow is relatively young and doesn't support very many languages yet, it's true, but it is so easy to extend that I don't anticipate that being much of a problem; I'll probably just write a new mode myself if I need one.
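
To give a feel for the regular-expression approach, here is a minimal sketch in JavaScript. It is not Rainbow's actual API: the names escapeHtml, C_RULES, and highlight are hypothetical, and the rule set covers only a few C tokens. Under those assumptions, it shows how a list of named patterns can turn source text into one <span> per token:

```javascript
// Hypothetical sketch of regex-driven highlighting; NOT Rainbow's API.

// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Each rule pairs a class name with a pattern. The patterns must not
// contain capturing groups of their own and must never match the empty
// string. When two rules could match at the same position, the earlier
// rule wins.
var C_RULES = [
    { name: 'comment', pattern: /\/\*[\s\S]*?\*\// },
    { name: 'string',  pattern: /"(?:\\.|[^"\\])*"/ },
    { name: 'type',    pattern: /\b(?:char|int|void|ssize_t)\b/ },
    { name: 'keyword', pattern: /\b(?:static|const|while|switch|case|default|if|return)\b/ }
];

function highlight(source, rules) {
    // Join the rules into a single alternation, one capture group per rule,
    // so that the index of the group that matched identifies the rule.
    var combined = new RegExp(rules.map(function (rule) {
        return '(' + rule.pattern.source + ')';
    }).join('|'), 'g');

    var out = '';
    var last = 0;
    var match;
    while ((match = combined.exec(source)) !== null) {
        // Copy the unhighlighted text between the previous token and this one.
        out += escapeHtml(source.slice(last, match.index));
        for (var i = 0; i < rules.length; i++) {
            if (match[i + 1] !== undefined) {
                out += '<span class="' + rules[i].name + '">' +
                       escapeHtml(match[0]) + '</span>';
                break;
            }
        }
        last = combined.lastIndex;
    }
    // Copy whatever follows the final token.
    return out + escapeHtml(source.slice(last));
}

console.log(highlight('static char buf[8];', C_RULES));
// prints: <span class="keyword">static</span> <span class="type">char</span> buf[8];
```

Adding a language in this sketch means writing another array of rules, which is the kind of extensibility I was after; a real highlighter needs many more patterns and some care with tokens that nest.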

For testing purposes, here's a simple C implementation of Unix's tee command, one of the homework problems in Michael Kerrisk's The Linux Programming Interface (which I highly recommend):

/* Simple implementation of tee */
#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef BUF_SIZE
#define BUF_SIZE 1024
#endif

static void die (const char * format, ...)
{
    va_list vargs;
    va_start(vargs, format);
    vfprintf(stderr, format, vargs);
    fprintf(stderr, ".\n");
    va_end(vargs);
    _exit(1);
}

int main (int argc, char *argv[])
{
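    int outFD, opt, openFlags = O_WRONLY | O_CREAT;
    char buf[BUF_SIZE];
    ssize_t charCount;

    while ((opt = getopt(argc, argv, ":a")) != -1) {
        switch (opt) {
        case 'a':
            openFlags |= O_APPEND;
            break;      /* don't fall through to the error case */
        default:
            die("Unrecognized option");
        }
    }

    /* getopt leaves optind at the first non-option argument */
    if (optind >= argc)
        die("Usage: %s [-a] file", argv[0]);

    /* Overwrite the file unless -a asked us to append */
    if (!(openFlags & O_APPEND))
        openFlags |= O_TRUNC;

    outFD = open(argv[optind], openFlags,
                 S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    if (outFD == -1)
        die("Couldn't open %s", argv[optind]);

    /* Compare after the assignment, so charCount holds the byte count */
    while ((charCount = read(STDIN_FILENO, buf, BUF_SIZE)) > 0) {
        /* Write only the bytes actually read, not the whole buffer */
        if (charCount != write(STDOUT_FILENO, buf, charCount))
            die("Couldn't write same number of bytes to stdout");
        if (charCount != write(outFD, buf, charCount))
            die("Couldn't write same number of bytes to output file");
    }
    if (charCount == -1)
        die("Couldn't read from stdin");
    close(outFD);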

    return 0;
}

Looks good to me!
