Trick to use CoffeeScript in Hexo

Hexo has a scripts folder; the files under it are loaded by Hexo at start-up. I usually use this folder as the development folder for my plug-in scripts, and extract them into independent packages after polishing them to package-ready quality.

Usually, the files under scripts should be JavaScript. But as a fan of CoffeeScript, I wish to write the plug-ins in coffee-script. For the formal package, I compile the coffee scripts into JavaScript before release; but for development, I wish to use coffee-script directly.

In node.js, it is possible to require coffee-script files directly, once you have registered the coffee-script runtime compiler:

require('coffee-script/register');

And because of how the node.js require function is implemented, you cannot register the coffee-script runtime compiler in a .coffee file, or the compiler will complain:

[error] HexoError: Script load failed: plugin.coffee
SyntaxError: Unexpected string
at Module._compile (module.js:439:25)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at /usr/local/lib/node_modules/hexo/lib/loaders/scripts.js:17:11
at Array.forEach (native)
at /usr/local/lib/node_modules/hexo/lib/loaders/scripts.js:15:13
at /usr/local/lib/node_modules/hexo/lib/util/file2.js:339:7
at done (/usr/local/lib/node_modules/hexo/node_modules/async/lib/async.js:135:19)
at /usr/local/lib/node_modules/hexo/node_modules/async/lib/async.js:32:16
at /usr/local/lib/node_modules/hexo/lib/util/file2.js:335:11
at Object.oncomplete (evalmachine.<anonymous>:107:15)

Theoretically, it is possible to put the coffee-script registration code in a JavaScript file under the /scripts folder, so Hexo will load it at start-up.

Well, this approach doesn’t really work. If you try it, you will very likely get exactly the same error as before. The reason lies in Hexo’s implementation: Hexo uses the Scripts Loader to require the files under /scripts, and the loader doesn’t provide an explicit way to specify which file is loaded before another. So the registration file is not guaranteed to be loaded before the .coffee files.

So far, it seems there is no cure for this problem! But actually there is: an undocumented feature helps to solve this issue.

As mentioned, Hexo uses the Scripts Loader to load the scripts. The Scripts Loader uses hexo.util.file2 to enumerate the source files under /scripts, and hexo.util.file2 uses fs.readdir to actually enumerate the file-system entries. And fs.readdir has an undocumented behavior: the entries it returns are sorted alphabetically, which means a.js is loaded before b.coffee.

With this behavior, we can put our coffee-script registration in a file whose name sorts ahead of the others. Personally, I like to call it ___register_coffeescript.js, since _ sorts before any lower-case letter.
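
The registration file itself is a one-liner; a minimal sketch:

___register_coffeescript.js
// loading this registers the runtime compiler, so every .coffee file
// loaded after it can be required directly
require('coffee-script/register');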

⚠️ WARNING: fs.readdir yielding sorted entries is undocumented behavior, which means it is guaranteed neither to work across platforms nor to remain unchanged in the future. So far, it works on Mac, and I expect it behaves similarly on Linux. But I’m not sure about Windows, since fs uses a different native binding on Windows.

Remove Bower from your build script

The mysterious broken build

This morning, our QA told us that knockout, a JavaScript library we use in our web app, was missing in the staging environment. We checked the package she got from the CI server, and the library was indeed not included. But when we generated the package on our local dev box, knockout was included.

It was a big surprise to us, because we share the exact same build scripts and environment between dev boxes and CI agents, and because we manage the front-end dependencies with bower. In our gulp script, we ask bower to install the dependencies every time to make sure they are up to date.

The root cause of the broken build

After spending hours diagnosing the CI agents, we finally figured out the reason, a tricky story:

When the Knockout maintainers released the v3.1 bower package, they made a mistake in the bower.json config file and packaged the spec folder instead of the dist folder. So the package was actually broken, because the main JavaScript file dist/knockout.js described in bower.json didn’t exist.

Later, the engineers realized the mistake and fixed it by releasing a new package. Perhaps because they hadn’t changed any script logic, they released the new package under the same version number, and that is the criminal who broke our builds.

We were so unlucky that the broken package was downloaded on our CI server the first time our build script executed there, and the broken package was stored in the bower cache at that time.

Because of bower’s cache mechanism, the broken package keeps being used unless the version is bumped or the cache expires. This is why our build was broken on the CI server.

But on our dev box, for some reason, we had run bower cache clean, which invalidated the cache. This is why we could generate a good package on our local dev box.

It is a very tricky issue when using bower to manage dependencies. Although it is not completely our fault, it is close to the worst case we could face: the build broke silently, with no error logs or messages to help figure out the reason. (Well, we hadn’t got a chance to set up the smoke test for our app yet, so it could be kind of our fault.)

We thought we had been careful enough by cleaning the bower_components folder every time, but that actually prevented us from figuring out the real cause.

After fixing this issue, I discussed it with my pair Rafa, and we came up with some practices that could help avoid this kind of issue:

Best practices

  • Avoid bower install or any equivalent step (such as gulp-bower, grunt-bower, etc.) in the build script
  • Check bower_components into the code repository, or download the dependencies from a self-managed repository for large projects.
  • When dependencies are changed, manually install them and make sure they’re good.

After doing this, our build script runs even faster, because we no longer check whether all dependencies are up to date every time. This is a bonus of removing bower install from our build script.
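
For reference, here is a hedged sketch of what the gulp build looked like after the change (the task name and paths are hypothetical); the bower step is simply gone, and bower_components comes from the repository:

gulpfile.js
var gulp = require('gulp');
var concat = require('gulp-concat');

// no gulp-bower / bower install step any more:
// bower_components is committed and updated manually
gulp.task('scripts', function () {
  return gulp.src(['bower_components/knockout/dist/knockout.js', 'src/**/*.js'])
    .pipe(concat('app.js'))
    .pipe(gulp.dest('build/'));
});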

Some thoughts on the package system

Bower components are maintained by the community, and there is no strict quality control to ensure a package is bug-free or released in an appropriate way. So it is safer to check them manually and lock them down across environments.

This could be a common issue for all kinds of community-managed package systems. Not just Bower; it could be Maven, Ruby gems, Node.js packages, Python pip packages, NuGet packages or even Docker images!

Use Jade as client-side template engine

Jade is a JavaScript HTML template engine that is concise and powerful. Thanks to its awesome syntax and features, it has almost become the default template engine for node.js web servers.

Jade is well known as a server-side HTML template engine, but it can actually also be used as a client-side template engine, which is barely known! To understand how, we should first look at how the Jade engine works.

When translating a jade file into HTML, the Jade engine actually performs 2 separate tasks: compiling and rendering.

Compiling

Compiling is an almost transparent process when rendering jade files directly into HTML, including when rendering a jade file with the jade cli tool. But it is actually the most important step in translating a jade template to HTML.
Compiling translates the jade file into a JavaScript function; during this process, all the static content is translated.

Here is a simple example:

Jade template
doctype html
html(lang="en")
  head
    title Title
  body
    h1 Jade - node template engine
    #container.col
      p You are amazing
      p.
        Jade is a terse and simple
        templating language with a
        strong focus on performance
        and powerful features.
Compiled template
function template(locals) {
  var buf = [];
  var jade_mixins = {};
  buf.push('<!DOCTYPE html><html lang="en"><head><title>Title </title></head><body><h1>Jade - node template engine</h1><div id="container" class="col"><p>You are amazing</p><p>Jade is a terse and simple\ntemplating language with a\nstrong focus on performance\nand powerful features.</p></div></body></html>');
  return buf.join("");
}

As you can see, the template is translated into a JavaScript function that contains all the HTML data. In this case, since we didn’t introduce any interpolation, the HTML content is fully generated at compile time.

The case becomes more complicated when interpolation, each, or if statements are introduced.

Jade template with interpolation
doctype html
html(lang="en")
  head
    title =title
  body
    h1 Jade - node template engine
    #container.col
    ul
      each item in items
        li= item
    if usingJade
      p You are amazing
    else
      p Get it!
    p.
      Jade is a terse and simple
      templating language with a
      strong focus on performance
      and powerful features.
Compiled template with interpolation
function template(locals) {
  var buf = [];
  var jade_mixins = {};
  var locals_ = locals || {}, items = locals_.items, usingJade = locals_.usingJade;
  buf.push('<!DOCTYPE html><html lang="en"><head><title>=title </title></head><body><h1>Jade - node template engine</h1><div id="container" class="col"></div><ul>');
  (function() {
    var $$obj = items;
    if ("number" == typeof $$obj.length) {
      for (var $index = 0, $$l = $$obj.length; $index < $$l; $index++) {
        var item = $$obj[$index];
        buf.push("<li>" + jade.escape(null == (jade.interp = item) ? "" : jade.interp) + "</li>");
      }
    } else {
      var $$l = 0;
      for (var $index in $$obj) {
        $$l++;
        var item = $$obj[$index];
        buf.push("<li>" + jade.escape(null == (jade.interp = item) ? "" : jade.interp) + "</li>");
      }
    }
  }).call(this);
  buf.push("</ul>");
  if (usingJade) {
    buf.push("<p>You are amazing</p>");
  } else {
    buf.push("<p>Get it!</p>");
  }
  buf.push("<p>Jade is a terse and simple\ntemplating language with a\nstrong focus on performance\nand powerful features.</p></body></html>");
  return buf.join("");
}
Data for interpolation
{
  "title": "Jade Demo",
  "usingJade": true,
  "items": [
    "item1",
    "item2",
    "item3"
  ]
}
Output Html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>=title </title>
  </head>
  <body>
    <h1>Jade - node template engine</h1>
    <div id="container" class="col"></div>
    <ul>
      <li>item1</li>
      <li>item2</li>
      <li>item3</li>
    </ul>
    <p>You are amazing</p>
    <p>
      Jade is a terse and simple
      templating language with a
      strong focus on performance
      and powerful features.
    </p>
  </body>
</html>

Well, as you can see, the function has become quite a bit more complicated than before. It gets even more complicated when extends, include or mixin are introduced; you can try that on your own.

Rendering

After compiling, the rendering process is quite simple: just invoke the compiled function, and the returned string is the rendered HTML. The only thing to mention here is that the interpolation data should be passed to the template function as locals.
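
To make the two steps concrete, here is a minimal sketch using the classic jade npm module’s JavaScript API:

var jade = require('jade');

var fn = jade.compile('p= message');      // compiling: template source -> function
var html = fn({ message: 'Hello Jade' }); // rendering: locals in, HTML string out
console.log(html);                        // <p>Hello Jade</p>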

Using Jade as front-end template engine

By now, you have probably got my idea: to use jade as a front-end template engine, we compose the template in jade and compile it into a JavaScript file on the server. Then we invoke the JavaScript function in the front-end to achieve dynamic client-side rendering!

Since the Jade template is precompiled on the server side, there is very little runtime effort when rendering the template on the client side. So it is a cheap solution when you have lots of templates.

To compile jade files into JavaScript instead of HTML, pass the -c or --client option to the jade cli tool, or call jade.compile with the client option instead of jade.render when using the JavaScript API.
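
A hedged sketch of the API route (the exact API varies between jade versions; in the classic API, compile with client: true returns a standalone function whose source can be serialized; paths and the itemTemplate name are hypothetical):

var fs = require('fs');
var jade = require('jade');

var src = fs.readFileSync('views/settings/_item.jade', 'utf8');
var fn = jade.compile(src, { client: true, compileDebug: false });
// write the compiled function source out as a browser-loadable script
fs.writeFileSync('build/settings/_item.js', 'var itemTemplate = ' + fn.toString() + ';');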

Configure Grunt

Well, since Grunt is popular in the node.js world, we can also use Grunt to do the work for us.
Basically, using grunt for jade is straightforward, but it gets a little tricky when you want to compile the back-end templates into HTML and, at the same time, the front-end templates into JavaScript.

I used a little trick to solve the issue: following the Rails convention, I prefix the front-end template files with an underscore.
So

/layouts/default.jade -> Layout file, extended by back-end/front-end templates, should not be compiled.
/views/settings/index.jade -> Back-end template, should be compiled into HTML
/views/settings/_item.jade -> Front-end template, should be compiled into JavaScript
Gruntfile.coffee
module.exports = (grunt) ->
  grunt.initConfig
    pkg: grunt.file.readJSON('package.json')
    jade:
      options:
        pretty: true
      compile:
        expand: true
        cwd: 'views'
        src: ['**/*.jade', '!**/_*.jade']
        dest: 'build/'
        ext: '.html'
      template:
        options:
          client: true
          namespace: 'Templates'
        expand: true
        cwd: 'views'
        src: ['**/_*.jade']
        dest: 'build/'
        ext: '.js'
  grunt.loadNpmTasks('grunt-contrib-jade')

I distinguish the layouts and templates by file path, and distinguish the front-end/back-end templates by the prefix. The filter !**/_*.jade excludes the front-end templates when compiling the back-end templates.

This approach should work fine in most cases. But if you are facing a more complicated situation that can’t be handled with this trick, try defining your own convention and recognizing it with a custom filter function to categorize the templates.

Node over Express - Autoload

Preface

This is the 2nd post of the Node over Express series (the previous one is Configuration). In this post, I’d like to discuss a famous pain point in Node.js.

Pain Point

There is a well known Lisp joke:

A top hacker successfully stole the last 100 lines of a top secret program from the Pentagon. Because the program was written in Lisp, the stolen code was just closing brackets.

The joke is that there are too many brackets in Lisp. Node.js has a similar issue: there are too many requires. Open any node.js file, and one usually finds several lines of require.

Due to node’s sandbox model, the developer has to require resources time and time again in every file. It is not exciting to write or read lines of meaningless require. And worst of all, it can become a nightmare once a developer wishes to replace some library with another.

Rails Approaches

“Require hell” isn’t unique to node.js; Ruby apps had the same problem. Rails has solved it gracefully: the developer barely needs to require anything manually in Rails.

There are 2 kinds of dependencies in a rails app: one is the external resource, the other is the internal resource.

External Resources

External resources are classes encapsulated in ruby gems. In a ruby application, the developer describes the dependencies in a Gemfile and loads them with Bundler. Some frameworks, such as Rails, are already integrated with Bundler; when using them, the developer doesn’t need to do anything manually, and all the dependencies are required automatically. For the others, use bundle exec to create a ruby runtime with all gems required.

Internal Resources

Internal resources are the classes declared in the app; they could be the models, the services or the controllers. Rails uses Railtie to require them automatically. A resource is loaded the first time it is used, so the requiring process is “lazy”. (In fact, this description isn’t precise, because Rails behaves differently in the production environment: it loads all the classes during launch for performance reasons.)

Autoload in Node.js

Rails avoids “require hell” with these two “autoload” mechanisms. There are still debates about whether autoload is good or not, but at the least, autoload frees the developer from dull dependency management and increases productivity. Developers love autoload in most cases.

So to avoid “require hell” in Node.js, I prefer an autoload mechanism. But because there are significant differences between the type systems of Node.js and Ruby, we cannot copy the mechanism from ruby to node as-is. Therefore, before diving into the solution, we need to understand the differences first.

Node.js Module System

There are a number of similarities between Node.js and ruby; things in node.js usually have equivalences in ruby. For example, a package in node is similar to a gem in Ruby, npm corresponds to Gem and Bundler, and package.json takes the responsibility of Gemfile and Gemfile.lock. This similarity makes porting autoload from ruby to node feasible.

But there are also significant differences. One of the major ones is the type system and module sandbox of Node.js, which works quite differently from the Ruby type system.

JavaScript isn’t a class-based OO language, so it doesn’t have a real type system. All the types in JavaScript are actually functions, which are stored in local variables instead of in a type system. Node.js loads files into separate sandboxes, and all the local variables are isolated between files to avoid “global leaks”, a well-known, deep-seated bad part of JavaScript. As a result, a Node.js developer needs to require the types used, again and again, in every file.

In ruby, it is a lot better: with the help of the well designed type system, types are shared all over the runtime, and a developer just needs to require the types not yet loaded.

So in node.js programs there are many more require statements than in ruby. And due to the design of node.js and javascript, the issue is harder to resolve.

Global Variable

In the browser, the JavaScript runtime other than node, global variables are very common. Global variables are easily abused, which brings global leaks to badly written JavaScript programs and drives thousands of developers up the wall. JavaScript developers are so scared of global leaks that they designed the strict isolation model of node.js. To my understanding, the isolation avoids global leaks effectively, but at the same time it brings tens of require statements to every file, which is also not acceptable.

In fact, with the help of JSLint, CoffeeScript and some other tools, developers can avoid global leaks easily. And global sharing isn’t the source of all evil: if abuse is avoided, I believe a reasonable level of global sharing can be useful and helpful. Actually, Node.js has a built-in global sharing mechanism.

To share values across files, there is a special variable, global, which can be accessed from every file and whose value is shared across files.

Besides sharing values around, global has another important feature: node treats global as the default context, whose children you can refer to without naming it explicitly. So SomeType === global.SomeType.
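
A trivial example of both points (sharing plus the implicit global prefix):

// one file attaches a value to the default context...
global.Answer = 42;

// ...and any file loaded afterwards can read it, with or without the prefix
console.log(Answer === global.Answer); // true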

With the help of global, we find a way to share types across files naturally.

JS Property

Rails’ autoload mechanism loads classes lazily: it only loads a class when it is used for the first time. It is a neat feature, and Rails achieves it by tracking the “uninitialized constant” exception. To implement a similar feature in Node.js, tracking exceptions is hardly feasible, so I chose a different approach: I use properties.

A property (attribute in Ruby) enables a method (function) to be invoked when a field of an object is accessed. The property is a common feature among OO languages, but a “new” one to JavaScript: it is declared in the ECMAScript 5 standard, which enables developers to declare a property on an object using the API Object.defineProperty. With a property, we are able to hook a callback onto the type variable and require the module when the type is accessed, so the module won’t be required until it is used. On the other hand, node.js’s require function has a built-in cache mechanism: it won’t load a file twice; instead it returns the value from its cache.

With property, we make the autoload lazy!
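
Before the full implementation below, here is a stripped-down sketch of the idea (the model path is hypothetical):

var host = {};

// the getter fires on first access; require() caches the module internally,
// so later accesses are cheap
Object.defineProperty(host, 'User', {
  get: function () { return require('./models/User'); }
});

// nothing is loaded until a line like this runs:
// var user = new host.User();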

My Implementation

To make autoload work, we need to create a magic host object to hold the type variables; in my implementation, I call the magic object AutoLoader.
We need to require a bootstrap script when the app starts, which describes which types should be required and how.

Bootstrap Script: initEnvironment.coffee
global.createAutoLoader = require('./services/AutoLoader')
global.createPathHelper = require('./services/PathHelper')
global.rootPath = createPathHelper(__dirname, true)
global.Configuration = require(rootPath.config('configuration'))
global.Services = createAutoLoader rootPath.services()
global.Routes = createAutoLoader rootPath.routes()
global.Records = createAutoLoader rootPath.records()
global.Models = createAutoLoader rootPath.models()
global.assets = {} # initialize this context for connect-assets helpers

The script sets up the autoload hosts for all the services, routes, records and models of my app. Then we can reference the types as follows:

Sample Usage
Records.User.findById uid, (err, user) ->
  badge = new Models.Badge(badgeInfo)
  user.addBadge badge
  user.save()

In the initEnvironment.coffee script, there are 2 very important classes that are used:

  • AutoLoader: The class that works as the type variable hosts. All the magic happens here.
  • PathHelper: The class used to handle the path combination issue.

The detailed implementation is here:

AutoLoader
_ = require('lodash')
path = require('path')
fs = require('fs')
createPathHelper = require('./PathHelper')

createLoaderMethod = (host, name, fullName) ->
  host.__names.push name
  Object.defineProperty host, name,
    get: ->
      require(fullName)

class AutoLoader
  constructor: (source) ->
    @__names = []
    for name, fullName of source
      extName = path.extname fullName
      createLoaderMethod(this, name, fullName) if require.extensions[extName]? or extName == ''

expandPath = (rootPath) ->
  createPathHelper(rootPath).toPathObject()

buildSource = (items) ->
  result = {}
  for item in items
    extName = path.extname(item)
    name = path.basename(item, extName)
    result[name] = item
  result

createAutoLoader = (option) ->
  pathObj = switch typeof(option)
    when 'string'
      expandPath(option)
    when 'object'
      if option instanceof Array
        buildSource(option)
      else
        option
  new AutoLoader(pathObj)

createAutoLoader.AutoLoader = AutoLoader

exports = module.exports = createAutoLoader

PathHelper
_ = require('lodash')
fs = require('fs')
path = require('path')

createPathHelper = (rootPath, isConsolidated) ->
  rootPath = path.normalize rootPath
  result = (args...) ->
    return rootPath if args.length == 0
    parts = _.flatten [rootPath, args]
    path.join.apply(this, parts)

  result.toPathObject = ->
    self = result()
    files = fs.readdirSync(self)
    pathObj = {}
    for file in files
      fullName = path.join(self, file)
      extName = path.extname(file)
      name = path.basename(file, extName)
      pathObj[name] = fullName
    pathObj

  result.consolidate = ->
    pathObj = result.toPathObject()
    for name, fullName of pathObj
      stats = fs.statSync(fullName)
      result[name] = createPathHelper(fullName) if stats.isDirectory()
    result

  if isConsolidated
    result.consolidate()
  else
    result

exports = module.exports = createPathHelper

The code above is part of Node over Express; to access the complete codebase, please check out the repo on github.


Besides the content, I want to say thank you to my English teacher, Marina Sarg, who helped me a lot on this blog series. Without her, there wouldn’t be this series of blogs. Marina, thank you very much.

Node over Express - Configuration

Preface

I have been working on Node.js related projects for quite a while, and have built node apps both for clients and as personal projects, such as LiveHall, CiMonitor, etc. I promised someone to share my experience with node, and today I begin to work on it. This is the first blog of the series.

Background

In this blog, I would like to talk about configuration in node, a common problem we need to solve in our apps.

Problems related to configuration aren’t new, and there are dozens of mature solutions, but for Node.js apps there is still something worth discussing.

Perhaps configuration can be treated as a special kind of data. Usually developers prefer to use a data language to describe their configurations. Here are some examples:

  • .net and Java developers usually use Xml to describe their configuration
  • Ruby developers prefer Yaml as the configuration language
  • JavaScript developers tend to use Json

Data languages are convenient, because developers can easily build a DSL on top of them and then describe the configuration with the DSL. But is a data language the best option available? Is it really suitable in all scenarios?

Before we answer these questions, I would like to say something about the problem we’re facing. There is one requirement common to all kinds of configuration solutions: default values and overriding.

For example, as a web app default we use port 80, but in the development environment we prefer a port number above 1024, and 3000 is a popular choice. That means we need to provide 80 as the default value of the port, but we wish to override it with 3000 in the development environment.

Of the languages mentioned above, Xml and Json (unlike Yaml) provide no native support for inheritance and overriding, which means we need to implement the mechanism on our own. Taking Json as an example, we might write the configuration this way:

Sample Json configuration
{
  "default": {
    "port": 80,
    "serveAssets": true
  },
  "development": {
    "port": 3000,
    "database": "mongodb://localhost/development"
  },
  "test": {
    "database": "mongodb://localhost/test"
  },
  "production": {
    "serveAssets": false,
    "database": "mongodb://ds0123456.mongolab.com:43487/my_sample_app"
  }
}

The previous Json snippet is a typical example of web app configuration: it has a default section providing the default values for all environments, and three sections for the specific environments. To apply it correctly to our app, we need to load and parse the Json file first, then load the values of the default section, then override them with the values from the specific environment. In addition, we might wish to have validation that yields an error when the provided environment doesn’t exist.
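
A hedged sketch of that loading logic in plain JavaScript (the file name is assumed):

var fs = require('fs');

function loadConfig(env) {
  var all = JSON.parse(fs.readFileSync('./config.json', 'utf8'));
  if (!all[env]) throw new Error('Unknown environment: ' + env);
  var config = {};
  // start from the default section, then override with the env-specific one
  Object.keys(all['default']).forEach(function (key) { config[key] = all['default'][key]; });
  Object.keys(all[env]).forEach(function (key) { config[key] = all[env][key]; });
  return config;
}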

This solution looks simple and seems to work, but when you apply it to a real-life app, you need to watch out for some pitfalls.

Issue 1: Confidential Values

In the real world, values in the configuration can be sensitive and need to be kept confidential. They could be the credentials to access your database, or the key to decrypt the cookies; they may also be the private certificate that identifies and authenticates the app to other services. In these scenarios, you need to protect your configuration to avoid big trouble!

To solve the issue, you might think about adding a new feature that enables you to encrypt confidential values or to load them from a different, safe source. To achieve it, you might need to add another layer of DSL, which adds more complexity to your app and makes your code harder to debug and maintain.

Issue 2: Dynamic Data

As a solution to the first issue, one could store the environment-related but sensitive data in environment variables. The solution is simple and works perfectly, so I highly recommend it. However, it means you need the capability to load values not only from Json directly but also from environment variables.

Sometimes, deploying your app to Heroku or Nodejitsu makes the case even trickier: after deployment, the default values are provided in Json directly, and some of them need to be overridden with values from environment variables, or vice versa. These tricky requirements can easily blow your mind and your code away, leading to a complicated DSL design and hundreds of lines of implementation, just to load your configuration properly. Obviously that is not a good idea.
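
To illustrate the requirement (not the DSL), here is a tiny sketch reusing the hypothetical loadConfig above, where platform-provided values win when present:

var config = loadConfig(process.env.NODE_ENV || 'development');

// values injected by the platform override the JSON defaults
config.port = process.env.PORT || config.port;
config.database = process.env.MONGOLAB_URI || config.database;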

Issue 3: Complicated Inheritance Relationship

Scared by the cases above? No? Then how about complicated inheritance relationships between environments?

In some big and complicated web apps, there may be more than 3 basic environments, such as:

  • Development: for developers to develop the app locally
  • Test: for developers to run unit or function test locally, such as mocha tests
  • Regression: for developers or QAs to run regression tests, such as cucumber tests
  • Integration: for QAs or Ops to test the integration with other apps
  • Staging: for ops and QAs to test the app in a production-like environment before it really goes live
  • Production: the environment that serves your real users

When trying to write configurations for these environments, one finds there are only a few differences between them. To make life easier and avoid redundancy, introducing inheritance between configurations might be a good idea.

As a consequence, the whole configuration becomes a set of environments with complex inheritance relationships. And to support this kind of configuration inheritance, an even more complex DSL and hundreds of lines of code are needed.

Some Comments

My assumptions above may seem a little too complex. To some people, it might be the “WORST CASE SCENARIO”, hard to come by. But in my experience, it is very common when building a real web app with node. So if solving it isn’t too hard, it is better to consider it seriously and solve it gracefully.

Ruby developers might think they’re lucky, because Yaml supports inheritance natively. But confidential data and dynamic data still cause trouble.

My Solution

After learning a number of painful lessons, I figured out a simple but working solution: Configuration as Code - describe the configuration in the same language the business logic is described in!

Configuration as code isn’t a new concept, but it is extremely handy when you use it in node applications! Let me explain why and how it works:

To protect the confidential configuration values, one should store them in environment variables, which are only accessible on the specific server.
Then one can load these values from the environment variables as dynamic values.

Doing this in a data language such as Xml, Json or Yaml would be hard, but it becomes as easy as taking candy from a baby when done in the programming language the application itself uses, such as ruby or javascript.

As for configuration inheritance, OO languages already provide a very handy inheritance mechanism, so why invent a new one instead of just using it? As for value overriding, OO programming tells us it is called polymorphism. The only difference from the typical scenario is that we override values instead of behaviors. But that isn’t an issue, because a value can be the result of a behavior, right?

Now I assume everyone has a pretty good idea of what I am saying. If so, the code below, a standard Node.js file written in coffee-script, should be quite easy to understand:

Configuration as Code Example
process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'

class Config
  port: 80
  cookieSecret: '!J@IOH$!BFBEI#KLjfelajf792fjdksi23989HKHD&&#^@'

class Config.development extends Config
  port: 3009
  redis:
    uri: 'redis://localhost:6379'
  mongo:
    uri: 'mongodb://localhost'

class Config.test extends Config.development

class Config.heroku extends Config
  cookieSecret: process.env.COOKIE_SECRET
  redis:
    uri: process.env.REDISCLOUD_URL
  mongo:
    uri: process.env.MONGOLAB_URI

module.exports = new Config[process.env.NODE_ENV]()

See, with this approach one can describe the configuration easily and clearly in a few lines of code, with dynamic value loading, configuration inheritance and overriding all built in.

In fact, it works even better than expected! Here are some additional free benefits:

  1. Only one configuration file is needed when the app is deployed to the cloud, because all the host-specific configurations are usually provided via environment variables on a PaaS.
  2. You can have some simple and straightforward logic in the configuration, which can be very useful, especially if there is a naming convention in the configuration. But complicated or tricky logic should be strictly avoided, because it hurts readability and maintainability.
  3. It is easy to write tests for configurations to ensure the values are properly set. This is very handy when there are complicated inheritance relationships between configurations, or some simple logic in the configuration.
  4. Code unrelated to the current environment is neither instantiated nor executed, which helps avoid the overhead of instantiating unused expensive resources and the errors caused by incompatibilities between environments.
  5. You get a runtime error when the configuration for the environment doesn’t exist.

Besides the content, I want to say thank you to my English teacher, Marina Sarg, who helped me a lot on this blog series. Without her, there wouldn’t be this series of blogs. Marina, thank you very much.

Manage configuration in Rails way on node.js by using inheritance

An application is usually required to run in different environments. To manage the differences between the environments, we usually introduce the concept of environment-specific configuration.
In a Rails application, by default, Rails provides 3 different environments: the well known development, test and production.
And we can use the environment variable RAILS_ENV to tell Rails which environment to load; if RAILS_ENV is not provided, Rails loads the app in the development env by default.

This approach is very convenient, so we want to apply it everywhere. But in node.js, Express doesn’t provide any configuration management, so we need to build the feature ourselves.

Environment management usually provides the following functionality:

  • Allow us to provide some configuration values as defaults, which are loaded in all environments; usually we call this common.
  • Load the specific configuration according to the environment variable, overriding values in common where necessary.

Rails uses YAML to hold these configurations, which is concise but powerful enough for this purpose. And YAML provides an inheritance mechanism by default, so you can reduce duplication by using inheritance.

Inheritance in Rails YAML Configuration
development: &defaults
  adapter: mysql
  encoding: utf8
  database: sample_app_development
  username: root

test:
  <<: *defaults
  database: sample_app_test

cucumber:
  <<: *defaults
  database: sample_app_cucumber

production:
  <<: *defaults
  database: sample_app_production
  username: sample_app
  password: secret_word
  host: ec2-10-18-1-115.us-west-2.compute.amazonaws.com

If we followed the same approach in express and node.js, then compared to YAML we would prefer JSON, which is supported natively by Javascript.
But to me, JSON isn’t the best option; it has some disadvantages:

  • JSON syntax is not concise enough
  • Matching the brackets and appending commas to line ends are distractions
  • Lack of flexibility

As an answer to these issues, I chose coffee-script instead of JSON.
Coffee is concise, and, similar to YAML, it uses indentation to indicate nesting. And coffee is executable, which gives the configuration a lot of flexibility, so we can implement the configuration in a Domain Specific Language form.

To do this, we need to solve 4 problems:

  1. Allow devs to declare a default configuration.
  2. Load the specific configuration on top of the default one.
  3. Let the specific configuration override values in the default one.
  4. Keep the code concise, clean and reading-friendly.

Inspired by the YAML solution, I worked out my first solution:

Configuration in coffee script
_ = require('underscore')

config = {}

config['common'] =
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"

config['development'] = {}

config['test'] =
  database: "sample_app_test"

config['cucumber'] =
  database: "sample_app_cucumber"

config['production'] =
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"

_.extend exports, config.common

specificConfig = config[process.env.NODE_ENV ? 'development']
_.extend exports, specificConfig if specificConfig?

YAML is a data-centric language, so its inheritance is more like “mixing in” another piece of data. So I use underscore to mix the specific configuration into the default one, which overrides the overlapping values.

But if we jump out of YAML’s box and think about Javascript itself: Javascript is a prototype-based language, which means it already provides an overriding mechanism natively; each object inherits and overrides the values from its prototype.
So I worked out the 2nd solution:

Prototype based Configuration
config = {}

config['common'] =
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"

config['development'] = {}
config['development'].__proto__ = config['common']

config['test'] =
  __proto__: config['common']
  database: "sample_app_test"

config['cucumber'] =
  __proto__: config['test']
  database: "sample_app_cucumber"

config['production'] =
  __proto__: config['common']
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"

process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'
module.exports = config[process.env.NODE_ENV]

This approach works, but looks kind of ugly. Since we’re using coffee, which provides syntax sugar for classes and class inheritance,
we have the 3rd version:

Class based configuration
process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'

class Config
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"

class Config.development extends Config

class Config.test extends Config
  database: "sample_app_test"

class Config.cucumber extends Config
  database: "sample_app_cucumber"

class Config.production extends Config
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"

module.exports = new Config[process.env.NODE_ENV]()

Now the code looks clean, and we can go a step further if necessary: we can separate the configurations into files and require them by file name:

Configuration separated into files
# config/config.coffee
configName = process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'
SpecificConfig = require("./envs/#{configName}")
module.exports = new SpecificConfig()

# config/envs/common.coffee
class Common
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"
module.exports = Common

# config/envs/development.coffee
Common = require('./common')
class Development extends Common
module.exports = Development

# config/envs/test.coffee
Common = require('./common')
class Test extends Common
  database: "sample_app_test"
module.exports = Test

# config/envs/cucumber.coffee
Test = require('./test')
class Cucumber extends Test
  database: "sample_app_cucumber"
module.exports = Cucumber

# config/envs/production.coffee
Common = require('./common')
class Production extends Common
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"
module.exports = Production

Pitfall in node crypto and base64 encoding

Today, we found a huge pitfall in the node.js crypto module! Decipher has a potential problem when processing Base64 encoded input.

We’re building a RESTful web service on Node.js, which talks to some other services implemented in Ruby.

Ruby

In ruby, we use the default Base64 class to handle Base64 encoding.

Base64#encode64 has a very interesting feature:
it adds a line break (\n) to the output every 60 characters. This format makes the output look pretty and friendly for human reading:

Ruby Base64 Block
MSwyLDMsNCw1LDYsNyw4LDksMTAsMTEsMTIsMTMsMTQsMTUsMTYsMTcsMTgs
MTksMjAsMjEsMjIsMjMsMjQsMjUsMjYsMjcsMjgsMjksMzAsMzEsMzIsMzMs
MzQsMzUsMzYsMzcsMzgsMzksNDAsNDEsNDIsNDMsNDQsNDUsNDYsNDcsNDgs
NDksNTAsNTEsNTIsNTMsNTQsNTUsNTYsNTcsNTgsNTksNjAsNjEsNjIsNjMs
NjQsNjUsNjYsNjcsNjgsNjksNzAsNzEsNzIsNzMsNzQsNzUsNzYsNzcsNzgs
NzksODAsODEsODIsODMsODQsODUsODYsODcsODgsODksOTAsOTEsOTIsOTMs
OTQsOTUsOTYsOTcsOTgsOTksMTAw

Base64#decode64 ignores the line breaks (\n) when parsing base64 encoded data, so the line breaks won’t pollute the data.

Node.js

Node.js takes Base64 as one of its 5 standard encodings (ascii, utf8, base64, binary, hex). Ideally, data or strings can be transcoded between these encodings without data loss.

The Buffer class is the simplest way to transcode the data:

Base64 Encoder in Node.js
Base64 =
  encode64: (text) ->
    new Buffer(text, 'utf8').toString('base64')
  decode64: (base64) ->
    new Buffer(base64, 'base64').toString('utf8')

Although the encode64 function in node.js won’t add line breaks to the output, the decode64 function does ignore line breaks when parsing the data. This keeps the behavior consistent with ruby’s Base64 class, so we can use this decode64 function to decode the data from ruby.
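
A quick demonstration in plain JavaScript (old-style Buffer constructor, matching the era of this post):

// Buffer's base64 decoder skips line breaks, so data produced by
// Ruby's Base64#encode64 round-trips cleanly:
var text = new Buffer('aGVsbG8g\nd29ybGQ=', 'base64').toString('utf8');
console.log(text); // prints: hello world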

Since base64 is one of the standard encodings, and some of the node.js APIs allow setting an encoding for input and output, ideally we can perform the base64 encoding or decoding while processing the data.
It seems Node.js is more convenient than Ruby when dealing with Base64.

E.g. we can combine reading a file and base64 encoding its content into one operation, by passing the encoding to the readFileSync API.

Read file content as Base64
fs = require('fs')
fileName = './binary.dat' # this file contains binary data
base64 = fs.readFileSync(fileName, 'base64') # file content has been base64 encoded

It looks like we can always use this trick to avoid manual base64 encoding and decoding whenever an API has an encoding parameter. But actually that is not true! There is a BIG pitfall here!

In our real case, we use the crypto module to decrypt a JSON document that was encrypted and base64 encoded by Ruby:

Base64 Decode and Decrypt
crypto = require('crypto')

parse = (data, algorithm, key, iv) ->
  decipher = crypto.createDecipheriv(algorithm, key, iv)
  decrypted = decipher.update(data, 'base64', 'utf8') # set input encoding to 'base64' to ask the API to base64 decode the input before decryption
  decrypted += decipher.final('utf8')
  JSON.parse(decrypted)
Manual Base64 Decoding
crypto = require('crypto')

parse = (data, algorithm, key, iv) ->
  decipher = crypto.createDecipheriv(algorithm, key, iv)
  binary = new Buffer(data, 'base64') # manually base64 decode
  decrypted = decipher.update(binary, 'binary', 'utf8') # set input encoding to 'binary'
  decrypted += decipher.final('utf8')
  JSON.parse(decrypted)

The 2 implementations above are very similar, except that the second one base64 decodes the data manually using Buffer. Ideally they should be equivalent in behavior. But in fact, they are NOT equivalent!

The first implementation throws “TypeError: DecipherFinal fail”.
The reason is that the shortcut way doesn’t ignore the line breaks, but Buffer does!!! So in the first implementation, the data is polluted by the line breaks!

Conclusion

Be careful when you ask an API to base64 decode data by setting the encoding argument to ‘base64’: it behaves inconsistently with the Buffer class.

I’m not sure whether it is a node.js bug or whether it is so by design. But it is indeed a pitfall that hides very deep and is usually extremely hard to figure out, since encrypted binary is hard for a human to read, and debugging across 2 languages is also kind of hard!

Pitfall in fs.watch: fs.watch fails when switch from TextMate to RubyMine

I’m writing a cake script that helps me build the growlStyle bundle.
I wish my script could watch for changes to the source files and rebuild when a file changes.
So I wrote the following code:

Watching code change
files = fs.readdirSync getLocalPath('source')
for file in files
  fs.watch file, ->
    console.log "File changed, rebuilding..."
    build()

The code works when I edit the code with TextMate, but fails when I use RubyMine!

Super weird!

After half an hour of debugging, I found the following interesting phenomena:

  • Given I’m using TextMate
    When I changed the file 1st time
    Then a ‘change’ event is captured
    When I changed the file 2nd time
    Then a ‘change’ event is captured
    When I changed the file 3rd time
    Then a ‘change’ event is captured

  • Given I’m using RubyMine
    When I changed the file 1st time
    Then a ‘rename’ event is captured
    When I changed the file 2nd time
    Then no event is captured
    When I changed the file 3rd time
    Then no event is captured

From the result, we can easily see that the script fails because the “change” event is not triggered as expected when using RubyMine.
The reason for RubyMine’s “weird” behavior might be that RubyMine wants to keep the file’s integrity, so it “writes” the file in an atomic way, as follows:

  1. RubyMine writes the file content to a temp file
  2. RubyMine removes the original file
  3. RubyMine renames the temp file to the original file name

This workflow ensures that the content is either fully written or not written at all. So, in a word, RubyMine does not actually write to the file; it replaces the original file with another one, and the original is removed or stored in some special location.

On the other hand, according to the Node.js documentation of fs.watch, node uses kqueue on Mac to implement it.
And according to the kqueue documentation, kqueue uses the file descriptor as the identifier, and the file descriptor is bound to the file itself rather than to its path. So when the file is renamed, we keep tracking the file under its new name. That’s why we lose the status of the file after the first ‘rename’ event.
In our case, we actually wish to identify the file by its path rather than by its file descriptor.

To solve this issue, we have 2 potential solutions:

  1. Apply fs.watch to the directory that holds the source file, in addition to the source file itself.
    When the file is updated in place, as TextMate does, the watcher on the file raises the “change” event.
    When the file is atomically replaced, as RubyMine does, the watcher on the directory raises 2 “rename” events.
    So theoretically, we can track the change of the file no matter how it is updated (a sketch follows this list).

  2. Use the old-fashioned fs.watchFile function, which tracks changes with fs.stat.
    Compared to fs.watch, fs.watchFile is less efficient because of its polling mechanism, but it does track the file by file name rather than by file descriptor, so it won’t be charmed by the fancy atomic writing.

Obviously, the 1st solution looks better than the 2nd, because it uses events rather than old-fashioned polling. Even the documentation of fs.watchFile says to use fs.watch instead of fs.watchFile when possible.

But in practice it is kind of painful to write such code, since the ‘rename’ event on the directory is not only triggered by file updates; it is also triggered by adding and removing files.

And the ‘rename’ event is triggered twice when updating a file. Obviously we cannot rebuild when the first ‘rename’ event fires, or the build might fail because of the absence of the file; and we would also trigger the build twice in a really short period of time.

So in fact, to solve our problem, the polling fs.watchFile is more useful: its old fashion protects it from being charmed by the ‘fancy’ atomic file writing.

So finally, we got the following code:

fs.watchFile
runInWatch = (options, task) ->
  return task(options) unless options.watch
  console.info "INFO: Watching..."
  files = fs.readdirSync getLocalPath('source')
  console.log "Tracking files:"
  for file in files
    do (file) -> # capture `file` per iteration for the callback below
      console.log "#{file}"
      fs.watchFile getLocalPath('source', file), (current, previous) ->
        unless current.mtime.getTime() == previous.mtime.getTime()
          console.log "#{file} Changed..."
          task(options)

HINT: Be careful about the differences between fs.watch and fs.watchFile:

  • The meaning of the filename parameter
    fs.watch accepts a relative path such as ‘source.jade’ as well as a full path like ‘/path/to/source.jade’. fs.watchFile only accepts the full path ‘/path/to/source.jade’.
  • Callback invocation condition
    fs.watch invokes the callback when the file is renamed or changed. fs.watchFile invokes the callback when the file is accessed, including both writes and reads.
    So you need to compare the mtime of the stat objects: the file has changed when mtime changed.
  • Response time
    fs.watch uses events, which capture the change almost in realtime. fs.watchFile uses polling, so the notification may be deferred for a period of time; by default, the maximum is 5s.

exports vs module.exports in node.js

I was confused about how the require function works in node.js for a long time. I found that when I require a module, sometimes I can get the object I want, but sometimes I can’t and just get an empty object, which gives the impression that we cannot export an object by assigning it to exports, even though it seems we can somehow export a function by assignment.

Today, I re-read the document, and I finally made clear how I had misunderstood the “require” mechanism.

I clearly remember this sentence in the doc:

In particular module.exports is the same as the exports object.

So I believed that exports is just a shortcut alias of module.exports, and that we can use one instead of the other without worrying about any differences between the two.
But this understanding proved to be wrong: exports and module.exports are different.

Today I found this in the doc:

The exports object is created by the Module system. Sometimes this is not acceptable, many want their module to be an instance of some class. To do this assign the desired export object to module.exports.

So it says that module.exports is different from exports, and if you export something by assignment, you need to assign it to module.exports.

Let’s try to understand these sentences more deeply through code examples.

Regarding the sentence

The exports object is created by the Module system.

The words “created by” actually mean that when node.js loads a javascript file, before executing any line of code in your file, the module system first executes the following code for you:

var exports = module.exports

So the actual interface of node.js’s module system is the module object; the actual exported object is module.exports, not exports.
And exports is just a normal variable; there is no “magic” in it. So if you assign something to it, the variable is simply rebound, and nothing is exported.

That’s why I failed to get the exported object I wanted when I assigned it to the exports variable.
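
A minimal demonstration of the difference (file names hypothetical):

// counter.js
exports = { count: 1 };        // rebinds the local variable only; exports nothing
module.exports = { count: 2 }; // replaces the real exported object

// main.js
var counter = require('./counter');
console.log(counter.count); // 2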

So to export some variable as a whole, we should always assign it to module.exports.
And at the same time, unless there is a good excuse, we’d better keep the convention that exports is the shortcut alias of module.exports; so we should also assign module.exports back to exports.

As a conclusion, to export something in node.js by assignment, we should always follow the following pattern:

exports = module.exports = {
...
}

A way to expose singleton object and its constructor in node.js

In the Node.js world, we usually encapsulate a service into a module, which means the module needs to export the façade of the service. In most cases the service can be a singleton: all apps use the same service instance.

But in some rare cases, people might like to create several instances of the service, which means the module also needs to export the service constructor.

A very natural idea is to export the default service instance, and expose the constructor as a method of the default instance. So we could consume the service this way:

Ideal Usage
var defaultService = require('service');
var anotherService = defaultService.newService();

So we need to write the module in this way:

Ideal Export
function Service() { }

module.exports = new Service();
module.exports.newService = Service;

But for some reason, node.js doesn’t seem to allow a module to expose an object by assigning the object to module.exports.
To export a whole object, it seemed required to copy all the members of the object to module.exports, which drives out all kinds of tricky code.

(Update: I misunderstood how node.js require works, and HERE is the right understanding. Even though I misunderstood the mechanism, the conclusion of this post is still correct: exporting a function is still the more convenient way to export both the default instance and the constructor.)

And things can become much worse when there are backward references from the object’s properties to the object itself.
So to solve this problem gracefully, we need to change our mind:
since it has proved tricky to export an object, can we export the constructor instead?

The answer is yes: Node.js does allow us to assign a function to module.exports to export the function.
So we got this code:

Export Constructor
function Service() { }
module.exports = Service;

So we can create a service instance this way:

Create Service
var Service = require('service');
var aService = new Service();

As you see, since what we exported is the constructor, we need to create an instance manually before we can use it. Another problem is that we lose the shared instance between module users, and sharing the same service instance between users is a common requirement.

How to solve this problem? As we know, a function is also a kind of object in javascript, so we can add a member to the constructor, called default, which holds the shared instance of the service.
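
A sketch of that “member on the constructor” approach (the default name is my own choice):

function Service() { }

Service.default = new Service(); // the shared instance lives on the constructor
module.exports = Service;

// consumers then pick one:
// var shared = require('service').default;
// var fresh  = new (require('service'))();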

This solution works, but not gracefully! A crazy but fancy idea: can we transform the constructor itself into a kind of singleton instance?! Which means you could do this:

Export Singleton
var defaultService = require('service');
defaultService.foo();
var anotherService = defaultService();
anotherService.foo();

The code style looks familiar? Yes, jQuery, and many other well-designed js libraries are designed to work in this way.
So our idea is kind of feasible but how?

Great thanks to Javascript’s prototype system (or maybe SELF’s prototype system is more accurate), we can simply make a service instance the constructor’s prototype:

Actual Export
function Service() { }

module.exports = Service;
Service.__proto__ = new Service();

Sounds crazy, but it works, and gracefully! That’s the beauty of Javascript.